Google defines scraping as follows:
My question, as set out in the title to this post, is whether or not scraping is a violation of copyright. It turns out that the answer is likely very complicated. You have to look at the definition of a scraping site very carefully. Let me give you some hypotheticals to show what I mean.
- Sites that copy and republish content from other sites without adding any original content or value
- Sites that copy content from other sites, modify it slightly (for example, by substituting synonyms or using automated techniques), and republish it
- Sites that reproduce content feeds from other sites without providing some type of unique organization or benefit to the user
Let's suppose that I write a blog and put a link in my blog post to your blog. Does that link violate your copyright? I can't imagine that anyone would think that there was problem with linking to another website on the Web. In this case, there is no content from the originating site, just a link.
But let's carry the hypothetical a little further. What if I put a link to your site and quote some of your content? Does this violate copyright law? If you are acquainted with any of the terminology of copyright law; think fair use. The issue here is whether or not the "quoted" material is a substantial reproduction of the entire original content? I would have the opinion that duplicating an entire blog post either with or without attribution would be a violation of the originator's copyright.
So is the scraping website protected by the "fair use" doctrine? Does the fact that the motivation for listing the original websites is to make money have anything to do with how you would decide if there was or was not a violation of the originator's copyright? By the way, the copyright does not make a distinction between a commercial and non-commercial use of the original constituting or not constituting a violation of copyright. The fact that the reproducing (scraping) party does not make money from the reproduction is not a factor in the issue of violation, although it may ultimately be an issue as to the amount of damages assessed.
Does the fact that the actions of the scraper annoy me, make any difference? I would answer, not in the least. Whether or not you are annoyed by the violation of the copyright makes no difference as to whether or not there is a violation. Likewise, you have no independent claims for your wounded feelings because of the copied content. Copyright is a statutory action (i.e. based on statutory law) and unless the cause of action is recognized by the law, there is no cause of action. Now, in an outrageous case, you may have some kind of tort (personal injury) claim, but that is way outside of my hypothetical situation.
So what is the answer? Does scraping violate the originator's copyright? If only a small portion of the blog is copied (scraped) then I would have to have the opinion that it is not. Essentially, no matter what the motivation of the scrapper, there is not enough content copied to violate the fair use doctrine. Now, that is my opinion. Your's might differ. That is what makes lawsuits.
Do I think there are other reasons why scraping websites are objectionable? Certainly, but those reasons have nothing to do with copyright and they are probably the subject of another different blog post. So, if you are reading this from scraping website, bear in mind that there may be a serious problem with that type of website.