An interesting look at content scraping from +AJ Kohn
 
Content scraping is slowly dying, and so are autoblogs, with the new Google algorithm Penguin. More and more autoblogs with scraped content are getting de-indexed or penalized, unless, of course, you are using a hell of a good spinner.
 
Now there's a quotable phrase: "Scraped content is the arterial plaque of the Internet."
 
Some call it sharing of information. Without scraping, many sites would have few if any readers.
 
+Joel Hughes 'scraped' typically does mean 'theft', ie, someone has taken your content - a larger proportion than anyone reasonably would say is a fair use - without your permission and reposted it (sometimes, repurposed) often surrounded with ads.

Clearly stating on your site what you allow others to do with your content (a Creative Commons license is the easiest way for this) is great for honest folk. Scrapers tend not to be in this group, so t&c or whatever would have little impact.

All part of the online landscape.
 
I view my site as a news aggregation and information sharing site. Others may view it as a scraping site. I, however, think the site serves a useful purpose and sends readers to the original content sites.
 
+Joel Hughes providing a link back to the original post doesn't mean that the act of the scraper in using your content without your permission is therefore legitimate. The scraper needs your permission, or to abide by any terms of use you make available on your site, in order to not be a thief.

It's a murky area from a copyright perspective. Still, reasonable people know very well what's right and what's not.
 
Scraped content is theft - against copyright law. Without prior permission you cannot just republish someone's article, even if you attribute it to them. If I see a pingback from a site that has scraped something of mine it's usually only because I included a link to my own site in my article, and they pick up the link. They have no intention of crediting me or asking permission, or even of pinging back, they are just taking content to fill up their own spam blogs.

I don't fret about it because I have links in the content and, at the end of the day, Google will presumably show my version of my article above theirs in results, so as long as I don't promote them, I don't worry too much about it. That's not to say I agree with what they are doing but I'm not going to beat myself up trying to take action over it unless it gets out of hand.
 
+Neville Hobson & +Joel Hughes Each post from my blog gets scraped by around 2 dozen sites. A few scrape everything including the links; most, however, do not. Very, very few ever include the Creative Commons requirement for attribution.

I'm less concerned with the semantics +Joel Hughes. Call it what you feel comfortable with. We all view it differently. I started regarding it as theft, when I started seeing my work taken, word for word, then attributed to someone else (by name) with their photo on their blog, as the author.
 
+Andres Leal I'm also seeing a drop in scraping from my main site. I get around 20 to 30 scrapes a day now. Previously, I was seeing double that.

I hope that the recent Google changes have made it less rewarding for scrapers.
 
BTW: +Joel Hughes - I often disagree with you, but love that you put your thoughts out there and widen the debate. You're my 2nd favorite Welshman.
 
+Joel Hughes the question of which content will rank higher depends on a number of factors - the authority of the site hosting the content, the location of that website in relation to the person doing a search and the relevance of the website to the person doing a search... I write for one site that shows up second to another site that syndicates the content from the first site, but in that scenario, the site with the higher reputation is driving traffic and rank back to the first, so it's a legitimate price to pay to build reputation.

I think where the illegal scrapers are concerned, you are right - Google will sort it out and, as +Jim Connolly says, it will become less fruitful for them to do. The whole practice of filling a site with scraped content just to gain affiliate revenue from AdSense ads is a key practice being targeted by Google. That's why so many blog networks and scraper sites have been de-ranked lately.
 
+Walker Lot If you are talking about your Occasions To Be website, yes you are breaching copyright by republishing the bulk of articles from other sources without permission.
 
+Joel Hughes - Although you're better looking, I have to say my brother (just in case he reads my stuff.)
 
+Steve Masters thanks, it's something that's been in my mind for a while, Jim's G+ post was this morning's trigger :)
 
+Steve Masters So, Google is a scraper? I believe permission to link to articles is implied, as the internet would cease to exist without it. Information is Freedom
 
+Walker Lot Google is scraping snippets so it can point people to articles. You are copying article content and republishing it. Google is within the law, you are not. You would be better to take just a snippet of an article and then point to the whole article, or better still, write your own version.

Information is freedom, true, but copyright is protected.
 
+Steve Masters Snippets? http://webcache.googleusercontent.com/search?q=cache:7pIJE763xXMJ:www.vertical-leap.co.uk/blog/monitoring-internal-site-search/+&cd=2&hl=en&ct=clnk&gl=us

I do not believe my site is in violation of international copyright or guilty of plagiarism. We're just little guys sharing information that is for the most part overlooked by the mainstream. Nothing more than a record swap. I think it's wrong, and dangerous, to allow information to be controlled by a minority. Look what's happening to this world right in front of our eyes.
 
Walker, the Google cache is not published content. It is a cache of a web page to help Google display links to that page in search results. It carries no commercial value, unlike your page where you are publishing ads next to articles you do not own. It's no good comparing what you do with what Google does. They are not the same.
 
You are naive if you think you have a moral right on the grounds that information should be free. Perhaps food should be free too because we also all need that.
 
+Steve Masters I'll concede that you win this argument, but I hope that we don't lose the fight, because I honestly believe that humanity is at stake when it comes to maintaining the freedom of sharing information. It should never be allowed to be controlled by just a few. Just ask any time traveler from 1984.
 
I absolutely agree with you on that, Walker. Information should be as free as possible, at least so that it is harder for the few to oppress the many.

There is a difference between information and creative works though, and it's important to draw a distinction between commercialisation of someone else's written work and the benevolent sharing of facts.
 
Well, I guess that leaves my site off the hook. Because we sure aren't making any money at it!
 
:-) Joking aside though +Walker Lot you should take care. Whether you make money or not, when you publish a website you are a publisher bound by the same copyright rules as everyone else, so you should make sure you understand the law before being flippant about what you believe your rights ought to be. You should be wary, too, of copyright trolls, who may sue you without warning if you are copying content they own.

Here are some interesting links for you.
http://www.webmasterworld.com/forum44/841.htm
http://groundwire.org/resources/articles/fair-use
http://www.nytimes.com/2009/03/02/business/media/02scrape.html?pagewanted=all
 
Laws and society don't change unless they are challenged to do so.
 
Challenging a law that everyone is happy with just to suit your own agenda (ie, not bothering to write your own content) is a route to failure. Just publish good content of your own and stop trying to justify theft.
 
I haven't seen the EOS symbol ~30~ used in a while.