Mushtaq Bhat
32 followers - artist

Posts

Post has shared content
Website outages and blackouts the right way

tl;dr: Use a 503 HTTP status code but read on for important details.

Sometimes webmasters want to take their site offline for a day or so, perhaps for server maintenance or as political protest. We’re currently seeing some recommendations being made about how to do this that have a high chance of hurting how Google sees these websites and so we wanted to give you a quick how-to guide based on our current recommendations.

The most common scenario we’re seeing webmasters talk about implementing is to replace the contents on all or some of their pages with an error message (“site offline”) or a protest message. The following applies to this scenario (replacing the contents of your pages) and so please ask (details below) if you’re thinking of doing something else.

1. The most important point: Webmasters should return a 503 HTTP status code for all the URLs participating in the blackout (parts of a site or the whole site); see the sketch after this list. This helps in two ways:

a. It tells us it's not the "real" content on the site and won't be indexed.

b. Because of (a), even if we see the same content (e.g. the “site offline” message) on all the URLs, it won't cause duplicate content issues.

2. Googlebot's crawling rate will drop when it sees a spike in 503 status codes. This is unavoidable, but as long as the blackout is only a transient event, it shouldn't cause any long-term problems and the crawl rate will recover fairly quickly to the pre-blackout rate. How quickly depends on the site, but it should be on the order of a few days.

3. Two important notes about robots.txt:

a. As Googlebot is currently configured, it will halt all crawling of the site if the site’s robots.txt file returns a 503 status code for robots.txt. This crawling block will continue until Googlebot sees an acceptable status code for robots.txt fetches (currently 200 or 404). This is a built-in safety mechanism so that Googlebot doesn't end up crawling content it's usually blocked from reaching. So if you're blacking out only a portion of the site, be sure the robots.txt file's status code is not changed to a 503.

b. Some webmasters may be tempted to change the robots.txt file to have a “Disallow: /” in an attempt to block crawling during the blackout. Don’t block Googlebot’s crawling like this as this has a high chance of causing crawling issues for much longer than the few days expected for the crawl rate recovery.

4. Webmasters will see these 503s show up as errors in Webmaster Tools, which will report that we saw the blackout. Be sure to monitor the Crawl Errors section particularly closely for a couple of weeks after the blackout to ensure there aren't any unexpected lingering issues.

5. General advice: Keep it simple and don't change too many things, especially changes that take different times to take effect. Don't change the DNS settings. As mentioned above, don't change the robots.txt file contents. Also, don't alter the crawl rate setting in WMT. Keeping as many settings constant as possible before, during, and after the blackout will minimize the chances of something odd happening.
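To make points 1 and 3a concrete, here is a minimal sketch of one way a temporary blackout could be served. It uses only Python's standard library; the handler class, port, message text, and the Retry-After value are illustrative assumptions and not part of the guidance above. Every URL returns a 503, while robots.txt keeps returning a 200 so crawling is not halted site-wide.

```python
# Minimal blackout sketch: all URLs return 503, robots.txt stays at 200.
# Illustrative only -- adapt to your own server setup.
from http.server import BaseHTTPRequestHandler, HTTPServer

BLACKOUT_MESSAGE = b"<html><body><h1>Site temporarily offline</h1></body></html>"
# Keep serving whatever your existing robots.txt contains; this permissive
# placeholder deliberately avoids "Disallow: /" (see point 3b above).
ROBOTS_TXT = b"User-agent: *\nDisallow:\n"

class BlackoutHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/robots.txt":
            # Point 3a: a 503 on robots.txt would halt all crawling of the site,
            # so keep returning 200 here.
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(ROBOTS_TXT)))
            self.end_headers()
            self.wfile.write(ROBOTS_TXT)
        else:
            # Point 1: the 503 tells crawlers this is not the "real" content,
            # so the blackout page won't be indexed or cause duplicate-content issues.
            self.send_response(503)
            self.send_header("Content-Type", "text/html")
            self.send_header("Retry-After", "86400")  # optional hint: retry in one day
            self.send_header("Content-Length", str(len(BLACKOUT_MESSAGE)))
            self.end_headers()
            self.wfile.write(BLACKOUT_MESSAGE)

if __name__ == "__main__":
    HTTPServer(("", 8080), BlackoutHandler).serve_forever()
```

You can check the behaviour with a plain GET that prints headers, for example `curl -i http://localhost:8080/` should show the 503 and Retry-After header, while `curl -i http://localhost:8080/robots.txt` should show a 200.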

Questions? Comment below or ask in our forums: http://www.google.com/support/forum/p/Webmasters?hl=en

Post has shared content
The point of view of an often-pirated content producer/owner and publisher...
Before Solving a Problem, Make Sure You've Got the Right Problem

I was pleased to see the measured tone of the White House response to the citizen petition about #SOPA and #PIPA

https://wwws.whitehouse.gov/petitions#/!/response/combating-online-piracy-while-protecting-open-and-innovative-internet

and yet I found myself profoundly disturbed by something that seems to me to go to the root of the problem in Washington: the failure to correctly diagnose the problem we are trying to solve, but instead to accept, seemingly uncritically, the claims of various interest groups. The offending paragraph is as follows:

"Let us be clear—online piracy is a real problem that harms the American economy, and threatens jobs for significant numbers of middle class workers and hurts some of our nation's most creative and innovative companies and entrepreneurs. It harms everyone from struggling artists to production crews, and from startup social media companies to large movie studios. While we are strongly committed to the vigorous enforcement of intellectual property rights, existing tools are not strong enough to root out the worst online pirates beyond our borders."

In the entire discussion, I've seen no discussion of credible evidence of this economic harm. There's no question in my mind that piracy exists, that people around the world are enjoying creative content without paying for it, and even that some criminals are profiting by redistributing it. But is there actual economic harm?

In my experience at O'Reilly, the losses due to piracy are far outweighed by the benefits of the free flow of information, which makes the world richer, and develops new markets for legitimate content. Most of the people who are downloading unauthorized copies of O'Reilly books would never have paid us for them anyway; meanwhile, hundreds of thousands of others are buying content from us, many of them in countries that we were never able to do business with when our products were not available in digital form.

History shows us, again and again, that frontiers are lawless places, but that as they get richer and more settled, they join in the rule of law. American publishing, now the largest publishing industry in the world, began with piracy. (I have a post coming on that subject on Monday.)

Congress (and the White House) need to spend time thinking hard about how best to grow our economy - and that means being careful not to close off the frontier, or to harm those trying to settle it, in order to protect those who want to remain safe at home. British publishers could have come to America in the 19th century; they chose not to, and as a result, we grew our own indigenous publishing industry, which relied at first, in no small part, on pirating British and European works.

If the goal is really to support jobs and the American economy, internet "protectionism" is not the way to do it.

It is said (though I've not found the source) that Einstein once remarked that if given 60 minutes to save the world, he would spend 55 of them defining the problem. And defining the problem means collecting and studying real evidence, not the overblown claims of an industry that has fought the introduction of every new technology that has turned out, in the end, to grow their business rather than threaten it.

P.S. If Congress and the White House really want to fight pirates who are hurting the economy, they should be working to rein in patent trolls. There, the evidence of economic harm is clear, in multi-billion dollar transfers of wealth from companies building real products to those who have learned how to work the patent system while producing no value for consumers.

P.P.S. See also my previous piece on the subject of doing an independent investigation of the facts rather than just listening to the appeals of lobbyists: https://plus.google.com/107033731246200681024/posts/5Xd3VjFR8gx

Post has shared content
Energy, like air: an abundant gift of Mother Nature, monopolized by none and denied to none... a fairy-tale dream?