tl;dr: Use a 503 HTTP status code, but read on for important details.
Sometimes webmasters want to take their site offline for a day or so, perhaps for server maintenance or as political protest. We’re currently seeing some recommendations about how to do this that have a high chance of hurting how Google sees these websites, so we wanted to give you a quick how-to guide based on our current recommendations.
The most common scenario we see webmasters discussing is replacing the contents of all or some of their pages with an error message (“site offline”) or a protest message. The following applies to this scenario (replacing the contents of your pages); if you’re thinking of doing something else, please ask (details below).
1. The most important point: Webmasters should return a 503 HTTP status code for all the URLs participating in the blackout (parts of a site or the whole site). This helps in two ways:
a. It tells us this is not the "real" content on the site, so it won't be indexed.
b. Because of (a), even if we see the same content (e.g. the “site offline” message) on all the URLs, it won't cause duplicate content issues.
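To make point 1 concrete, here's a minimal sketch of a blackout server, using only Python's standard library. It isn't from the original post, and the message text, port, and Retry-After value are illustrative assumptions; in practice you'd do this in your web server or application config, but the idea is the same: every URL gets a 503 status code along with the "site offline" message.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Illustrative blackout message; any "site offline" page works.
BLACKOUT_HTML = b"<html><body><h1>Site offline</h1><p>Back soon.</p></body></html>"

class BlackoutHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # A 503 tells crawlers this response is temporary, not the site's real content.
        self.send_response(503)
        # Retry-After hints when crawlers may come back (seconds or an HTTP-date).
        self.send_header("Retry-After", "86400")
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(BLACKOUT_HTML)))
        self.end_headers()
        self.wfile.write(BLACKOUT_HTML)

    def log_message(self, fmt, *args):
        # Silence per-request log lines; remove this method to see requests on stderr.
        pass

# To serve the blackout (blocks until interrupted):
#   HTTPServer(("", 8000), BlackoutHandler).serve_forever()
```

Because every blacked-out URL carries the 503 status, the identical body on all of them won't be treated as duplicate content.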
2. Googlebot's crawling rate will drop when it sees a spike in 503 responses. This is unavoidable, but as long as the blackout is only a transient event, it shouldn't cause any long-term problems, and the crawl rate will recover fairly quickly to the pre-blackout rate. How fast depends on the site; it should be on the order of a few days.
3. Two important notes about robots.txt:
a. As Googlebot is currently configured, it will halt all crawling of a site if that site’s robots.txt file returns a 503 status code. This crawling block continues until Googlebot sees an acceptable status code for robots.txt fetches (currently 200 or 404). This is a built-in safety mechanism so that Googlebot doesn't end up crawling content it's usually blocked from reaching. So if you're blacking out only a portion of the site, be sure the robots.txt file's status code is not changed to a 503.
b. Some webmasters may be tempted to change the robots.txt file to have a “Disallow: /” in an attempt to block crawling during the blackout. Don’t block Googlebot’s crawling like this as this has a high chance of causing crawling issues for much longer than the few days expected for the crawl rate recovery.
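The robots.txt caveats above can be sketched the same way: a variant of the blackout handler (again illustrative, not from the original post; the robots.txt rules shown are placeholders) that keeps serving robots.txt with its normal contents and a 200 status while returning a 503 for everything else:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Your normal, unchanged robots.txt rules (placeholder contents) --
# do NOT swap these for "Disallow: /" during the blackout.
ROBOTS_TXT = b"User-agent: *\nDisallow: /private/\n"
OFFLINE_HTML = b"<html><body><h1>Site offline</h1></body></html>"

class PartialBlackoutHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/robots.txt":
            # robots.txt must keep returning 200 (or 404); a 503 here
            # would make Googlebot stop crawling the whole site.
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(ROBOTS_TXT)))
            self.end_headers()
            self.wfile.write(ROBOTS_TXT)
        else:
            # Every other URL gets the temporary 503 blackout response.
            self.send_response(503)
            self.send_header("Retry-After", "86400")
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(OFFLINE_HTML)))
            self.end_headers()
            self.wfile.write(OFFLINE_HTML)

    def log_message(self, fmt, *args):
        pass  # keep test output quiet
```

The key design point is that robots.txt is carved out of the blackout entirely: its status code and its contents both stay exactly as they were before.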
4. Webmasters will see these 503 errors reported in Webmaster Tools. Be sure to monitor the Crawl Errors section particularly closely for a couple of weeks after the blackout to ensure there aren't any unexpected lingering issues.
5. General advice: Keep it simple and don't change too many things at once, especially changes that take different amounts of time to take effect. Don't change the DNS settings. As mentioned above, don't change the robots.txt file contents. Also, don't alter the crawl rate setting in Webmaster Tools. Keeping as many settings constant as possible before, during, and after the blackout will minimize the chances of something odd happening.
Questions? Comment below or ask in our forums: http://www.google.com/support/forum/p/Webmasters?hl=en
There are many arguments against SOPA and PIPA based on the potential harm they will do to the Internet. (There's a comprehensive outline of those arguments at https://www.eff.org/deeplinks/2012/01/how-pipa-and-sopa-violate-white-house-principles-supporting-free-speech.) At O'Reilly, we argue that they are also bad for the content industries that have proposed them, and bad industrial policy as a whole.
The term "piracy" implies that the wide availability of unauthorized
copies of copyrighted content is the result of bad actors preying on
the legitimate market. But history teaches us that it is primarily a
result of market failure, the unwillingness or inability of existing
companies to provide their product at a price or in a manner that
potential customers want. In the 19th century, British authors like
Charles Dickens and Anthony Trollope railed against piracy by American
publishers, who republished their works by re-typesetting "early
sheets" obtained by whatever method possible. Sometimes these works
were authorized, sometimes not. In an 1862 letter to the Athenaeum,
Fletcher Harper, co-founder of American publisher Harper Brothers,
writing in reply to Anthony Trollope's complaint that his company had
published an unauthorized edition of Trollope's novel Orley Farm,
noted: "In the absence of an international copyright, a system has
grown up in this country which though it may not be perfect still
secures to authors more money than any other system that can be
devised in the present state of the law.... We cannot consent to its
overthrow till some better plan shall have been devised."
America went on to become the largest market in the world for books.
That is exactly the situation today. At O'Reilly, we have published
ebooks DRM-free for the better part of two decades. We've watched the
growth of this market from its halting early stages to its robust
growth today. More than half of our ebook sales now come from
overseas, in markets we were completely unable to serve in print.
While our books appear widely on unauthorized download sites, our
legitimate sales are exploding. The greatest force in reporting
unauthorized copies to us is our customers, who value what we do and
want us to succeed. Yes, there is piracy, but our embrace of the
internet's unparalleled ability to reach new customers "though it may
not be perfect still secures to authors more money than any other
system that can be devised."
The solution to piracy must be a market solution, not a government
intervention, especially not one as ill-targeted as SOPA and PIPA. We
already have laws that prohibit unauthorized resale of copyrighted
material, and forward-looking content providers are developing
products, business models, pricing, and channels that can and will
eventually drive pirates out of business by making content readily
available at a price consumers want to pay, and that ends up growing the overall market.
Policies designed to protect industry players who are unwilling or
unable to address unmet market needs are always bad policies. They
retard the growth of new business models, and prop up inefficient
companies. But in the end, they don't even help the companies they try to protect. Because those companies are trying to preserve old business models and pricing power rather than trying to reach new customers, they ultimately cede the market not to pirates but to
legitimate players who have more fully embraced the new
opportunity. We've already seen this story play out in the success of
Apple and Amazon. While the existing music companies were focused on
fighting file sharing, Apple went on to provide a compelling new way
to buy and enjoy music, and became the largest music retailer in the
world. While book publishers have been fighting the imagined threat
of piracy, Amazon, not pirates, has become the biggest threat to their
business by offering authors an alternative way to reach the market
without recourse to their former gatekeepers.
Hollywood, too, has a history of fighting technologies, such as the
VCR, which developed into a larger market than the one the industry
was originally trying to protect.
In short, SOPA and PIPA not only harm the internet, they support
existing content companies in their attempt to hold back innovative
business models that will actually grow the market and deliver new
value to consumers.