There is a fun "SEO geeks" discussion on Twitter between +Barry Schwartz, +Ann Smarty, +Andy Beard, and +Rishi Lakhani about what it means for a URL to be indexed. For background, read Barry's blog post on Search Engine Roundtable ( http://www.seroundtable.com/seo-arguments-14717.html ).
Here is the answer:
1. A page that is not blocked from crawling and has a meta noindex tag will not be shown in the search results. Of course it's not instantaneous, because Googlebot has to crawl the page first and we need to process the contents to discover the noindex tag.
2. If you block the crawling of a page using robots.txt, Googlebot will not be able to crawl the page. If that page has a noindex meta tag, we won't know about it because Googlebot is blocked from crawling it (that's an important point!).
3. Pages blocked from crawling may still be shown in the search results because we may discover enough info about them from the Open Directory Project or from other places around the web.
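To make the distinction in points 1–3 concrete, here's what the two mechanisms look like side by side (the /private/ path and the comments are just illustrative):

```
# robots.txt — controls CRAWLING, not indexing.
# Googlebot never fetches these pages, so it can't see
# any noindex tag inside them (point 2), and the URLs
# may still appear in results based on external info (point 3).
User-agent: *
Disallow: /private/

<!-- meta noindex — controls INDEXING, but only takes effect
     once the page has been crawled and processed (point 1).
     It must NOT also be blocked in robots.txt. -->
<meta name="robots" content="noindex">
```

In short: if you want a page kept out of the results via noindex, make sure it stays crawlable so we can actually see the tag.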
I think that covers all the points under discussion. Any questions?
This is a good reference with more details: http://code.google.com/web/controlcrawlindex/docs/robots_meta_tag.html