Why you should never use the number of search results reported by Google for serious activities
The chart that I have attached to this post shows the current results of an ongoing experiment on de-indexing resources. I had about 30,000 unimportant resources on a website and I 410ed them (made them return HTTP status 410, "Gone") to see how Google would react to such a big change.
I expected the quantity of those resources to decrease over time and, in order to monitor this quantity, I used two of the least reliable tools that you can think of: the "site:" search operator and the number of results reported by Google.
Now, there is an extremely good reason why the number of results reported by Google, either for a "site:" query or for any other kind of query, should never be used seriously for SEO purposes. For example, you don't want to rely on these numbers for calculating "query competition" indexes. The reason is that the feature is not actually a counter. It simply doesn't count the resources that are listed in the SERP. It's a completely different beast.
The number of resources reported by Google is usually an estimate so crazily raw that anyone would call the restaurant maître to ask for this steak to be cooked more. The estimate is so bad that it will mislead you and, more importantly, it will negatively affect the behavior of any formula that relies on it.
There are several well-known examples and proofs of how unreliable that number is. For example, many SEOs have observed that the number of results reported on the first page of the SERP changes if you go to the second, third or last page of results. Even Matt Cutts has addressed this issue in the following video: "Why might the estimated number of results change when going from page 1 to page 2?"
In the chart that I'm sharing with you, the reported number of "Results C" is about 4,000 at the moment, but if you move towards the last pages of results, you get a count of just 90 results. So, we're talking about a difference of almost two orders of magnitude (a factor of roughly 44) between what's reported on the first page and what's reported on the last one!
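To put a number on that gap, here is a minimal Python sketch of the arithmetic. The two figures are the ones quoted above; the variable names are just illustrative.

```python
import math

# Figures quoted in the post for "Results C"
first_page_estimate = 4000  # approximate count shown on the first results page
last_page_count = 90        # count shown on the last results page

# How many times larger the first-page estimate is
ratio = first_page_estimate / last_page_count

# The same gap expressed in orders of magnitude (powers of ten)
orders_of_magnitude = math.log10(ratio)

print(f"ratio: {ratio:.1f}x")
print(f"orders of magnitude: {orders_of_magnitude:.2f}")
```

The ratio comes out at about 44x, which is a bit more than one and a half orders of magnitude.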
The chart also shows a common issue with any Google query: results also change according to the datacenter that replies to the user's request. That's why you see deep troughs in all those lines: datacenters are never perfectly synchronized and you can get very different numbers even for queries submitted within the same day.
Another common phenomenon is that, in some cases, the number of results reported for a specific directory of a website can be higher than the number of results reported for the entire site. So, the whole is reported as smaller than one of its parts.
You would think that these discrepancies and inflated estimates are the effects of a bug, but that's not the case. The reason is, again, that the feature we are talking about was never designed to be a counter and, more importantly, it was not designed to provide useful, actionable information to webmasters or SEOs.
A far more precise tool for finding out the number of indexed resources of a website is the one provided by Google Webmaster Tools, but it's updated very slowly and it only shows the quantity of indexed resources for the entire website; if you want to monitor how many resources have been indexed in specific directories, you can't.
All these considerations lead us to the final conclusion of this post: the chart that I'm sharing is mostly useless, because the data upon which it is based is practically bogus. You can use it to monitor a trend over time, but the actual numbers are almost meaningless.
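If you only care about the trend, one possible way to read a series like this is to smooth out the isolated datacenter troughs with a rolling median. The following is just a sketch under assumptions: the daily counts are made up for illustration (they are not the data behind my chart), and `rolling_median` is a hypothetical helper, not something provided by any Google tool.

```python
from statistics import median

def rolling_median(values, window=3):
    """Median of each trailing window; windows are shorter at the start."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        smoothed.append(median(values[start:i + 1]))
    return smoothed

# Hypothetical daily "site:" counts with two isolated troughs (1200 and 900),
# like the dips you get when a different datacenter answers the query
daily_counts = [30000, 29500, 1200, 29000, 28000, 27500, 900, 26000, 25000]

smoothed = rolling_median(daily_counts, window=3)
print(smoothed)
```

With a window of 3, a single-day trough is outvoted by its neighbours, so the smoothed series keeps the downward trend without the dips. The smoothed values are still based on the same unreliable estimates, so they show direction, not real counts.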
(After my test is over, I'll also share with you what I learned about the mass 410ing of those resources.) #seo