
General Tech Talk
 
Sharing a recent article from +Pete Meyers on rel=canonical and adding some insight.

Point #10 Are Non-Canonical Pages Indexed?
For all practical purposes – no. If Google honors a rel=canonical tag, then the non-canonical page is not eligible for ranking. It will not have a unique cached copy, and it will not appear in the public index via a “site:” search. Now, does Google maintain a record of the non-canonical URL? I assume they do. As an SEO, though, the non-canonical URL ceases to exist in any meaningful way.

Google seems to diminish the weight of the canonical in cases where it has been misapplied. Example: product page 1 is replaced by a new, updated page 2 with nearly the same content. Each page is kept live and has a canonical pointing to itself. You've created conflicting canonical tags for essentially the same content. Since Google cannot tell which page should be indexed, it indexes both, and neither will perform well. For this to occur, though, it needs to happen at scale; random errors are likely not a huge concern.

If there is a high number of errors in a specific directory, it appears that Google is smart enough to confine the diminished value of the canonical to that directory, so properly implemented directories perform as designed.
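To make the scenario above concrete, here is a minimal Python sketch of how you might audit your own crawl data for it. It pulls the rel=canonical target out of each page, looks for near-duplicate pages that declare different canonicals, and counts the affected URLs per top-level directory. The `group_key` label (which pages count as "nearly the same content"), the example.com URLs, and the function names are all assumptions for illustration. This says nothing about how Google itself evaluates these tags.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urlparse


class CanonicalParser(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel:
            self.canonical = attr.get("href")


def extract_canonical(html):
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical


def top_directory(url):
    """First path segment, e.g. '/products/' for /products/widget-v1."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    return "/" + parts[0] + "/" if len(parts) > 1 else "/"


def conflicts_by_directory(pages):
    """pages: iterable of (url, html, group_key).

    group_key is a hypothetical label you assign to pages that carry
    near-identical content (for example, a product ID).
    """
    targets = defaultdict(set)
    members = defaultdict(list)
    for url, html, group_key in pages:
        # A page with no canonical tag is treated as pointing to itself.
        targets[group_key].add(extract_canonical(html) or url)
        members[group_key].append(url)

    counts = defaultdict(int)
    for group_key, canon_set in targets.items():
        if len(canon_set) > 1:  # near-duplicates disagree on the canonical
            for url in members[group_key]:
                counts[top_directory(url)] += 1
    return dict(counts)


pages = [
    ("https://example.com/products/widget-v1",
     '<link rel="canonical" href="https://example.com/products/widget-v1">',
     "widget"),
    ("https://example.com/products/widget-v2",
     '<link rel="canonical" href="https://example.com/products/widget-v2">',
     "widget"),
    ("https://example.com/blog/post",
     '<link rel="canonical" href="https://example.com/blog/post">',
     "post"),
]
report = conflicts_by_directory(pages)
```

A directory with a high count relative to its size would be the first place to look for old and new pages that both stayed live with self-referencing canonicals.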
 
"Are Non-Canonical Pages Indexed?" That, to me, is a poorly worded question. I have seen many pages that do not have rel="canonical" on the page get indexed and rank quite well. The question should have been something like: does a page that has rel="canonical" pointing to another page still get indexed?

Even that question is not quite there. Of course it gets indexed. If not, how in the heck is G supposed to know what to do with the canonical?

So now maybe the question should be: "Does a page that has rel="canonical" pointing to another page show up in the SERPs?"
 
+Trey Collier I think "indexed" in the sense most people mean it, refers to visibly indexed (cached and viewable with "site:"), which is why I tried to state that. I agree, though, that the page is certainly stored by Google and is part of their broader index of the web. I just don't feel that that distinction is helpful for the average SEO or webmaster.

Agreed that non-canonical was a short, but potentially confusing choice of terminology. My implication was "the non-canonical page in a set of pages with rel=canonical applied". You're correct that non-canonical pages that are poorly handled or not handled at all rank all the time in the wild.
 
I also agree with +Rick Bucich that mixed signals can end in the tag being ignored and pages re-entering (or staying in) the visible index. Each of these answers could be a post in itself, and indexation is incredibly situational. On large sites, I've almost never had an implementation go by the book, honestly.
 
I'm not trying to be difficult, but most people don't know the difference between being indexed and being visible in the SERPs. They don't have a clue what the difference is. It should be a goal of those in SEO to clear that up for them.

Indexed, at least to me, means that you have been crawled and put into a database. Being visible in the SERPs is a whole separate issue.

Other than some possible confusing semantics, very nice article Pete.   :)
 
Yeah, no argument on the confusion. My gut feeling, from Q&A especially, is that when people say "indexed" in our community, they mean indexed in a way they can see. The problem is that anything else is kind of a black box. We know Google has to add the page to the index, or else they wouldn't know about the canonical and would keep re-discovering it, but I've seen a lot of people who think that a page with a canonical tag to another page is still eligible for ranking and shows up in the public index. Once I go through the comments, I'll see if there's a better way to say this.
 
The fact that experienced SEOs can disagree on this stuff just illustrates how tough it is for the average webmaster, IMO.
 
...And next to impossible for us beginners
 
Add into this mix the transparency of Google....and you have mud!  LOL
 
If Google Glass was as transparent as Google's statements about the algorithm, everyone would walk off the curb to their death.
 
An unbelievable amount of content remains "indexed" if you're searching for it specifically, but the non-canonical being served to searchers is symptomatic of a problem.

I have strong circumstantial evidence of Google diminishing the value of our canonical tags within two of our directories. Those directories also have the greatest issue with conflicting canonical tags on near duplicate pages.

In one case we were just orphaning the old URL and updating the crawl path to the new page. Both stayed live, were nearly identical, and had conflicting canonical tags. Over time, the amount of duplication built up to the point where rel=canonical lost its strength.

It became more obvious within those directories in cases where the canonical had been deployed correctly but the incorrect url was being displayed in cache AND being served at least temporarily to searchers.

Wish I could elaborate with details.
 
+Rick Bucich On very large sites, I've seen Google ignore many on-page signals - canonical, Meta Robots, Robots.txt, etc. It's amazing how complicated this stuff gets in the wild. More than once, when I was doing client work, I went in with one tactic and had to shift gears, even though that same tactic had worked great on another site.
 
+Pete Meyers I'm dealing with a site with well over 50 million pages... and some bad habits :)
 
FWIW we can and do index pages that have a rel=canonical pointing to other URLs. It's not a server-side redirect where we never see content, it's not a noindex robots meta tag. 
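For readers who want to see the three cases named in the comment above side by side, here is a rough Python sketch that sorts a fetched URL into "redirect", "noindex", or "canonical-to-other". The category names, the `classify` function, and the example.com URLs are illustrative assumptions and not Google terminology. The point is only that a rel=canonical page still returns content a crawler can read, unlike a server-side redirect, and carries no noindex directive.

```python
from html.parser import HTMLParser

REDIRECT_CODES = (301, 302, 303, 307, 308)


class HeadSignals(HTMLParser):
    """Records a rel=canonical href and a robots noindex directive, if any."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attr = {k: (v or "") for k, v in attrs}
        if tag == "link" and "canonical" in attr.get("rel", "").lower().split():
            # Keep the first canonical found.
            self.canonical = self.canonical or attr.get("href")
        elif tag == "meta" and attr.get("name", "").lower() == "robots":
            if "noindex" in attr.get("content", "").lower():
                self.noindex = True


def classify(status, html, url):
    """Return a hypothetical label for how the URL presents itself."""
    if status in REDIRECT_CODES:
        return "redirect"            # the body is never evaluated
    signals = HeadSignals()
    signals.feed(html)
    if signals.noindex:
        return "noindex"             # explicit robots meta directive
    if signals.canonical and signals.canonical != url:
        return "canonical-to-other"  # content is readable, points elsewhere
    return "indexable"


label = classify(
    200,
    '<link rel="canonical" href="https://example.com/a">',
    "https://example.com/b",
)
```

Here `label` comes out as "canonical-to-other": the page was fetched with a 200 status and readable content, which is the situation where the tag is one signal among others and the page is not removed outright.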
 
Thanks, +John Mueller. So, for the average webmaster and semantics, though - what do you call it when a page is in the Google index, but it isn't cached and doesn't appear under queries with the "site:" operator? For all intents and purposes, it's not in the public "index", IMO. Is it effectively a filter?
 
I'm trying to bridge that gap between precise language and explaining it in a way that's useful to a typical SEO/webmaster trying to implement rel=canonical.
 
Those pages can show up in site:-queries and normal rankings.
 
Can, yes, but in my experience they often don't, if Google honors the canonical. Not trying to be argumentative, just to reconcile what I've seen in the "wild". When I've used rel=canonical to handle large-scale duplicates, I've been able to successfully use "site:" to see the index shrink over time.
 
1. Pete is right that canonical is the fastest way to deal with duplication within, but also between, mega-sites of the same stable.
2. Canonicalised pages ARE being indexed, and there are very practical reasons why this is important. Mega-sites receive an enormous amount of indexation from Google, and having dead wood indexed, especially on platforms you don't maintain and monitor, has these negative effects:
2.1: Google spends too much time on dead wood and indexes your canonical pages more slowly and less frequently.
2.2: If your old canonicalised pages have an issue you're not monitoring, like latency, that affects your entire domain's latency score.