SEO Beginners - Clear Click Paths and Canonical URLs

A new post on the official Google Webmaster Blog came out today about best and worst practices for faceted navigation. It is worth sharing with the developers you work with on setting up URLs for sites that use filtering and sorting navigation. Done well, that kind of navigation gives visitors plenty of options for finding the things they are interested in on your site, and yet gives you the chance to limit the number of URLs that might get indexed.

There is a Google message that can show up in Webmaster Tools that, succinctly put, tells you that you have "too many URLs." I've seen it on sites that provide so many options for sorting and filtering content that they end up with far too many URLs for the same content, filtered and sorted many different ways.

I'm not sure that this is truly a "beginner" SEO topic, except that understanding it as well as you can when you are starting out is one of the best ways to avoid having it stare you in the face in the future.

Google Webmaster Tools has a section on URL parameters, which you can use to tell Google more about the behavior of certain parameters when they show up in URLs. Some are easy to understand, such as parameters that only exist to track visitors, like session IDs (often shown as something like "&Sid="), which don't change the content shown on a page. (I've seen these within search results, but may be seeing them less.) The URL parameter section of Google Webmaster Tools comes with a warning that you shouldn't mess with it unless you have a good idea about how it might impact search results.
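
To see why a tracking parameter creates extra URLs for the same content, here is a minimal Python sketch that strips such parameters before comparing URLs. The function name and the list of parameter names are my own assumptions for illustration (only "Sid" comes from the example above), and this is not how Google handles parameters internally.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of parameters that only track visitors; adjust for your site.
TRACKING_PARAMS = {"sid", "sessionid"}

def strip_tracking_params(url):
    """Return the URL with tracking-only query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )

# Two visitors with different session IDs are looking at the same page.
first = strip_tracking_params("http://www.example.com/shoes?color=red&Sid=abc123")
second = strip_tracking_params("http://www.example.com/shoes?color=red&Sid=xyz789")
print(first == second)
```

Parameters that do change the content, such as "color" here, are left alone, which is the distinction the URL parameters section asks you to make.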

Another post from the Official Google Webmaster Blog that's also worth sharing with your developers and worth digging more deeply into yourself is the post that they did on common canonical link element mistakes:

5 common mistakes with rel=canonical
http://googlewebmastercentral.blogspot.com/2013/04/5-common-mistakes-with-relcanonical.html

Again, as a beginner, you may not quite be ready for that post either, but both are worth bookmarking and returning to so that when you do find yourself with questions about faceted search or canonical link elements, you will be much less likely to follow the practices that both posts warn about.

Both involve making sure that there is a clear path that search engine spiders can follow through a site, one that allows all of the pages that you want indexed to be found. That clear click path is one of the beginner topics that is important to SEO.

Before you reach the challenge of issues involving canonical link elements and faceted search, you'll see simpler issues involving things like:

Limiting spider access to canonical versions of URLs

(a) Is the home page of the site linked to consistently the same way throughout the site, within its navigation?

A home page can be linked to a number of ways, but ideally you want to be consistent. Here are some ways that you might see a home page linked to on a site:

http://www.example.com/
http://example.com/
http://www.example.com/index.htm
http://www.example.com/INDEX.htm
https://www.example.com/
http://example.com/index.htm
http://example.com/INDEX.htm
https://example.com/
http://www.example.com/homepage
http://www.example.com/home/index.htm

I could go on with a few additional permutations, but I swear I've seen sites where the main logo links to the home page with one version, the main navigation uses a different version, the footer navigation a completely different version, the HTML sitemap yet another, and the XML sitemap one or more versions as well.

Pick one version, call it the canonical (or best) version, and stick to it on the site, consistently throughout the whole site.
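
One way to check this on your own site is a small script that flags any link pointing at the home page in a non-canonical form. This is only a sketch: the canonical URL, the set of home page paths, and the function name are assumptions based on the example.com list above, so swap in whatever your site actually uses.

```python
from urllib.parse import urlsplit

# Assumption: this is the version you picked as canonical.
CANONICAL_HOME = "http://www.example.com/"

# Assumption: paths that all serve the home page on this hypothetical site.
HOME_PATHS = {"", "/", "/index.htm", "/homepage", "/home/index.htm"}

def is_noncanonical_home_link(url):
    """True if the URL points at the home page but is not the canonical version."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    points_home = host == "example.com" and parts.path.lower() in HOME_PATHS
    return points_home and url != CANONICAL_HOME

# Links as they might be collected from a logo, navigation, footer, and sitemap.
links = [
    "http://www.example.com/",
    "http://example.com/INDEX.htm",
    "https://www.example.com/",
    "http://www.example.com/products/",
]
for link in links:
    if is_noncanonical_home_link(link):
        print("Fix this home page link:", link)
```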

(b) Limit access to all pages to one version of the domain

Many sites come to you straight from a host accessible at both the www and non-www versions of their pages. Both http://www.example.com/ and http://example.com/ resolve to what appears to be the same page. But a search engine may treat them as if they were different pages. One of the first things I do when I start a site audit is to run both versions (with and without a "www") through a header check to see if one version 301 redirects to the other.

Sometimes it won't. For example, the page at http://nytimes.com does a 302 redirect to http://www.nytimes.com. Google has stated on its help pages that the redirect should ideally be a 301 redirect, and that it's likely to pass PageRank through a 301 redirect. A Google Webmaster Help forum entry had an admission from a webmaster evangelist from Google that they might sometimes treat a 302 like that as if it were a 301 if it seemed like it was intended to be permanent. It's better to do it right than it is to do it wrong and hope that Google fixes it for you.
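
If you don't have a header checker handy, a rough version can be sketched with Python's standard library. The function names here are my own, and the wording of the messages is just one way to summarize the status codes; http.client does not follow redirects, so you see the first response as a spider would.

```python
import http.client

def check_redirect(host, path="/"):
    """Request only the headers for one URL and return (status, Location header)."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()

def describe_redirect(status, location):
    """Summarize a response status for a www versus non-www check."""
    if status in (301, 308):
        return f"{status} permanent redirect to {location}"
    if status in (302, 303, 307):
        return f"{status} temporary redirect to {location}; consider a 301 instead"
    return f"{status}: no redirect, so both versions may be treated as separate pages"

# Example with a made-up response; to test a live site you could call
# describe_redirect(*check_redirect("example.com")) for each version of the host.
print(describe_redirect(302, "http://www.example.com/"))
```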

(c) Avoid HTTPS bleedover

There's nothing quite like beginning to work on a site and using a tool such as Screaming Frog to crawl it, only to watch the program go on and on into tens of thousands of URLs. Then you find that every URL is repeated multiple times: one set has "www" in it and another doesn't, and both sets appear once with https and once with http. When you sort them out, you find that you only really have around 3,000 to 4,000 URLs that you want indexed, and you have to figure out a way to keep all the other URLs from showing up.
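
Sorting out a crawl like that is easier with a little scripting. The sketch below groups crawled URLs that differ only by protocol or a leading "www." so you can see how many distinct pages you really have. The function name and the sample URLs are made up for illustration; a real crawl export would be read from a file.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def group_by_page(urls):
    """Group crawled URLs that differ only by protocol or a leading 'www.'."""
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        host = parts.netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        # The key ignores the protocol and the "www." prefix, nothing else.
        key = host + parts.path + ("?" + parts.query if parts.query else "")
        groups[key].append(url)
    return dict(groups)

crawled = [
    "http://www.example.com/widgets.htm",
    "http://example.com/widgets.htm",
    "https://www.example.com/widgets.htm",
    "https://example.com/widgets.htm",
    "http://www.example.com/about.htm",
]
groups = group_by_page(crawled)
print(len(crawled), "crawled URLs,", len(groups), "distinct pages")
```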

When you have legitimate https pages, where it's important to use https because visitors use those pages to send data such as names, addresses, and credit card numbers, there's no problem with that. But if those pages use relative URLs to link to other pages on the site, then when visitors or search engine spiders follow those links, the pages they reach may end up with the HTTPS protocol in front of them. Chances are that you don't want those indexed either. Making those links absolute on those pages, so that the HTTPS protocol isn't attached to those other pages, is one way to avoid that kind of bleedover.
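
You can see the bleedover happen by resolving links the way a browser or spider would. This short sketch uses urljoin from Python's standard library; the checkout page and the about page are hypothetical URLs.

```python
from urllib.parse import urljoin

# Hypothetical secure page that links to an ordinary page elsewhere on the site.
secure_page = "https://www.example.com/checkout/"

# A relative link inherits the protocol of the page it sits on.
relative_result = urljoin(secure_page, "/about.htm")

# An absolute link keeps the protocol you wrote into it.
absolute_result = urljoin(secure_page, "http://www.example.com/about.htm")

print(relative_result)
print(absolute_result)
```

The first result starts with https and the second with http, which is why absolute links on secure pages keep the rest of the site from being crawled under a second protocol.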

(d) Capitalization Matters with file and folder names.

When you capitalize letters within a domain name, search engines don't care. But when you have multiple pages where the letters in directories and page names are the same but the capitalization differs, it's quite possible that a search engine will treat those pages as if they are different. So "index.htm" is a different file than "Index.htm", which is a different file than "INDEX.htm", and so on.
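
A quick way to spot this in a list of crawled URLs is to group paths that match once capitalization is ignored. This is a sketch with a made-up function name and sample URLs; lowercasing the path is only used to find candidates, not to suggest the server treats them as the same file.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def find_case_variants(urls):
    """Return groups of paths on the same host that differ only by capitalization."""
    by_lowered = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        # Host names are case-insensitive, so lowercase them before grouping.
        key = (parts.netloc.lower(), parts.path.lower())
        by_lowered[key].add(parts.path)
    return {key: sorted(paths) for key, paths in by_lowered.items() if len(paths) > 1}

urls = [
    "http://www.example.com/index.htm",
    "http://www.example.com/Index.htm",
    "http://WWW.Example.com/INDEX.htm",
    "http://www.example.com/about.htm",
]
variants = find_case_variants(urls)
for (host, _), paths in variants.items():
    print(host, "has case variants:", ", ".join(paths))
```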

Addressing issues like these is part of the challenges that an SEO faces every day. The Google Webmaster Central blog posts on faceted navigation and on common canonical link element problems are just advanced versions of the challenges you'll face. Getting a handle on these site structure issues can be very helpful to you as you go.
3 comments
 
Excellent post, +Bill Slawski. You really go above and beyond with these very detailed explanations. Question: did you get my small email request today? I know you are probably very busy.
 
Hi Andrew

Thanks. I was finishing up an audit today, and working on some other issues, and I read through your email but there was a lot to it, so I didn't get a chance to consider it in too much depth. I'll take a look again a little later tonight.