Web Site Design by Rainbo Design

Preventing & Repairing Canonicalization Errors

Canonicalization is the process by which search engines determine the best, or "canonical," version of a page that can be retrieved using more than one URL, and decide how to treat different URLs that retrieve essentially identical content. Years ago, server software makers, ISPs, and Web-hosting companies started configuring their servers in a way that was meant to be a convenience for their users, but that can turn into a major problem for search engine rankings. These well-intentioned people decided to allow websites to be accessed with or without the very common "www" subdomain prefix in the URL. This was a very handy contrivance for budding webmasters and new Internet users, who would often omit the three-character prefix when they typed a website address into their browser. But it has come to present a particular problem in Google. Canonicalization is also an issue for all search engines when it comes to duplicate content and other situations.


When Your Car Is Broken,
You Call A Mechanic

My SEO Tips can help you in many ways, but if your website is "broken" in the search engine results, and you can't tell how to fix it yourself, you should call in a specialist because he's trained for the job. He knows what's wrong and how to fix it. Doing it yourself is not always the wisest choice. You can end up wasting time and losing sales by taking weeks or months trying to learn what I already know about search engines.

Be smart and start getting more targeted traffic now by ordering my Search Engine Optimization Service today! You can also request my FREE Search Engine Evaluation. I'll tell you what you're doing right, what you're doing wrong, and how I can help your site get its fair share of search engine traffic.



Preventing & Repairing WWW Domain Name Canonicalization Errors

Let's start with the canonicalization issue that is almost exclusively limited to Google. By the strictest definitions, the two URLs "http://yoursite.com" and "http://www.yoursite.com" are separate and distinct entities. The first points to the bare, or root, domain, and the second points to a subdomain (a separate host name) named "www". In the earliest days of the World Wide Web, the computer that served a domain's web pages was conventionally given the host name "www", just as mail and file-transfer servers were often named "mail" and "ftp". Thus, the common practice of making a website's URL begin with that prefix was born. But as the Internet became more popular, and webmasters and IT managers made allowances for the less tech-savvy in the population, various shorthand methods crept into usage. The one we deal with here is making the www prefix optional. I'm sure it seemed a natural thing to do. When people refer to a website by its URL, the "www" part is frequently omitted in both speech and writing, so it was only logical that users would take the same shortcut when they went online. So, rather than frustrate those users needlessly, servers were configured to allow either version to retrieve the same content. Users were happy, IT managers were happy, and webmasters were happy.

But being the product of computer-based logic, search engine algorithms often fail to understand when they should treat these two URLs as one and the same. Google has remained particularly stubborn about this issue, despite overwhelming evidence of the problems it causes. They even have a page in their support section that deals with it. Google now provides a way for webmasters to select a Preferred Domain in Google Webmaster Tools. But that setting affects only Google, so you should still install a 301 redirect if at all possible. If you can't install a 301 redirect, there are other options, such as a <meta> refresh tag and the new rel="canonical" tag.

The problem is twofold. First, there is the issue of link popularity. Google's vaunted PageRank system depends on links, and Google will not always canonicalize (i.e., treat as identical) a link to the URL that omits the www and a link to the version that includes it. The link popularity is split between the two versions, which often means lower rankings for most searches than the site actually deserves. Second, and frequently as a result of the first, Google may not deep crawl one version of the URL or the other because of either (a) its reduced link popularity/PageRank, or (b) duplicate content issues. All major search engines advise against making the same content available from more than one URL, and this www issue is one of the most common causes of canonicalization problems in Google. Fewer indexed pages for a site means that, once again, one version of the URL is not receiving proper link popularity credit for its own internal links.

So the problem compounds itself over time, and it can be especially debilitating to sites that weren't all that strong to begin with. Sadly, webmasters are often partly responsible for this problem: knowing they can "get away with it," they use the shorthand version when submitting their site to directories or posting links on webpages of their own design. Once this genie is out of the bottle, it's a long battle to overcome, because even if you find every incorrect link on your own site, all it takes is one malformed link on an obscure page that doesn't show up in Google's "link:" command to keep this demon haunting you forever. Fortunately, there is a solution.
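Tracking down the malformed links on your own site is the part you can control. As a rough illustration, here is a small Python sketch that scans a page's HTML for links pointing at the bare (non-www) domain. It assumes the www version is your preferred URL; "yoursite.com" is a placeholder, and the names `find_bare_domain_links` and `BareDomainLinkFinder` are made up for this example. It uses only Python's standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Placeholder: replace with your own domain name, without the "www."
BARE_HOST = "yoursite.com"


class BareDomainLinkFinder(HTMLParser):
    """Collects <a href> values whose host is the bare (non-www) domain."""

    def __init__(self, bare_host):
        super().__init__()
        self.bare_host = bare_host.lower()
        self.bad_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # urlsplit() lower-cases .hostname, so "YourSite.com" is caught too;
            # relative links have no hostname and are skipped.
            if name == "href" and value and urlsplit(value).hostname == self.bare_host:
                self.bad_links.append(value)


def find_bare_domain_links(html, bare_host=BARE_HOST):
    """Return every link in `html` that points at the bare domain."""
    finder = BareDomainLinkFinder(bare_host)
    finder.feed(html)
    finder.close()
    return finder.bad_links
```

You would feed it the HTML of each page on your site, for example `find_bare_domain_links(open("index.html").read())`. Relative links such as "/about.html" are not reported, because they inherit whichever host the visitor is already on. Of course, this only finds bad links on pages you control, which is why a server-side fix is still needed.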

The solution is to use server control methods to automatically redirect requests to the proper URL. The server must return a "301 Moved Permanently" status code for the search engines to properly assign the link popularity and to update their internal records of the page's true URL.

Websites running on hosts that use the Apache server software usually have it easiest in this regard, because they can control this problem on their own using the .htaccess control file, provided the host has enabled Apache's mod_rewrite module. Just create a plain text file named ".htaccess" (note the leading period and the lack of any other extension) in your site's root directory, or edit the existing one, and insert the following directives:

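# Permanently (301) redirect requests for the bare domain to the www host.
# The backslash makes "." match only a literal period; [NC] ignores letter case.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^yoursite\.com$ [NC]
RewriteRule ^(.*)$ http://www.yoursite.com/$1 [R=301,L]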

Simply replace "yoursite.com" in the above code with your website's domain name. Websites based on Microsoft's IIS server software will need to consult their system administrator for help. Again, be sure the server returns status code 301, or you're only papering over the problem rather than repairing it. A 302 result is not acceptable, because 302 means "Moved Temporarily" and doesn't repair canonicalization problems.
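If your site is served by a Python web application instead of Apache, the same rule can live in the application itself. The sketch below is a WSGI middleware (the name `www_redirect` and both host names are placeholders, not part of any library) that answers requests for the bare domain with a 301 pointing at the same path on the www host, much as the RewriteRule above does, and passes all other requests through unchanged.

```python
from urllib.parse import quote

# Placeholders: substitute your own domain names.
BARE_HOST = "yoursite.com"
CANONICAL_HOST = "www.yoursite.com"


def www_redirect(app, bare_host=BARE_HOST, canonical_host=CANONICAL_HOST):
    """Wrap a WSGI app so requests for the bare domain get a 301 to the www host."""

    def middleware(environ, start_response):
        # Host header may include a port ("yoursite.com:80"); compare the name only.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host != bare_host:
            return app(environ, start_response)

        # Rebuild the requested path (WSGI stores it decoded as latin-1 text).
        path = quote(environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", ""),
                     encoding="latin-1")
        location = "http://" + canonical_host + (path or "/")
        if environ.get("QUERY_STRING"):
            location += "?" + environ["QUERY_STRING"]

        # 301 = Moved Permanently; a 302 would not repair canonicalization.
        start_response("301 Moved Permanently",
                       [("Location", location), ("Content-Type", "text/plain")])
        return [b"Moved Permanently\n"]

    return middleware
```

To use it, wrap your existing application, e.g. `application = www_redirect(application)`. The query string is carried over to the new URL, just as Apache carries it over by default when the rewritten URL has none of its own.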

You can check the code your site returns with my Server Header Checker.
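If you prefer to check from your own computer, a few lines of Python will show the raw status code without following the redirect. This is a minimal sketch using the standard `http.client` module; the function name `check_redirect` is hypothetical, and it sends a HEAD request, which a few misconfigured servers answer differently from a normal GET.

```python
import http.client
from urllib.parse import urlsplit


def check_redirect(url, timeout=10):
    """Fetch only the headers for `url`; return (status code, Location header or None).

    http.client never follows redirects, so a 301 or 302 is reported as-is.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

For example, `check_redirect("http://yoursite.com/")` should return 301 and the www URL once your redirect is working; a 302, or a 200 with no Location header, means the problem is not yet fixed.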

Canonicalization Problems with Session IDs & Dynamic URLs

The www issue is only one place where canonicalization problems occur. Any time the search engines encounter a page that is essentially identical to another page, they will try to select the best, or "canonical," version and filter any duplicates from their index. As with the www issue, this can hurt your site's performance in the search engines. The proliferation of blogs and other content management systems has brought canonicalization problems to many websites, because those programs routinely create multiple URLs that point to the same content. The search engines are becoming more adept at detecting and dealing with the most common canonicalization problems in blogs and forums, but it's up to the individual webmaster to take steps to prevent the problem from arising in the first place. Fortunately, most blog platforms are supported by a community of talented programmers who have created add-ons that can reduce the number of canonicalization problems.

Ecommerce websites have their own canonicalization problems. Many shopping cart programs require users to accept cookies in their browser, or else they add what are called "Session IDs" to every link. Search engine crawlers don't accept cookies, so a crawler can be handed a new Session ID on each visit, and the same page then shows up under an endless variety of URLs. For that reason, search engines have traditionally avoided crawling any URL that includes a Session ID or other user identification value. Another way ecommerce sites can create canonicalization issues is with features like sorting lists of products by price, color, or size. The search engines see these pages as containing nearly identical content and suppress them. Fortunately, two of the major search engines, Google and Yahoo!, now provide tools for webmasters to manage these problems involving dynamic URLs. Naturally, you need to register and verify your site in order to use these tools. Assuming you've already done so, here's how they work:

In Google's Webmaster Tools console, you can tell Google to ignore parameters in query strings, such as session IDs. Click on "Site Configuration", then "Settings", and you'll see a section titled "Parameter Handling". Click on "Adjust parameter settings". You'll see a text box labeled "parameter name". Enter the name your site gives to its session ID parameter (for example, osCommerce uses "osCsid"). Then choose "Ignore" from the drop-down menu titled "Action". Soon, Google will filter that parameter out of the URLs for your site, and will start to properly index any URLs that would have caused a problem in the past.

Yahoo! Site Explorer has a similar tool. Select your site from the "My Sites" list. Then click on "Actions", followed by "Dynamic URLs" in the menu on the left. Enter the parameter name in the appropriate text box, and choose "Remove From URLs". This will have the same effect as the Google tool. It will begin to filter the named parameter from the URLs it encounters for your domain.
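To picture what an "Ignore" or "Remove From URLs" setting accomplishes, the following Python sketch strips a named parameter from a URL so that pages differing only by session ID collapse to a single address. It illustrates the idea and is not Google's or Yahoo!'s actual code; the function name is hypothetical, and the sample value "osCsid" is an assumed session parameter name, so substitute whatever name your own cart uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: "osCsid" is the session parameter; list your own cart's name(s) here.
IGNORED_PARAMS = {"osCsid"}


def strip_ignored_params(url, ignored=IGNORED_PARAMS):
    """Return `url` with the named query parameters removed, keeping all others in order."""
    parts = urlsplit(url)
    kept = [(name, value)
            for name, value in parse_qsl(parts.query, keep_blank_values=True)
            if name not in ignored]
    # urlunsplit() leaves off the "?" entirely when no parameters remain.
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))
```

Because `parse_qsl` and `urlencode` decode and then re-encode the remaining parameters, unusual characters may come back encoded slightly differently than in the original URL; for a sketch like this that is usually harmless.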


Repairing Canonicalization Without Redirects

Many webmasters don't have access to server redirect tools like Apache's .htaccess file, so they can't install conventional redirects to solve canonicalization problems. Fortunately, there is a simple alternative.

In February 2009, the major search engines gave all webmasters a very powerful and easy-to-use method of preventing and repairing canonicalization problems. The four largest search engines, Google, Yahoo!, MSN, and Ask.com, have all agreed to support a new "canonical" value for the rel attribute of the <link> tag that goes in the <head> section of your HTML documents. The syntax is as follows:

<link rel="canonical" href="http://www.yoursite.com/" />

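This tag will be used as "a very strong hint" in determining the canonical version of a URL within a single domain. For the purpose of consolidating link popularity, it is treated much like a 301 redirect, although visitors are not actually redirected and the search engines may disregard the hint if it looks mistaken. Each page should carry a single tag whose href is that page's own preferred URL; the home page URL shown above is only an example. For more information, see the Google Webmaster Blog post: Specify Your Canonical, and Matt Cutts' article: Learn About The Canonical Link Element in 5 Minutes. Both are well worth reading, but Matt Cutts really explains the impact on rankings and offers ideas for when it's appropriate to take action.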
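After adding the tag, it is worth confirming that each page carries exactly one canonical link and that it points where you intend. Here is a minimal Python sketch (the function and class names are hypothetical) that uses the standard `html.parser` module to list every `<link rel="canonical">` href found in a page's HTML.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of every <link rel="canonical"> element in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; "<link ... />" arrives here too.
        if tag != "link":
            return
        attrs = dict(attrs)
        # rel may hold several space-separated values, compared case-insensitively.
        rels = (attrs.get("rel") or "").lower().split()
        if "canonical" in rels and attrs.get("href"):
            self.canonicals.append(attrs["href"])


def find_canonical(html):
    """Return the list of canonical URLs declared in `html`."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonicals
```

An empty list means the tag is missing; more than one entry means the page sends conflicting signals and should be corrected.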

Site owners who operate multiple websites for a single company or organization must decide how to deal with duplicate content on pages, such as contact details or terms of use, that are common to all of their websites. In general, there is no reason to worry about duplicate content on these pages, since they rarely need to rank well. Simply provide a clear navigation path for users looking for such information in the normal design of your website, and let your other pages carry the burden for ranking issues.



If you want your site to rank higher in the search engines, my Search Engine Optimization Services can give your website what it needs to get your fair share of search engine traffic quickly, without disturbing your design, and without breaking your budget.





Call Richard L. Trethewey in Minneapolis at 612-408-4057 from 9:00 AM to 5:00 PM Central time to get started on your new website design package or search engine optimization program today!

