India Site Clinic - Part 1
Let's start with the basics.
What is canonicalization?
The term "canonicalization" may sound a bit technical, but it is simply the process of picking the best URL when there are several choices available.
For example:
http://www.example.com/
http://example.com/
https://example.com/
http://example.com/index.html
http://www.example.com/index.html
When we crawl the web, we often find several different addresses that show the same page. This can confuse our systems, so it may take longer to review your webpages, and the URL shown in our search results may not be the best one.
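To make the idea concrete, here is a minimal Python sketch that maps the five addresses listed above onto a single URL. It is only an illustration, not a description of how any crawler works. It assumes that the preferred form is http with the www host and that index.html is the default document for a directory; the names PREFERRED_SCHEME, PREFERRED_HOST, DEFAULT_DOCUMENTS, and canonicalize are made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration only: the site owner prefers http + www,
# and "index.html" is the default document served for a directory URL.
PREFERRED_SCHEME = "http"
PREFERRED_HOST = "www.example.com"
DEFAULT_DOCUMENTS = ("index.html",)


def strip_www(host: str) -> str:
    """Return the host name without a leading 'www.' label."""
    return host[4:] if host.startswith("www.") else host


def canonicalize(url: str) -> str:
    """Map known variants of a URL on our site to the preferred form."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()

    # URLs on other sites are left untouched.
    if strip_www(host) != strip_www(PREFERRED_HOST):
        return url

    # Treat "/index.html" as equivalent to "/".
    path = parts.path or "/"
    for doc in DEFAULT_DOCUMENTS:
        if path.endswith("/" + doc):
            path = path[: -len(doc)]
            break

    # Any port and fragment are dropped; the query string is kept.
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, parts.query, ""))


variants = [
    "http://www.example.com/",
    "http://example.com/",
    "https://example.com/",
    "http://example.com/index.html",
    "http://www.example.com/index.html",
]
for variant in variants:
    print(variant, "->", canonicalize(variant))
```

Every variant in the list comes out as http://www.example.com/, which is the "one best URL" that canonicalization is trying to pick.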
Sounds good in theory? Do you want to see a real example?
Yes, freewaresnbeta.com is a nice-looking blog about freeware and software. This website is a good example for explaining the effects of canonicalization. The live page http://www.freewaresnbeta.com/ loads well, but the non-www version http://freewaresnbeta.com/ loads a page saying that the site is 'under construction.'
The two pages above look different: the non-www version is probably showing a webpage left over from when the webmaster was building the site. Because the web addresses are so similar, users might accidentally link to the non-www one. This can lead to their visitors (and our crawlers) being sent to the wrong URL and being confused by the content shown there. When it's easy for other people to link to your proper webpages, more visitors will be able to view your pages and recommend them to their friends.
What's the solution?
There's a pretty neat facility in Webmaster Tools where you can set your preferred version (www or non-www) to be crawled by Google. Once you pick your preferred destination URL, use 301 redirects to send traffic from other URL variants to your preferred URL, so that the valuable ranking factors are carried along with the redirect. It is also advisable to use the preferred version for internal linking and advertising.
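As a rough sketch of what such a redirect looks like on the server side, the small Python WSGI application below answers any request whose Host header is not the preferred one with a 301 pointing at the same path on the preferred host. Most sites would configure this in the web server or hosting control panel instead; the names PREFERRED_HOST, redirect_target, and app are made up for this example, and the www form and http scheme are assumed to be the preferred ones.

```python
from typing import Optional

# Assumption for illustration: the www form is the preferred version.
PREFERRED_HOST = "www.example.com"


def redirect_target(host: str, path: str, query: str = "") -> Optional[str]:
    """Return the URL to redirect to, or None if the host is already preferred."""
    hostname = host.split(":", 1)[0].lower()  # ignore any port in the Host header
    if hostname == PREFERRED_HOST:
        return None
    url = f"http://{PREFERRED_HOST}{path or '/'}"
    return f"{url}?{query}" if query else url


def app(environ, start_response):
    """Minimal WSGI app: 301 for non-preferred hosts, normal page otherwise."""
    target = redirect_target(
        environ.get("HTTP_HOST", ""),
        environ.get("PATH_INFO", "/"),
        environ.get("QUERY_STRING", ""),
    )
    if target is not None:
        # 301 tells browsers and crawlers that the move is permanent.
        start_response(
            "301 Moved Permanently",
            [("Location", target), ("Content-Length", "0")],
        )
        return [b""]

    body = b"Hello from the preferred host\n"
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

To try it locally, the app can be served with the standard library, for example wsgiref.simple_server.make_server("", 8000, app).serve_forever(). Note that the path and query string are preserved, so a link to a deep page on the non-preferred host still lands on the matching page.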
Let's talk about another important implication of canonicalization: duplicate content.
Most web hosts treat 'www' as a default subdomain of the main domain and automatically configure it to have the same content as the non-www version. This can lead us to access the website more than necessary, perhaps slowing down the website's load time for normal users, and it can also confuse users by showing the same content in the search results twice. While Google is very good at automatically detecting the best version, webmasters can help improve our accuracy by making use of the rel=canonical link element as well as our URL parameter handling tool in Webmaster Tools.
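For reference, the rel=canonical element is a single line placed in the head section of each duplicate page, pointing at the preferred URL. The short Python helper below shows one way a page template might generate it; canonical_link is a made-up name for this example, and the URLs are placeholders.

```python
from html import escape


def canonical_link(preferred_url: str) -> str:
    """Build the <link rel="canonical"> element for a page's <head>."""
    # Escape the URL so characters such as & and " are safe inside the attribute.
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'


print(canonical_link("http://www.example.com/"))
# prints: <link rel="canonical" href="http://www.example.com/">
```

The same element would go on every variant of the page (for example, both the www and non-www copies), always with the preferred URL as its href.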
In summary, here are some best practices to avoid having a canonicalization issue with your website:
- Set your preferred version in Webmaster Tools (www or non-www).
- Set a 301 redirect from the non-preferred version to the preferred version.
- Have consistent internal linking (always use the preferred version to hyperlink).
- Always make sure you advertise only the preferred version.
- Make use of "rel=canonical" and the URL parameter handling tool in Webmaster Tools.
If you have an opinion or any questions on the topic, please join our conversation in the Webmaster Help Group.
We hope this article and those that will follow are going to be useful, not only for the discussed websites, but also for all webmasters who read this blog.
That's all! Until our next post…!
Posted by Search Quality Team