February 15, 2010

India Site Clinic - Part 1

It is a pleasure to bring you the first installment of our Site Clinic series. Hundreds of you sent us questions and asked for tips on improving the way your website is shown in our search results. Our entire search quality team would like to say a big thank you for the amazing response. Given the huge response, we thought it would be fair to address broader trends rather than focus on individual websites. The trend we are going to focus on today, the most prominent of the lot, is called 'canonicalization.' This was an issue with about half of the Indian websites that were submitted.

Let's start with the basics.

What is canonicalization?
The term "canonicalization" may sound a bit technical, but it is simply the process of picking the best URL when there are several choices available.

For example:
http://www.example.com/
http://example.com/
https://example.com/
http://example.com/index.html
http://www.example.com/index.html

When we crawl the web, we can find a number of different addresses that show the same page. This can confuse our systems, so it may take us longer to review your webpages, and we may not show the best URL in our search results.
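
To make the idea concrete, here is a minimal sketch of how a site owner might reason about these variants. It is only an illustration, not a description of how Google's systems work. It assumes the owner prefers the http://www form and that /index.html serves the same content as /; the names PREFERRED_SCHEME, PREFERRED_HOST, DEFAULT_DOCUMENTS and canonicalize are made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration only: the owner prefers the http://www form,
# and /index.html serves the same content as /.
PREFERRED_SCHEME = "http"
PREFERRED_HOST = "www.example.com"
DEFAULT_DOCUMENTS = ("index.html",)


def strip_www(host: str) -> str:
    """Return the host name without a leading 'www.' label."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def canonicalize(url: str) -> str:
    """Map a URL variant of the example site to the single preferred URL."""
    parts = urlsplit(url)
    if strip_www(parts.netloc) != strip_www(PREFERRED_HOST):
        return url  # a different site; leave it untouched

    path = parts.path or "/"
    for doc in DEFAULT_DOCUMENTS:
        if path.endswith("/" + doc):
            path = path[: -len(doc)]  # drop the default document, keep the slash
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, parts.query, ""))


variants = [
    "http://www.example.com/",
    "http://example.com/",
    "https://example.com/",
    "http://example.com/index.html",
    "http://www.example.com/index.html",
]
canonical_urls = {canonicalize(u) for u in variants}
print(canonical_urls)  # {'http://www.example.com/'}
```

All five addresses from the list above collapse to one preferred URL, which is the outcome canonicalization aims for.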

Sounds good in theory? Do you want to see a real example?

Yes! freewaresnbeta.com is a nice-looking blog about freeware and other software. It serves as a good example of the implications of canonicalization. The live page http://www.freewaresnbeta.com/ loads well, but the non-www version http://freewaresnbeta.com/ loads a page saying that the site is 'under construction.'

The 'non-www' version



The 'www' version


The two pages above look different: the non-www version is probably showing a webpage left over from when the webmaster was building the site. Because the web addresses are so similar, users might accidentally link to the non-www one. This can lead to their visitors (and our crawlers) being sent to the wrong URL and being confused by the content shown there. When it's easy for other people to link to your proper webpages, more visitors will be able to view your pages and recommend them to their friends.

What's the solution?

There's a pretty neat facility in Webmaster Tools where you can set your preferred version (www or non-www) to be crawled by Google. Once you pick your preferred destination URL, use 301 redirects to send traffic from other URL variants to your preferred URL, so that the valuable ranking factors are carried along with the redirect. It is also advisable to use the preferred version for internal linking and advertising.
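
How you set up a 301 redirect depends on your web server or hosting provider, so treat the following as a rough sketch rather than a recommended configuration. It assumes a site served through Python's standard WSGI interface, with www.example.com as the preferred host and example.com as the only other variant; PREFERRED_HOST, ALTERNATE_HOSTS and redirect_to_preferred_host are hypothetical names chosen for this example.

```python
from urllib.parse import quote

# Assumptions for illustration only.
PREFERRED_HOST = "www.example.com"
ALTERNATE_HOSTS = {"example.com"}  # variants that should redirect


def redirect_to_preferred_host(app):
    """Wrap a WSGI app so requests for alternate hosts get a 301 redirect."""

    def wrapper(environ, start_response):
        # Ignore any port number sent in the Host header.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host in ALTERNATE_HOSTS:
            scheme = environ.get("wsgi.url_scheme", "http")
            path = quote(environ.get("PATH_INFO", "") or "/",
                         safe="/;=,", encoding="latin-1")
            location = f"{scheme}://{PREFERRED_HOST}{path}"
            query = environ.get("QUERY_STRING", "")
            if query:
                location += "?" + query
            start_response(
                "301 Moved Permanently",
                [("Location", location), ("Content-Length", "0")],
            )
            return [b""]
        # Preferred host (or an unrelated one): serve the page as usual.
        return app(environ, start_response)

    return wrapper


def hello_app(environ, start_response):
    """A stand-in for the real website."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


site = redirect_to_preferred_host(hello_app)
```

With this wrapper, a request for http://example.com/about?a=1 would be answered with a 301 pointing to http://www.example.com/about?a=1, while requests to the www host are served normally.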

Let's talk about another important implication of canonicalization: duplicate content.


Most web hosts treat 'www' as a default subdomain of the main domain and automatically configure it to serve the same content as the non-www version. This can lead us to crawl the website more than necessary, perhaps slowing down load times for normal users, and can also confuse users by showing the same content in the search results twice. While Google is very good at automatically detecting the best version, webmasters can help improve our accuracy by making use of the rel=canonical link element as well as our URL parameter handling tool in Webmaster Tools.
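
For readers who generate their pages with a script, here is a small sketch of what the rel=canonical link element looks like. The element belongs in the head section of each duplicate page and points to the preferred URL. The helper name canonical_link_element and the product URL are hypothetical, chosen only for illustration.

```python
from html import escape


def canonical_link_element(preferred_url: str) -> str:
    """Build the <link rel="canonical"> element for a page's <head>."""
    # Escape &, <, > and quotes so the URL is safe inside an HTML attribute.
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'


# Hypothetical page that is reachable under several parameter orderings.
tag = canonical_link_element("http://www.example.com/page.html?item=1&color=red")
print(tag)
# <link rel="canonical" href="http://www.example.com/page.html?item=1&amp;color=red">
```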

In summary, here are some best practices to avoid having a canonicalization issue with your website:

  • Set your preferred version in Webmaster Tools (www or non-www).
  • Set a 301 redirect from the non-preferred version to the preferred version.
  • Have consistent internal linking (always use the preferred version to hyperlink).
  • Always make sure you advertise only the preferred version.
  • Make use of "rel=canonical" and the URL parameter handling tool in Webmaster Tools.

Check out this video of Matt Cutts, a Webspam Engineer from our team, talking about the canonical link element.

If you have an opinion or any questions on the topic, please join our conversation in the Webmaster Help Group.

We hope this article and those that will follow are going to be useful, not only for the discussed websites, but also for all webmasters who read this blog.

That's all! Until our next post…!

P.S. Do keep in mind that this is not an exhaustive study, but a set of general recommendations for Google search.


Posted by Search Quality Team