Google Canonical URL Problems (and Link Disappearance from SERP-s)

FINAL UPDATE: This turned out to be a false alarm. Do not panic! A self-referential canonical link is fine and doesn’t cause problems. You can still read the post and comments below to see how I got confused.


After solving a problem for a client, I looked into the newest release of the WordPress plugin All-in-One-SEO-Pack and found a bug in it that caused the same problem the client had on a plain HTML site. It incorrectly implements Google’s recently introduced canonical link rel attribute, which results in pages being excluded from the index.

http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=139394

It should not be implemented on the valid pages that you want to have indexed, but only on redirects…

My prediction is that people’s websites will start disappearing from Google’s index due to incorrect use of this attribute.

Although Google claims that ‘Our algorithm is lenient: We can follow canonical chains…’ (http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html), which may be true for a proper implementation, I think they should have foreseen this common problem.
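To make the idea of "following canonical chains" concrete, here is a minimal sketch in Python. It is only an illustration, not Google's actual algorithm, which this post does not know. The function name `resolve_canonical`, the `canonical_of` dictionary and the `max_hops` limit are all assumptions made up for the example. Note that a page whose canonical points at itself simply ends the chain, which is consistent with the final update at the top of this post.

```python
def resolve_canonical(url, canonical_of, max_hops=10):
    """Follow canonical declarations starting at url.

    canonical_of is a hypothetical dict mapping a page URL to the URL
    named in its <link rel="canonical"> tag. Pages without a tag are
    simply absent from the dict.
    """
    seen = {url}
    for _ in range(max_hops):
        target = canonical_of.get(url, url)
        if target == url:
            # No tag, or a self-referential tag: the chain ends here.
            return url
        if target in seen:
            # The declarations form a loop: stop instead of spinning.
            return url
        seen.add(target)
        url = target
    # Hop limit reached: return the last URL visited.
    return url


pages = {
    "http://www.example.com/index.php": "http://www.example.com/index.html",
    "http://www.example.com/index.html": "http://www.example.com/",
    "http://www.example.com/": "http://www.example.com/",  # self-referential
}
print(resolve_canonical("http://www.example.com/index.php", pages))
```

In this toy model the self-referential entry behaves exactly like a page with no canonical tag at all.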

Anyhow, I bet if you use this tag improperly, you won’t be able to find your pages indexed in Google — pages which have something like this in the HEAD section:

<link rel="canonical" href="http://www.example.com/my-current-page-being-indexed/" />

or something like this in case it’s an index.php/html/htm… type of page:

<link rel="canonical" href="http://www.example.com/" />

If you do find them, look at the cached version, and you will see it’s an older version of the page from before you implemented this attribute.
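If you want to check which of your pages carry a tag like the two shown above, a small script can do it. The sketch below uses only Python's standard library. The names `CanonicalFinder`, `normalize` and `canonical_status` are made up for this example, and the URL comparison is a deliberate simplification (it ignores a trailing slash and the case of the host name), so treat it as a starting point rather than a definitive check.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> also arrive here.
        if tag != "link":
            return
        attr = dict(attrs)
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.canonicals.append(attr["href"])


def normalize(url):
    # Simplified comparison key (an assumption of this sketch):
    # lower-case scheme and host, ignore a trailing slash on the path.
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return (parts.scheme.lower(), parts.netloc.lower(), path, parts.query)


def canonical_status(page_url, html):
    """Return 'none', 'self-referential' or 'points elsewhere'."""
    finder = CanonicalFinder()
    finder.feed(html)
    if not finder.canonicals:
        return "none"
    # Only the first canonical tag found is considered.
    if normalize(finder.canonicals[0]) == normalize(page_url):
        return "self-referential"
    return "points elsewhere"


head = '<link rel="canonical" href="http://www.example.com/" />'
print(canonical_status("http://www.example.com/index.php", head))
print(canonical_status("http://www.example.com/", head))
```

The first call covers the index.php case, where the tag points at a different URL than the one requested. The second covers the self-referential case.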

Update: even this page didn’t get indexed in Google, as All-In-One-SEO-Pack is active on this blog. All you can find in Google is the URL with the anchor text as its title, and the description surrounding the link on the main blog page.

Update 2: following up on the first comment below, here is a screenshot:

[Screenshot: google-canonical-missing-cache]

UPDATE 3: read the comments below to see what happened when canonical was removed and reintroduced to this blog.

Category: SEO Tips

8 Responses to “Google Canonical URL Problems (and Link Disappearance from SERP-s)”

  1. Michael Torbert

    Looks to me like your page is in Google.

    http://www.google.com/search?hl=en&client=safari&rls=en-us&q=site%3A+seolutions.net+canonical&btnG=Search

  2. laki

    yes it is, but content of the page is not indexed — there is no cache of this page. (it will appear in the future as i disabled canonical since)

    google places into its index url and anchor text when the page is disabled from being indexed. google doesn’t have to even access the page to store it in the index — look at section 2.2 of their original paper:

    http://infolab.stanford.edu/~backrub/google.html

  3. laki

    ps. as of now, the only missing ‘cache’ from index of all blog pages is the latest post with canonical attribute:

    http://www.google.com/search?q=site%3Aseolutions.net%2Fblog%2F

  4. laki

    i guess google is not being straightforward about what it does with pages and how it interprets its rel attributes:

    http://www.google.com/search?hl=en&q=site%3A+seolutions.net+canonical+%22pages+which+have+something%22&btnG=Search

    it does seem to have index of the content, but doesn’t want to show it in ‘cache’.

    anyhow, anyone who has done more experiments with this attribute is welcome to comment.

  5. laki

    another interesting thing: after seeing result in blog search: http://blogsearch.google.com/blogsearch?hl=en&client=safari&rls=en-us&num=100&q=site%3Aseolutions.net%20canonical%20%22pages%20which%20have%20something%22&um=1&ie=UTF-8&sa=N&tab=wb

    i started wondering if google is indexing RSS content without accessing the site?

  6. laki

    if true, then how does google know about the ‘canonical’ attribute?

    the more i think about this thing, the more i am confused.

    :(

  7. laki

    the confusion goes on…

    since i removed ‘canonical’, cached page appeared, but now i see ‘canonical’ link in it!

    i have now reactivated ‘canonical’ in the blog to see what will happen next.

  8. laki

    So this seems to be a partial explanation of changes in google index described above:

    Google does index the feed first, and until Google also indexes the blog post, there won’t be a ‘cache’ version of the page in search engine results.

    http://groups.google.com/group/google-blog-search/browse_thread/thread/8244fc8731f47970?pli=1

    Canonical URLs can be self-referential, so this whole post is (luckily) a false (but well-intentioned) alarm!
