Google Canonical URL Problems (and Link Disappearance from SERPs)
FINAL UPDATE: This turned out to be a false alarm! Do not panic! A self-referential canonical link is fine and doesn’t cause problems. You can still read the post and comments below to see how I got confused.
After solving a problem for a client, I looked into the newest release of the WordPress plugin All-in-One-SEO-Pack and found a bug in it that caused the same problem the client had on a plain HTML site. It incorrectly implements Google’s recently introduced rel="canonical" link attribute, which results in pages being excluded from the index.
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=139394
It should not be placed on valid pages that you want to have indexed, but only on redirects…
My prediction is that people’s websites will start disappearing from Google’s index due to incorrect use of this attribute.
Although Google claims that ‘Our algorithm is lenient: We can follow canonical chains…’ (http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html), which may be true for a proper implementation, I think they should have foreseen this common problem.
Anyhow, I bet that if you use this tag improperly, you won’t be able to find your pages indexed in Google. I mean pages which have something like this in the HEAD section:
<link rel="canonical" href="http://www.example.com/my-current-page-being-indexed/" />
or something like this in case it’s an index.php/html/htm… type of page:
<link rel="canonical" href="http://www.example.com/" />
If you do find them, look at the cached version, and you will see it’s an older version of the page, from before you implemented this attribute.
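To see which of the two cases above a page falls into, you can pull the canonical link out of its HEAD and compare it with the URL the page is served from. Below is a minimal sketch using only the Python standard library. The names CanonicalFinder, canonical_of and is_self_referential are made up for this illustration and are not part of any plugin or Google tool, and the URL comparison is deliberately simple (scheme, host, path and query only).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link":
            return
        attr = dict(attrs)
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.hrefs.append(attr["href"])


def canonical_of(html, page_url):
    """Return the first canonical URL found in the HTML, resolved against page_url, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    if not finder.hrefs:
        return None
    return urljoin(page_url, finder.hrefs[0])


def is_self_referential(html, page_url):
    """True if the page's canonical link points back at page_url itself."""
    target = canonical_of(html, page_url)
    if target is None:
        return False
    a, b = urlsplit(target), urlsplit(page_url)
    # Assumption: an empty path and "/" are treated as the same page.
    key_a = (a.scheme, a.netloc.lower(), a.path or "/", a.query)
    key_b = (b.scheme, b.netloc.lower(), b.path or "/", b.query)
    return key_a == key_b


if __name__ == "__main__":
    post = ('<html><head><link rel="canonical" '
            'href="http://www.example.com/my-current-page-being-indexed/" />'
            '</head><body></body></html>')
    print(is_self_referential(
        post, "http://www.example.com/my-current-page-being-indexed/"))
```

The first example above is the self-referential case (the canonical link names the page it sits on), while an index.php page pointing at http://www.example.com/ is the other case, where the canonical URL differs from the URL being fetched.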
Update: even this page didn’t get indexed in Google, as All-In-One-SEO-Pack is active on this blog. All you can find in Google is the URL, with the anchor text as the title and the text surrounding the link on the main blog page as the description.
Update 2: following up on the first comment below, here is a screenshot:
UPDATE 3: read the comments below to see what happened when canonical was removed and reintroduced to this blog.
Category: SEO Tips | Tags: canonical, google, index, problems, SEO Tips 8 comments »
April 6th, 2009 at 11:13 pm
Looks to me like your page is in Google.
http://www.google.com/search?hl=en&client=safari&rls=en-us&q=site%3A+seolutions.net+canonical&btnG=Search
April 6th, 2009 at 11:26 pm
yes it is, but the content of the page is not indexed: there is no cache of this page. (it will appear in the future, as i have since disabled canonical)
google puts the url and anchor text into its index when a page is blocked from being indexed. google doesn’t even have to access the page to store it in the index; look at section 2.2 of their original paper:
http://infolab.stanford.edu/~backrub/google.html
April 6th, 2009 at 11:33 pm
ps. as of now, the only blog page in the index with a missing ‘cache’ is the latest post, the one with the canonical attribute:
http://www.google.com/search?q=site%3Aseolutions.net%2Fblog%2F
April 7th, 2009 at 12:05 am
i guess google is not being straightforward about what it does with pages and how it interprets its rel attributes:
http://www.google.com/search?hl=en&q=site%3A+seolutions.net+canonical+%22pages+which+have+something%22&btnG=Search
it does seem to have indexed the content, but doesn’t want to show it in ‘cache’.
anyhow, anyone who has done more experiments with this attribute is welcome to comment.
April 7th, 2009 at 12:17 am
another interesting thing: after seeing result in blog search: http://blogsearch.google.com/blogsearch?hl=en&client=safari&rls=en-us&num=100&q=site%3Aseolutions.net%20canonical%20%22pages%20which%20have%20something%22&um=1&ie=UTF-8&sa=N&tab=wb
i started wondering if google is indexing RSS content without accessing the site?
April 7th, 2009 at 12:19 am
if true, then how does google know about the ‘canonical’ attribute?
the more i think about this thing, the more i am confused.
April 8th, 2009 at 11:29 am
the confusion goes on…
since i removed ‘canonical’, cached page appeared, but now i see ‘canonical’ link in it!
i have now reactivated ‘canonical’ in the blog to see what will happen next.
April 8th, 2009 at 12:01 pm
So this seems to be a partial explanation of changes in google index described above:
Google does index the feed first, and until Google also indexes the blog post itself, there won’t be a ‘cache’ version of the page in the search engine results.
http://groups.google.com/group/google-blog-search/browse_thread/thread/8244fc8731f47970?pli=1
Canonical URLs can be self-referential, so this whole post is (luckily) a false (but well-intentioned) alarm!