The Cross-Domain Canonical Feature – Are Spammers Abusing it or not?

I am using the cross-domain canonical feature, which Google introduced in December 2009. It is the recommended practice for duplicate content across different domains, so I don’t really have another choice. Although several unique uses might apply, I decided to use it as suggested and implemented it on one of my company’s large sites, which runs on two domains targeting different countries. To protect the company, I am not naming the site here.

One domain, www.domaincopy.com, is geotargeted to a specific country through the country targeting settings in Google Search Console. The other, www.domain.com, targets all countries without any specific setting; this one can also be considered the “brand” or “original” domain.

Here is what Google said in 2009 about using the cross-domain canonical:

Webmaster level: Intermediate

We’ve recently discussed several ways of handling duplicate content on a single website; today we’ll look at ways of handling similar duplication across different websites, across different domains. For some sites, there are legitimate reasons to duplicate content across different websites (for instance, to migrate to a new domain name using a web server that cannot create server-side redirects). To help with issues that arise on such sites, we’re announcing our support of the cross-domain rel="canonical" link element.

Ways of handling cross-domain content duplication:

  • Choose your preferred domain
    When confronted with duplicate content, search engines will generally take one version and filter the others out. This can also happen when multiple domain names are involved, so while search engines are generally pretty good at choosing something reasonable, many webmasters prefer to make that decision themselves.
  • Enable crawling and use 301 (permanent) redirects where possible
    Where possible, the most important step is often to use appropriate 301 redirects. These redirects send visitors and search engine crawlers to your preferred domain and make it very clear which URL should be indexed. This is generally the preferred method as it gives clear guidance to everyone who accesses the content. Keep in mind that in order for search engine crawlers to discover these redirects, none of the URLs in the redirect chain can be disallowed via a robots.txt file. Don’t forget to handle your www / non-www preference with appropriate redirects and in Webmaster Tools.
  • Use the cross-domain rel="canonical" link element
    There are situations where it’s not easily possible to set up redirects. This could be the case when you need to move your website from a server that does not feature server-side redirects. In a situation like this, you can use the rel="canonical" link element across domains to specify the exact URL of whichever domain is preferred for indexing. While the rel="canonical" link element is seen as a hint and not an absolute directive, we do try to follow it where possible.
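A minimal Python sketch of the 301 redirect approach described above, assuming the old domain is www.domaincopy.com and the preferred one is https://www.domain.com (both are this post’s placeholder names). The handler class and the redirect_location function are hypothetical illustrations built on the standard-library http.server; each request path is kept as-is so every old URL maps 1:1 to its counterpart on the preferred domain.

```python
from http.server import BaseHTTPRequestHandler

# Assumption: the preferred origin, using this post's placeholder domain.
PREFERRED_ORIGIN = "https://www.domain.com"


def redirect_location(request_path: str) -> str:
    """Return the URL on the preferred domain that matches request_path 1:1."""
    # http.server stores the path plus any query string in self.path.
    if not request_path.startswith("/"):
        request_path = "/" + request_path
    return PREFERRED_ORIGIN + request_path


class PermanentRedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical handler for the old domain: every GET/HEAD gets a 301."""

    def _send_redirect(self) -> None:
        self.send_response(301)  # 301 Moved Permanently
        self.send_header("Location", redirect_location(self.path))
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self) -> None:
        self._send_redirect()

    def do_HEAD(self) -> None:
        self._send_redirect()
```

To try it locally, one could run HTTPServer(("", 8080), PermanentRedirectHandler).serve_forever() from the same module; a production site would normally configure the redirect in its web server instead.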

Still have questions?

Q: Do the pages have to be identical?
A: No, but they should be similar. Slight differences are fine.

Q: For technical reasons I can’t include a 1:1 mapping for the URLs on my sites. Can I just point the rel="canonical" at the homepage of my preferred site?
A: No; this could result in problems. A mapping from old URL to new URL for each URL on the old site is the best way to use rel="canonical".
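The 1:1 mapping recommended above can be sketched in Python as follows, assuming www.domain.com is the preferred domain and that path and query string identify the same page on both sites (both assumptions, using this post’s placeholder names). The functions canonical_url and canonical_link_element are hypothetical helpers that keep the path instead of pointing every page at the homepage.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumption: placeholder names from this post; www.domain.com is preferred.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.domain.com"


def canonical_url(page_url: str) -> str:
    """Map a URL on the duplicate domain to the same path on the preferred domain."""
    parts = urlsplit(page_url)
    # Keep path and query so each page maps 1:1; drop the fragment,
    # since it is never sent to the server as part of the request.
    path = parts.path or "/"
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, parts.query, ""))


def canonical_link_element(page_url: str) -> str:
    """Render the cross-domain link element for the page's <head>."""
    href = escape(canonical_url(page_url), quote=True)  # "&" becomes "&amp;"
    return f'<link rel="canonical" href="{href}" />'
```

For example, canonical_link_element("http://www.domaincopy.com/a?x=1&y=2") yields a link element whose href is https://www.domain.com/a?x=1&amp;y=2, so each copy page points at its own counterpart rather than at the homepage.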

Q: I’m offering my content / product descriptions for syndication. Do my publishers need to use rel="canonical"?
A: We leave this up to you and your publishers. If the content is similar enough, it might make sense to use rel="canonical", if both parties agree.

Q: My server can’t do a 301 (permanent) redirect. Can I use rel="canonical" to move my site?
A: If it’s at all possible, you should work with your webhost or web server to do a 301 redirect. Keep in mind that we treat rel="canonical" as a hint, and other search engines may handle it differently. But if a 301 redirect is impossible for some reason, then a rel="canonical" may work for you. For more information, see our guidelines on moving your site.

Q: Should I use a noindex robots meta tag on pages with a rel="canonical" link element?
A: No, since those pages would not be equivalent with regards to indexing – one would be allowed while the other would be blocked. Additionally, it’s important that these pages are not disallowed from crawling through a robots.txt file, otherwise search engine crawlers will not be able to discover the rel=”canonical” link element.
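As a quick check of the crawling requirement in the answer above, here is a minimal sketch using Python’s standard urllib.robotparser; the function name crawlable and the sample rules and URLs are hypothetical. Python’s parser follows basic prefix matching and may not reproduce Googlebot’s handling of every extension (such as wildcards), so treat the result as a first approximation.

```python
from urllib.robotparser import RobotFileParser


def crawlable(robots_txt: str, urls: list[str], user_agent: str = "Googlebot") -> dict[str, bool]:
    """Report, for each URL, whether robots.txt lets user_agent fetch it."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


# Hypothetical robots.txt for the duplicate domain.
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /private/\n"
```

Any page carrying a cross-domain rel="canonical" that comes back False here would hide its link element from crawlers, which is exactly the situation the answer warns about.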

We hope this makes it easier for you to handle duplicate content in a user-friendly way. Are there still places where you feel that duplicate content is causing your sites problems? Let us know in the Webmaster Help Forum!

I realised that the domain geotargeted to the specific country, www.domaincopy.com, is a copy of www.domain.com, which in our eyes, and definitely also in Google’s eyes, is the brand and original version. I also track the organic traffic of each domain separately to see how much each one gets.

After running both sites without interruption for over two years, I can report that in the targeted country the copy, www.domaincopy.com, ranks highest for the brand name, while the original domain, www.domain.com, also ranks on the first page. As a result, brand searches return a first page filled with results from both www.domaincopy.com and www.domain.com.

This is where I had never thought about the spammers

People could abuse the system by creating many domains with exactly the same content and simply adding the cross-domain canonical tag, which looks just as legitimate to Google as it does to anyone else.
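To see what a given page actually declares, one can look for its rel="canonical" link element and compare its host with the page’s own host. The sketch below is a hypothetical Python helper (the names CanonicalFinder, declared_canonical and is_cross_domain are illustrative, not from any existing tool) built on the standard-library html.parser; it only inspects HTML you have already downloaded and does not fetch anything.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin, urlsplit


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # The default handle_startendtag calls this too, so "<link ... />" is covered.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens; compare case-insensitively.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attributes.get("href")


def declared_canonical(html_text: str) -> Optional[str]:
    """Return the canonical href declared in html_text, or None."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    finder.close()
    return finder.canonical


def is_cross_domain(page_url: str, html_text: str) -> bool:
    """True when the page's canonical points at a different host."""
    target = declared_canonical(html_text)
    if target is None:
        return False
    # Resolve relative hrefs against the page URL before comparing hosts.
    resolved = urljoin(page_url, target)
    return urlsplit(resolved).hostname != urlsplit(page_url).hostname
```

Feeding it the saved HTML of each page shows whether a copy site points its canonical at another domain. Since Google treats the element only as a hint, a declared canonical says nothing on its own about how the pages end up ranking.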

This is exactly what mjam.net did: they are filling up the first, second and third pages of results for local searches with copy sites that differ only slightly in design. Let’s have a look at the sites in question:

  • http://www.pizzaportal.at/wien/1010/
  • http://www.willessen.at/wien/1010/
  • http://www.mjam.net/wien/1010/
  • http://www.netkellner.at/wien/1010/

A search for the simple term “Lieferservice 1010 Wien” (“delivery service 1010 Vienna”) quickly reveals the situation: these sites carry nearly the same content, yet Google ignores the fact that they are near-identical copies. See for yourself: https://www.google.at/#q=lieferservice+1010+wien

[Screenshot: Google Search results for “lieferservice 1010 wien”]

As long as Google keeps its guidelines as they are and presents this as the way to go, I respect the risk they are taking, and I am sure that in the end the original site, in this case Mjam, will remain active in the SERPs.

Here is a video in which Matt Cutts says it’s safe: