Controlling Duplicate Content On Your Website
Ever since Google rolled out its Panda algorithm update in February 2011, webmasters have been tackling duplicate content issues across their websites. The Panda update was massive, affecting around 12% of search queries, and many sites saw an instant drop in rankings, which in turn had a catastrophic effect on traffic volumes.
Google aims to promote websites that offer quality, unique content of significant value to its users, and content that can be found across multiple websites no longer fits the bill. Think about it: if you sit an exam and copy the answers of the person next to you, is the grade you receive a true reflection of your own effort? The simple answer is no.
Google will examine all of the web pages that carry a particular piece of duplicate content and try to figure out which one is the original so it can rank it appropriately in the search results.
Simply rewriting the content on your site is not always the best way to resolve duplicate content issues. Thankfully, there are a number of alternative methods for telling Google how to treat a page that carries duplicate content.
The robots.txt file
The robots.txt file sits in the root of your website and is the first file that web bots try to access when they come to crawl it. It is basically a list of instructions that tells the bots exactly where they should and shouldn’t go on your website.
For many years webmasters have used the robots.txt file to block Google’s bots from accessing pages with duplicate content, but it is debatable whether this is the best method to adopt. Firstly, never block a page that is an integral part of your website’s user experience. For example, if you run an ecommerce site and one of your product pages carries duplicate content, blocking its URL in the robots.txt file stops the search engines from crawling that page at all.
Secondly, search engines cannot crawl a page that is blocked by the robots.txt file, so any links pointing to that page are effectively ignored. This could devalue some of the most valuable links to your site and in turn do more harm than good to your search engine rankings.
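To see what a blocking rule actually does before you publish it, you can test it locally. The sketch below uses Python's standard urllib.robotparser module; the Disallow rule and the about.php page are made up for illustration and are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: blocks every bot from the product script.
ROBOTS_TXT = """\
User-agent: *
Disallow: /products.php
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network request is made)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The rule matches by URL prefix, so every product page is blocked,
# including the ones you may still want crawled.
product_allowed = parser.can_fetch(
    "Googlebot", "http://www.yourdomain.com/products.php?item=product-a"
)
about_allowed = parser.can_fetch("Googlebot", "http://www.yourdomain.com/about.php")

print(product_allowed, about_allowed)
```

Because the rule matches on a path prefix, a single Disallow line can remove far more pages from the crawl than you intended, which is one reason to be careful with this method.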
The No Index Meta Robots HTML Tag
<meta name="robots" content="noindex, nofollow" />
The meta robots tag is a small piece of HTML code that you insert into the <head> section of a page to tell Google that you do not want that particular page indexed. The value of this method is that you can keep the page out of the index while separately deciding whether Google treats the links on that page as follow or nofollow: "noindex, nofollow" as in the example above, or "noindex, follow" to let the links still be followed.
Most SEOs now prefer this method over the robots.txt file restriction.
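If you want to audit which of your pages carry the tag, a small script can read the directives out of the HTML. This is a minimal sketch using Python's standard html.parser module; the class and function names are hypothetical, and it only looks at the robots meta tag, not at HTTP headers or bot-specific tags.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of robots directives declared in the page."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow" /></head></html>'
directives = robots_directives(page)
print(sorted(directives))  # ['nofollow', 'noindex']
```

Running this over a list of your own pages gives a quick way to confirm that only the intended duplicates are marked noindex.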
The Canonical Tag
<link rel="canonical" href="http://www.yourdomain.com/products.php?item=product-a" />
URL of page A – http://www.yourdomain.com/products.php?item=product-a
URL of page B – http://www.yourdomain.com/products.php?item=product-a&type=type1
Many ecommerce websites struggle to control their duplicate content, especially with vast numbers of dynamic product, category and sort pages. The canonical tag lets you identify the duplicated pages on your website and specify which one you prefer. You simply add the canonical tag to the <head> section of the page that carries the duplicate content (page B above), with the link inside the tag pointing back to the page carrying the original content (page A).
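On a dynamic site the canonical tag is usually generated rather than typed by hand. The sketch below shows one possible approach in Python: strip every query parameter that does not change the content, then print the tag. It assumes that only the item parameter identifies a product, as in the page A and page B example; the KEEP_PARAMS set and the function names are hypothetical and would need adapting to your own URL scheme.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: only these query parameters change the page's actual content.
KEEP_PARAMS = {"item"}


def canonical_url(url: str) -> str:
    """Drop query parameters (and any fragment) that only create duplicates."""
    parts = urlsplit(url)
    kept = [(key, value) for key, value in parse_qsl(parts.query) if key in KEEP_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_tag(url: str) -> str:
    """Build the <link> element to place in the duplicate page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}" />'


page_b = "http://www.yourdomain.com/products.php?item=product-a&type=type1"
tag = canonical_tag(page_b)
print(tag)
```

For page B this produces the same tag shown at the top of this section, pointing back to page A.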
301 Redirects
301 redirects can be used as a permanent measure to redirect multiple duplicate content pages to the one page that contains the original content. Many webmasters use 301 redirects to consolidate the www, non-www and http versions of a webpage. For example, Google sees http://example.com and www.example.com as two different websites; to combat this, use a 301 redirect to send http://example.com to www.example.com or vice versa. A 301 redirect is best used when user experience is not really affected by the redirect.
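Most sites set up this kind of redirect in their web server configuration, but the logic is easy to illustrate in code. The following is a minimal sketch of a Python WSGI application that sends any request for a non-preferred hostname to the preferred one with a 301 status. It assumes www.example.com is the preferred version and plain http; CANONICAL_HOST and the function names are hypothetical.

```python
# Assumption: the www hostname is the preferred version of the site.
CANONICAL_HOST = "www.example.com"


def redirect_location(host: str, path: str, query: str = ""):
    """Return the URL to redirect to, or None if the host is already preferred."""
    if host.lower() == CANONICAL_HOST:
        return None
    location = f"http://{CANONICAL_HOST}{path}"
    if query:
        location += f"?{query}"
    return location


def app(environ, start_response):
    """WSGI app: 301 to the preferred host, otherwise serve the page."""
    location = redirect_location(
        environ.get("HTTP_HOST", ""),
        environ.get("PATH_INFO", "/"),
        environ.get("QUERY_STRING", ""),
    )
    if location is not None:
        # 301 tells browsers and search engines the move is permanent.
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"original content"]
```

The path and query string are carried over to the new location, so a visitor following an old link still lands on the page they asked for.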
For a search engine optimisation campaign to succeed, duplicate content issues need to be resolved from the get-go or they will continue to hamper your website’s progress in the search engine results. The methods above are alternatives to completely rewriting the content on your website.