Duplicate Content Definition
If you want to learn basic SEO and you have a website that uses a content management system like WordPress, or an ecommerce site with URLs that look like “http://www.example.com/category/sub-category/product.asp?pid=871&couponcode+21”, chances are you have duplicate content issues. Today I’m going to give you the definition of duplicate content and explain why it’s a potential problem for your website’s rankings.
Duplicate Content Definition
Google’s definition of duplicate content: “Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar”.
This means that duplicate content occurs when Google finds two or more URLs with virtually the same content. When Google finds duplicate content, its algorithm will exclude those duplicate pages from its main index, because Google wants to provide the best search results possible.
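To make the difference between content that "completely matches" and content that is "appreciably similar" concrete, here is a minimal sketch that compares two pieces of text using word shingles (overlapping three-word sequences) and Jaccard similarity. This is a common textbook approach to near-duplicate detection, not Google's actual algorithm. The 0.8 threshold, the helper names, and the sample sentences are made-up assumptions for illustration only.

```python
import re

def words(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences ("shingles") in the text."""
    w = words(text)
    if len(w) < k:
        return {tuple(w)} if w else set()
    return {tuple(w[i:i + k]) for i in range(len(w) - k + 1)}

def jaccard(a: str, b: str, k: int = 3) -> float:
    """Shared shingles divided by total distinct shingles (1.0 = same shingle sets)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def classify(a: str, b: str, threshold: float = 0.8) -> str:
    """Label a pair of texts; the 0.8 threshold is an arbitrary assumption."""
    if words(a) == words(b):
        return "complete match"
    if jaccard(a, b) >= threshold:
        return "appreciably similar"
    return "distinct"

# Made-up sample text, not real articles.
original = "Rebels regain the town of Brega after heavy fighting on Sunday"
scraped = "Rebels regain the town of Brega after heavy fighting on Sunday night"
unrelated = "Stock markets close higher on Monday"

print(classify(original, original))   # complete match
print(classify(original, scraped))    # appreciably similar (Jaccard 0.9)
print(classify(original, unrelated))  # distinct
```

Adding one word to the end of the sentence still leaves 9 of 10 distinct shingles shared, which is why a lightly edited copy can still count as duplicate content.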
Why Is Duplicate Content A Problem For Users?
How would you like it if you searched Google for “Libya Rebels Regain Brega Town” and found that practically every result on the first page had exactly the same content? That wouldn’t be very useful for discovering different content, would it? Some users might start to think that Google’s search results suck and switch to a different search engine (possibly a Bing/Facebook type of deal), but that’s highly unlikely. Don’t take this the wrong way: I still think Google has the best search results out there, and I can only imagine how difficult it is to run that operation with a bunch of spammers polluting the web with crap. Much respect to Google.
Malicious Duplicate Content Definition
Duplicate content can exist on the same domain or across multiple domains, as in the example above. Google primarily filters websites that copy other people’s web content and publish it on their own sites. That scraped duplicate content (with a little help from basic SEO) can actually outrank the original content in search results, which drives traffic to the spammers’ sites and generates advertising revenue for them.
Duplicate content is very difficult for Google to control algorithmically, for many reasons I won’t go into here. A good example is to think about how crappy mainstream media really is: no matter how many different news channels you flip through, they all basically provide the same content coming from AP and Reuters. That is Google’s big dilemma with duplicate content: dealing with the structure of syndication while keeping search engines based on latent semantic indexing useful and relevant for users.
Non-Malicious Duplicate Content Definition
Duplicate content can be non-malicious in nature but still affect your site’s search engine rankings, for example if you have mobile versions of your web pages, printer-only versions of your web pages, or an ecommerce site that uses a multi-tiered hierarchy where the same product page is found on multiple URLs.
Here’s an example of how crawlers would see the exact same product page at multiple URLs on an ecommerce site:
www.example.com/category/sub-category/product.asp
www.example.com/sub-category/product.asp
www.example.com/product.asp
www.example.com/category/sub-category/product.asp?pid=871&couponcode+21
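One way to check your own site for this kind of exact duplication is to fetch each URL, normalize the page text, and group URLs whose text hashes to the same value. The sketch below assumes you already have the HTML for each URL in a dictionary (the HTML bodies and the about.asp page are hypothetical), and it uses a crude regex to strip tags, which is fine for illustration but not a real HTML parser. Exact hashing only catches pages that match completely; pages that are merely similar need a fuzzier comparison.

```python
import hashlib
import re
from collections import defaultdict

def fingerprint(html: str) -> str:
    """SHA-256 of the page text after stripping tags and collapsing whitespace."""
    text = re.sub(r"<[^>]+>", " ", html)   # crude tag removal, fine for a sketch
    text = " ".join(text.split()).lower()  # ignore layout-only differences
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def group_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Return groups of URLs whose normalized content is identical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, html in pages.items():
        groups[fingerprint(html)].append(url)
    # Only clusters with more than one URL are duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]

# Hypothetical HTML bodies; in practice these would come from crawling each URL.
product = "<h1>Product 871</h1><p>Blue widget, $19.99</p>"
pages = {
    "www.example.com/category/sub-category/product.asp": product,
    "www.example.com/sub-category/product.asp": product,
    "www.example.com/product.asp": "<h1>Product 871</h1>\n  <p>Blue widget, $19.99</p>",
    "www.example.com/category/sub-category/product.asp?pid=871&couponcode+21": product,
    "www.example.com/about.asp": "<h1>About us</h1>",
}

for group in group_duplicates(pages):
    print(len(group), "URLs share one page:", group)
```

The whitespace-only difference on product.asp does not hide the duplicate, because the text is normalized before hashing; the about page stays out of the group because its content differs.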
Duplicate Content on WordPress
Content management systems such as WordPress and Drupal also produce duplicate content issues by creating a seemingly endless number of pages (URLs returning status “200 OK”) that all have the same content on them. Let’s say I chose three categories for a post I just published and named them Cat1, Cat2, and Cat3. Using WordPress, you end up with the exact same content on the following URLs:
www.example.com/Cat1/mypost
www.example.com/Cat2/mypost
www.example.com/Cat3/mypost
www.example.com/blog/
www.example.com/blog/mypost
Those URLs are just a few examples of the duplicate content that WordPress, Drupal, and similar systems can create.
Duplicate Content and SEO
When you have duplicate content issues, Google will usually filter out the duplicates and rank only the original document. One problem for SEOs is that links pointing to the duplicate pages that get excluded from Google’s search results will no longer fully benefit the original. If you’re not checking for duplicate content, you can end up splitting your link connectivity metrics, diluting your content, weakening the overall impact of your internal linking strategy, and ultimately hurting your search engine rankings.
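Here is a rough way to picture the link-splitting problem, using the WordPress URLs from the example above. This is a deliberately simplified model that just counts raw links; it ignores how Google actually weights links, and the link counts (and the choice of /blog/mypost as the original) are hypothetical.

```python
def link_share(inbound: dict[str, int], original: str) -> float:
    """Fraction of a duplicate cluster's inbound links that point at the original URL."""
    total = sum(inbound.values())
    if total == 0:
        return 0.0
    return inbound.get(original, 0) / total

# Hypothetical inbound-link counts for one post reachable at several WordPress URLs.
inbound = {
    "www.example.com/blog/mypost": 5,
    "www.example.com/Cat1/mypost": 3,
    "www.example.com/Cat2/mypost": 1,
    "www.example.com/Cat3/mypost": 1,
}

share = link_share(inbound, "www.example.com/blog/mypost")
print(f"{share:.0%} of links point at the original")  # 50% of links point at the original
```

In this made-up case, half of the post's links land on URLs that may be filtered out of the results, so the original only receives direct credit for the other half.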
Every situation is unique, and sometimes you might even benefit from duplicate content. Check out my article on The Benefits Of Duplicate Content.
Definitive WordPress Duplicate Content Experiment
Duplicate Content WordPress Example: Copy and paste the sentence below (the first sentence of a post on SEOmoz) and search for it on Google:
Duplicate content in SEO has been around for quite some time and even if Google has been saying they have been getting smarter and smarter in figuring out the best page to display in the SERPS from a list of duplicate content pages.
Below are SEOmoz’s rankings in Google for its own content. Notice how scraper sites are outranking the original post (www.seomoz.org/blog/duplicate-content-block-redirect-or-canonical):
#6 www.seomoz.org/blog/popular/past-90-days (obviously not the original post)
#12 www.seomoz.org/blog?page=3
I hope this little experiment shows that Google still has a hard time with duplicate content and with determining which site published the original content first. It also suggests that the QDF (query deserves freshness) algorithm and real-time search results are probably the main culprits behind scraper sites outranking the original content at this time.
Google keeps changing its algorithm to fight duplicate content (for example, Google’s recent Farmer update). If you have some of these duplicate content issues, read my post on How To Fix Duplicate Content.



March 5, 2011