Duplicate content: Separating the penalty from the filter
Finally, sort through and understand whether duplicate content is penalized by the search engines or just filtered in the search results.
Several weeks ago at SMX West I had the pleasure of meeting and having lunch with Brian White from Google. White works on Matt Cutts' Web spam team, tirelessly working to make Google's search results the best they can be, ensuring the best user experience. Quite a hefty task indeed.
You'd think that someone who spends his days fighting the never-ending battle that is Web spam might be a bit negative or jaded. If that is the case, he does an amazing job hiding it. Instead, he was upbeat and you could hear the excitement in his voice as he spoke. Here's a guy who loves what he's doing and truly wants not only to improve the searchers' experience on Google, but also to make the Web a better place. You can't help but like a guy who's fighting the good fight.
During the "Lunch with a Googler," those at the table had an opportunity to ask questions and to share their stories. In spite of the noise in the expo hall where lunch was being served, the conversation was great. A number of interesting questions came up, but one stood out and initially surprised me. The question was essentially about dealing with "duplicate content penalties."
I was actually reminded of this the other day when I signed up to sample an SEO tips e-mail and one of the tips was addressing this same "penalty." When those who speak on SEO call this a penalty, it's no wonder that Webmasters and even in-house SEO practitioners continue to struggle with this.
During our lunch when this question came up, White tried to address it and I even hopped in to lend a hand. Duplicate content is an important issue to be aware of, but some basic misunderstandings continue to trip people up. So let's break this issue down into bite-size chunks.
Duplicate content penalty vs. filter
First, before we break duplicate content out into its various flavors, let's take care of the penalty issue. This is probably the biggest area of confusion. Unfortunately, when the issue around duplicate content was first discussed, it was referred to as a penalty and there are still many blog posts, forum threads, and pages that refer to this as a penalty.
Duplicate content doesn't really fall under the classification of a penalty--unless perhaps you are trying to spam the search engines with an exorbitant number of duplicate pages. Even then, a penalty will probably only come into play based on the intent--are the duplicate content pages the result of site or CMS issues, or a deliberate attempt to influence search results? The handling of duplicate content falls under what is called filtering.
Rather than penalizing sites or Web pages, the search engines filter out those pages that they consider to be duplications, placing them further back within the search results. If the pages you want to show up first don't, or if other sites rank higher for the same content, then the engines may be seeing it as duplicate content and filtering accordingly.
If you still think your site is actually being penalized because of this, try searching the search engines for the page title, a section of content, or the URL of the page. If the title is too simple, short, or common, wrap it in quotation marks for an exact search or use an intitle: search, e.g., intitle:"Duplicate Content" (note: no space after the colon). If your page has been indexed but doesn't appear for these searches, then you may be looking at more than just a filter. At that point, it would be good to review other practices that might have garnered a penalty, such as hidden links.
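If you have more than a handful of pages to check, it can help to generate those searches consistently rather than typing them by hand. Here is a minimal sketch in Python of how that might look. The function name build_diagnostic_queries and the example.com URL are my own placeholders, not part of any search engine's tooling, and the script only builds the query strings for you to paste into a search box. It does not contact a search engine.

```python
def build_diagnostic_queries(title: str, snippet: str, url: str) -> list[str]:
    """Build the manual checks described above as plain query strings.

    Hypothetical helper for illustration only: it returns text to paste
    into a search engine and makes no network requests.
    """
    def quote(text: str) -> str:
        # Drop embedded double quotes so the exact-match phrase stays well formed.
        return '"' + text.replace('"', "").strip() + '"'

    return [
        quote(title),               # exact-match search on the page title
        "intitle:" + quote(title),  # intitle: search, no space after the colon
        quote(snippet),             # exact-match search on a section of content
        url.strip(),                # the URL of the page itself
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own page's title, text, and URL.
    for query in build_diagnostic_queries(
        "Duplicate Content",
        "the search engines filter out those pages",
        "https://example.com/duplicate-content",
    ):
        print(query)
```

Run each resulting string through the search engine yourself. If the page is indexed yet turns up for none of them, that is the signal to start looking beyond a simple filter.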
So hopefully that helps lay the penalty vs. filter issue to rest. Next, I'll break this out into the different types of duplicate content and ways to deal with it.