Understanding duplicate content: Outside view
Duplicate content from external sites may seem like an uncontrollable issue, but there are steps that can be taken to reduce the concerns.
Are you being outranked by yourself? Is "your" content showing up in searches, but on sites that aren't yours? Do you have multiple websites that compete against each other? If so, this discussion on duplicate content from external sources should be right up your alley.
Earlier in the week, I started our discussion on duplicate content by trying to lay to rest the idea of a. Now we pick up that discussion with one aspect of duplicate content . . . content duplication from other sites.
While I'd love to start out our discussion with the idea that external duplicate content is the hardest to deal with, that may not always be the case as you'll see when we talk about duplication on our own websites. For now though, we are just going to focus on content duplication from other sites.
At this point, you are probably in one of two camps--the "Yes, help me with this please," camp or the "What in the world are you talking about?" camp. So let's start by getting everyone in the same camp at least. External content duplication can come about, generally, in three ways.
In every aspect of life, there are those who want to get ahead through the hard work of others, even illegally or unethically. The Web is certainly no exception to this, especially given the fact that, of all the ways to take advantage of the hard efforts of others, copy-paste must certainly be the laziest--I mean easiest.
Don't assume this issue only affects big-name brands and sites; anyone who publishes online is susceptible to this kind of attack. Keep in mind that what we are talking about here is essentially copyright infringement. Phishing sites and the like are a whole other level of criminal activity.
Realistically, content theft is probably the hardest kind of duplication to combat, but in many cases it doesn't cause as much damage as you might think. For that, we can largely thank the search engines. They're out to deliver the best results they can to searchers and are certainly aware of these issues, so I truly believe they work hard to identify the authoritative, original source of a piece of content. They can compare copies based on when they first found each one, as well as on the links leading back to each. And while this is purely speculation, I would imagine it is fairly easy for the engines to score a site by the proportion of its content that also appears elsewhere, and to separate natural patterns from unnatural ones.
So what can you do about content theft? You can file a report with the search engines under the Digital Millennium Copyright Act (just search on "Google copyright infringement," or the equivalent for another search engine, for specific details), complain to the ISP that hosts the infringing domain, or pursue further legal action. Before you do, though, it may be better to weigh the impact the theft really has against the resources it would take to fight it, and decide whether it is worth your attention to begin with. Sometimes, just an email or letter to the infringer is enough.
Ironically, you are probably the most responsible for your own duplicate content on other sites. Writing content and syndicating through article directories or other content syndication services, RSS feeds of blog posts, and press release syndication will probably make up far more of your duplication woes than pirated content.
Each of these instances can be addressed though. Article writing and similar content is best kept unique and different from any content you have on your own site. When it comes to this kind of content, it is often best to develop content for the sites where it is going to be placed anyway, rather than a mass distribution. Of course, you'll also want to include a byline with a link back to your site.
Blog syndication can be handled a little differently. You may decide to include only a summary of your post, or the full post. The pros and cons here must be weighed, since a partial feed may discourage some sites from syndicating your blog at all. In many cases, there may be enough differentiation between your blog and the sites where your post is syndicated anyway. However, the best solution is to also include an absolute link back to the blog post on your own site. This helps signal to the search engines that your post is the source.
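To make the absolute link idea concrete, here is a minimal sketch in Python of adding an attribution line to each feed item before it goes out. The site address, the function name, and the "Originally published at" wording are all assumptions for illustration; your blog platform may well have its own setting or plugin for this.

```python
from html import escape
from urllib.parse import urljoin

# Hypothetical site address, used only for illustration.
SITE_ROOT = "https://www.example.com/"


def add_source_link(item_html, post_path, title):
    """Append an attribution line with an absolute link to the original post.

    item_html: the summary or full post as it will appear in the feed.
    post_path: the post's path or URL on your own site.
    title:     the post title, used as the link text.
    """
    # urljoin turns a relative path into a full URL, so the link still
    # points at your site after another site republishes the item.
    absolute_url = urljoin(SITE_ROOT, post_path)
    attribution = (
        "<p>Originally published at "
        f'<a href="{escape(absolute_url, quote=True)}">{escape(title)}</a>.</p>'
    )
    return item_html + attribution
```

Calling `add_source_link("<p>Summary text</p>", "/blog/my-post", "My Post")` returns the summary followed by a link to `https://www.example.com/blog/my-post`. A relative link such as `/blog/my-post` would point at the syndicating site instead of yours, which is why the link needs to be absolute.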
Press releases can be handled the same way as these other content pieces. Whether you are distributing through wire services or using RSS to syndicate from your site, including links back to your site helps signal the source. Press releases also tend to be more temporary on external sites, though you should certainly keep an archive on your own site.
The final source of external duplicate content also falls under your control. Micro-site strategy consists of creating additional websites, often around niche topical areas. This strategy evolved out of the idea that if one website was good, then many websites must be better, and would increase both the chances of ranking in search engines and the number of listings for a particular search. Some view micro-sites as a good thing, while others view them as bad, but neither view is particularly accurate. Rather, it is the implementation that makes them good or bad.
Micro-site strategy is a much bigger topic, but bad implementation is directly related to our discussion of duplicate content. Most micro-site implementations result in identical or nearly identical duplication of the main website's pages on the various micro-sites. This isn't surprising since creating unique content for one site, especially for an ecommerce site, is often challenging enough without having to create unique content for multiple sites. But rather than improving or increasing rankings, the micro-sites tend to directly compete with the main site and greater resources are needed to maintain multiple sites. Needless to say, this is why most micro-site implementations are bad.
As with many problems, there are a few tools that can help in the fight against duplicate content. One tool that helps you keep on top of potential content theft is Copyscape, which lets you enter the address of your page and returns a list of potential duplicates.
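If you already have a suspect page in hand, you can also get a rough feel for how much of your text it reuses. The sketch below is in Python and compares two pieces of plain text by the runs of consecutive words they share. It is an illustration only: the function names and the five-word run length are assumptions, and it is not a description of how Copyscape or the search engines actually work.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word runs ("shingles") in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one run: treat the whole thing as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap(original, suspect, size=5):
    """Fraction of the original's shingles that also appear in the suspect."""
    original_runs = shingles(original, size)
    suspect_runs = shingles(suspect, size)
    if not original_runs:
        return 0.0
    return len(original_runs & suspect_runs) / len(original_runs)
```

A result near 1.0 means nearly every five-word run in your text also appears on the other page, while a result near 0.0 means the two share little beyond individual words. You would still need to pull the visible text out of each page yourself before comparing.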