Google published a post on the Webmaster Central blog today offering some tips on how to recognize and avoid duplicate content on your website. I wanted to offer some specific tips on how to recognize and remove duplicate content on your self-hosted WordPress blog and decided to create a video tutorial.
This video doesn’t cover all of the bases; however, it will go a long way towards preventing and removing duplicate content on your blog. Keep in mind that duplicate content worries a lot of people, and you’ll often hear that having it on your blog is bad. I can’t say that it’s a good thing, but I don’t think it should be considered bad either. The reason is simple: Google understands that many bloggers aren’t very computer savvy. They are people who simply enjoy writing about their day or sharing their opinions and views with the world. In other words, most bloggers aren’t programmers, nor are they SEO experts.
That said, if duplicate content were really a BAD thing, then it’s very likely that Google would not display duplicate content in the results at all. Knowing this, I think Google does a really good job of automatically determining source content. Here is what I mean by that. With a WordPress blog, a copy of your post content is found in multiple sections. Typically, a copy of a post can be found on the homepage, the category pages, the tag pages, and the yearly/monthly/weekly/daily archives; however, the source URL for a post is the individual post page.
Overall, Google understands the linking structure of WordPress and other blog platforms and for the most part gets things right. However, there are many situations when that doesn’t happen. With that in mind, there are ways that you can take control and prevent Google from indexing multiple copies of your blog posts. Take a look at this video to learn how you can prevent and remove duplicate content from getting indexed on your WordPress blog.
The first step is to recognize whether your WordPress blog has duplicate content indexed in Google. You can do this with a site: query search, as shown in the example image below:
The above screenshot illustrates this search: [site:garryconn.com "On SeoHosting.com I wrote an article offering"], which is also shown in the screenshot below:
You’ll see that two listings appear in the above example. Google has a reference to my home page and to the blog post page. Technically, this is duplicate content. However, this is what I call controllable duplicate content. In other words, even though Google has a record of the same content appearing on the home page as well as the post page, it’s only a matter of time before the content gets bumped off the homepage as the post ages.
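If you want to run the same check for several posts, the site: search described above can be turned into a URL with a few lines of Python. This is only a minimal sketch: the function name `site_query_url` is my own invention for illustration, and it assumes the standard `https://www.google.com/search?q=` address for a Google search.

```python
from urllib.parse import quote_plus


def site_query_url(domain: str, phrase: str) -> str:
    """Build a Google search URL for the query: site:<domain> "<phrase>".

    The quotes around the phrase ask Google for an exact match, which is
    what makes duplicate copies of the same sentence show up together.
    """
    query = f'site:{domain} "{phrase}"'
    # quote_plus percent-encodes the colon and quotes and turns spaces into '+'.
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    print(site_query_url(
        "garryconn.com",
        "On SeoHosting.com I wrote an article offering",
    ))
```

Pick a sentence from the middle of a post rather than the title, since the title tends to appear in links all over the blog.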
The problem that you may run into is when multiple sections of your blog, such as the category, archive, and tag sections, are all indexed with the same article content. People commonly think that this is bad and that Google will crack a whip and ban your blog because of it, but really the only concern is that you’re letting Google decide which version of your content is the source copy.
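To see which sections of your blog a set of indexed URLs comes from, you can sort them by type. The sketch below is an illustration only: it assumes the default WordPress permalink bases (`/category/` and `/tag/`) and numeric date archives such as `/2008/09/`, and the `example.com` URLs are made up. If your blog uses custom permalink bases, the checks would need adjusting, and paginated pages like `/page/2/` are not handled.

```python
from urllib.parse import urlparse


def classify_wordpress_url(url: str) -> str:
    """Guess which section of a WordPress blog a URL belongs to.

    Assumes default permalink bases; anything unrecognized is treated
    as an individual post (the source copy).
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    if not segments:
        return "home"
    if segments[0] == "category":
        return "category"
    if segments[0] == "tag":
        return "tag"
    # /2008/, /2008/09/ or /2008/09/12/ with nothing after the date parts
    if len(segments) <= 3 and all(s.isdigit() for s in segments):
        return "date archive"
    return "post"


if __name__ == "__main__":
    # Hypothetical URLs, as they might be copied from a site: search.
    indexed = [
        "http://example.com/",
        "http://example.com/category/seo/",
        "http://example.com/tag/wordpress/",
        "http://example.com/2008/09/",
        "http://example.com/2008/09/12/duplicate-content-tips/",
    ]
    for url in indexed:
        print(classify_wordpress_url(url), url)
```

If the same article turns up under anything other than "post" (and, for a while, "home"), those are the copies you would want to keep out of the index.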
The easiest solution is to install two WordPress plugins. The first one is called XML-Sitemaps. The second plugin is called the All-In-One SEO Pack. In the video above, I quickly show you how to set these plugins up; however, you may have some questions regarding the configuration, so by all means ask questions if you have them. Drop a comment and I’ll do my best to help you out.
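Once the plugins are configured, you can spot-check that a page is actually being kept out of the index. The sketch below rests on an assumption of mine, since the exact settings are covered in the video rather than in this text: that your setup ends up adding a robots meta tag containing `noindex` to the category, tag, and archive pages. The class and function names are hypothetical, and the sample HTML is made up.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # "noindex,follow" -> ["noindex", "follow"]
            self.directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]


def is_noindex(html: str) -> bool:
    """Return True if the page's HTML asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    # Made-up page source for a tag archive and for a single post.
    tag_page = '<html><head><meta name="robots" content="noindex,follow" /></head></html>'
    post_page = "<html><head><title>My post</title></head></html>"
    print(is_noindex(tag_page))
    print(is_noindex(post_page))
```

To use it on your own blog, view the source of a category or tag page in your browser, save it, and pass the text to `is_noindex`. The individual post page should come back `False`, since that is the copy you want Google to keep.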