Many people don’t know what a robots.txt file is. In this article I am going to explain what a robots.txt file is, how to use it, and why it is important. I am sure many of you have heard of the robots.txt file but might not know exactly what it does or how it benefits you. So let’s see if we can answer a few questions for you regarding the ins and outs of a robots.txt file.
The Simple Explanation
A robots.txt file is a very small file that resides on your server and gives specific instructions to webcrawlers, such as the famous Googlebot, about which directories and files they are allowed to crawl and index. It is important to use a robots.txt file because it puts you (the publisher/webmaster) in control of where well-behaved webcrawlers are allowed to visit on your website. A more detailed explanation can be found here on Wikipedia, at www.robotstxt.org, and here on Matt Cutts’s blog.
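To make that concrete, here is a minimal sketch of what the file looks like. The directory name is just a placeholder; swap in whichever path you want to keep crawlers out of:

```
# Applies to every webcrawler
User-agent: *
# Keep crawlers out of this (hypothetical) directory
Disallow: /private/
```

Each `User-agent` line names which crawler the rules below it apply to (`*` means all of them), and each `Disallow` line lists a path prefix that crawler should stay away from.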
How Is It Useful?
Using a robots.txt file is useful (especially for WordPress users) because it gives you the ability to say where webcrawlers are allowed or NOT allowed to visit. By drawing this map for webcrawlers you do two things:
- You help the webcrawler by keeping it from indexing junk it doesn’t need, such as files in your wp-includes directory.
- You help your important content get indexed the way you want it to, and you prevent duplicate content from getting indexed, for example by disallowing your archive, category, or tag sections.
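The two points above could be sketched as a robots.txt like the following. This is only an illustration, not a recommendation for every blog; the exact paths depend on your WordPress setup and permalink structure:

```
User-agent: *
# Point 1: keep crawlers out of internal WordPress files
Disallow: /wp-includes/
# Point 2: avoid duplicate-content indexing of archive-style sections
Disallow: /category/
Disallow: /tag/
```

Everything not matched by a `Disallow` line, such as your homepage and individual posts, remains crawlable by default.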
Every blog is unique, and every author, yourself included, places a unique importance on the various sections within a blog. It is up to you to decide which sections of your blog, and of your server, you want to grant or deny webcrawlers access to. I have some blogs where I disallow all access except for the homepage and the individual post pages. On other blogs, I don’t worry about it and put total faith in webcrawlers to crawl everything and index and rank pages according to their importance.
How Do You Implement It?
If you are a WordPress blogger, your blog already has a robots.txt built in. By default, the file is set to allow access to all directories on your server. Overriding the default is easy: simply create a text file using NotePad, WordPad, TextPad, etc., name it robots.txt, and upload it to the root of your site. From there, visit this page to learn how you can quickly start adding instructions to it.
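Before uploading your file, you can sanity-check that its instructions do what you intend using Python’s standard-library robots.txt parser. A quick sketch, assuming the hypothetical rules below (swap in your own paths and URLs):

```python
from urllib import robotparser

# Hypothetical rules, mirroring the WordPress suggestions above.
rules = """\
User-agent: *
Disallow: /wp-includes/
Disallow: /category/
Disallow: /tag/
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# A well-behaved crawler would skip the blocked directories...
print(parser.can_fetch("*", "/wp-includes/js/jquery.js"))  # False
# ...but would still crawl an individual post.
print(parser.can_fetch("*", "/2010/05/my-post/"))  # True
```

This only tells you how a rule-respecting crawler would interpret the file; robots.txt is a voluntary convention, so badly behaved bots may ignore it entirely.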
What Sections Will You Allow or Disallow?
Now that you know about the robots.txt file, which sections of your WordPress blog are you going to allow or disallow? If you already use a robots.txt file, I invite you to share your input by dropping a comment; I’d like to know what has been most effective for your blog. I look forward to reading your comments. If you have any questions, feel free to ask and I’ll be standing by. I am sure many of my readers will be happy to pitch in and help answer them too.