Peek into the robots.txt files of the Top 25 SEO and IM Blogs

In my last post, 50 simple ways to improve your website, #11 on the checklist was to create a robots.txt file. Several readers e-mailed me to ask what exactly should go inside a robots.txt file. Because the answer is different for every blog and website, I decided the best way to answer was to show you what’s inside the robots.txt files of the Top 25 SEO and Internet Marketing blogs.

(The Top 25 blogs are taken from BlogStorm's Top 100 SEO & Internet Marketing Blogs list.)

SEOBook Logo

# $Id: robots.txt,v 1.7.2.1 2007/03/23 18:57:07 drumm Exp $
#
# robots.txt
#
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these "robots" where not to go on your site,
# you save bandwidth and server resources.
#
# This file will be ignored unless it is at the root of your host:
# Used: http://example.com/robots.txt
# Ignored: http://example.com/site/robots.txt
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/wc/robots.html
#
# For syntax checking, see:
# http://www.sxw.org.uk/computing/robots/check.html

User-agent: *
Crawl-delay: 10
# Directories
Disallow: /database/
Disallow: /includes/
Disallow: /misc/
Disallow: /modules/
Disallow: /sites/
Disallow: /themes/
Disallow: /scripts/
Disallow: /updates/
Disallow: /profiles/
# Files
Disallow: /xmlrpc.php
Disallow: /cron.php
Disallow: /update.php
Disallow: /install.php
Disallow: /INSTALL.txt
Disallow: /INSTALL.mysql.txt
Disallow: /INSTALL.pgsql.txt
Disallow: /CHANGELOG.txt
Disallow: /MAINTAINERS.txt
Disallow: /LICENSE.txt
Disallow: /UPGRADE.txt
# Paths (clean URLs)
Disallow: /admin/
Disallow: /aggregator/
Disallow: /comment/reply/
Disallow: /contact/
Disallow: /logout/
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register/
Disallow: /user/password/
Disallow: /user/login/
# Paths (no clean URLs)
Disallow: /?q=admin/
Disallow: /?q=aggregator/
Disallow: /?q=comment/reply/
Disallow: /?q=contact/
Disallow: /?q=logout/
Disallow: /?q=node/add/
Disallow: /?q=search/
Disallow: /?q=user/password/
Disallow: /?q=user/register/
Disallow: /?q=user/login/

SEOmoz Logo

User-agent: *
Disallow: /blogdetail.php?ID=537
Disallow: /blog?page
Disallow: /tracker
Disallow: /ugc?page
Disallow: /ugc/author/
Disallow: /ugc/category/

Problogger Logo

robots.txt file is empty

John Chow Logo

sitemap: http://www.johnchow.com/sitemap.xml

User-agent: *
Disallow: /cgi-bin/
Disallow: /go/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /author/
Disallow: /page/
Disallow: /category/
Disallow: /wp-images/
Disallow: /images/
Disallow: /backup/
Disallow: /banners/
Disallow: /archives/
Disallow: /trackback/
Disallow: /feed/

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Mediapartners-Google
Allow: /

User-agent: duggmirror
Disallow: /

Search Engine Land Logo

User-Agent: *
Disallow: /ads/
Disallow: /ads/www/
Disallow: /ads/www/delivery/
Disallow: /drafts/
Disallow: /beta/
Disallow: /cgi-bin/

Shoemoney Logo

User-agent: Googlebot

Disallow: /trackback/
Disallow: /wp-admin/
Disallow: /feed/
Disallow: /archives/
Disallow: /index.php
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: */feed/
Disallow: */trackback/
Disallow: /page/
Disallow: /tag/
Disallow: /category/
Disallow: /store/
Disallow: /*?
User-agent: Mediapartners-Google*
Disallow:

User-agent: ia_archiver
Disallow: /

User-agent: duggmirror
Disallow: /

User-Agent: Googlebot
Disallow: /link.php
Disallow: /gallery2
Disallow: /gallery2/
Disallow: /category/
Disallow: /page/
Disallow: /pages/
Disallow: /feed/
Disallow: /feed

Search Engine Watch Logo

no robots.txt file

Matt Cutts Logo

User-agent: *
Disallow:
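
An empty Disallow line like the one above blocks nothing, which is easy to confuse with "Disallow: /", the rule several other files in this list use to shut a crawler out completely. Here is a minimal sketch of the difference using Python's standard-library urllib.robotparser. The helper name is_allowed and the example.com URL are placeholders for illustration, not anything taken from the site.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# An empty Disallow value blocks nothing.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
# A single slash blocks the whole site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

if __name__ == "__main__":
    url = "http://example.com/blog/some-post/"  # placeholder URL (assumption)
    print(is_allowed(ALLOW_ALL, "Googlebot", url))  # True
    print(is_allowed(BLOCK_ALL, "Googlebot", url))  # False
```

One missing slash is the whole difference between "crawl everything" and "crawl nothing", so it is worth checking a file this way before uploading it.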

Search Engine Roundtable Logo

no robots.txt file

Compete Logo

# BEGIN XML-SITEMAP-PLUGIN
Sitemap: http://blog.compete.com/sitemap.xml
# END XML-SITEMAP-PLUGIN
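
A file like this one contains no crawl rules at all. It only tells crawlers where the sitemap is. As a rough sketch, the Sitemap lines can be pulled out with Python's urllib.robotparser. The function name list_sitemaps is made up for this example, and site_maps() requires Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# The three lines of the file shown above.
ROBOTS_TXT = """\
# BEGIN XML-SITEMAP-PLUGIN
Sitemap: http://blog.compete.com/sitemap.xml
# END XML-SITEMAP-PLUGIN
"""


def list_sitemaps(robots_txt: str) -> list:
    """Return the Sitemap URLs declared in robots.txt text (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(list_sitemaps(ROBOTS_TXT))
```

Lines starting with # are comments, so the parser skips the two plugin markers and keeps only the sitemap URL.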

Sitepoint Logo

User-agent: EmailSiphon
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: autoemailspider
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: URLSpiderPro
Disallow: /

User-agent: Crescent
Disallow: /

User-agent: CherryPicker
Disallow: /

User-agent: WebBandit
Disallow: /

User-agent: WebEMailExtrac
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: NICErsPRO
Disallow: /

User-agent: Openfind
Disallow: /

User-agent: grub-client
Disallow: /

User-agent: WebWhacker
Disallow: /

User-agent: gigabaz
Disallow: /

User-agent: PingALink
Disallow: /

User-agent: LexiBot
Disallow: /

User-agent: LinkSleuth
Disallow: /

User-agent: OfflineExplorer
Disallow: /

User-agent: Teleport
Disallow: /

User-agent: Zeus.Webster
Disallow: /

User-agent: Microsoft.URL
Disallow: /

User-agent: Wget
Disallow: /

User-agent: WebCapture
Disallow: /

User-agent: Sweeper
Disallow: /

User-agent: Aide
Disallow: /

User-agent: larbin
Disallow: /

User-agent: Szukacz
Disallow: /

User-agent: httpdown
Disallow: /

User-agent: MSIECrawler
Disallow: /

User-agent: LinkWalker
Disallow: /

User-agent: sitecheck.internetseer.com
Disallow: /

User-agent: Seeker
Disallow: /

User-agent: ASPSeek
Disallow: /

User-agent: DIIbot
Disallow: /

User-agent: IndyLibrary
Disallow: /

User-agent: psbot
Disallow: /

User-agent: almaden
Disallow: /

User-agent: MSProxy
Disallow: /

User-agent: SlySearch
Disallow: /

User-agent: EasyDL
Disallow: /

User-agent: WebZip
Disallow: /

User-agent: b2w
Disallow: /

User-agent: HTTrack
Disallow: /

User-agent: InternetSeer
Disallow: /

User-agent: User-Agent
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: Python
Disallow: /

User-agent: Offline Explorer
Disallow: /

User-agent: LWP
Disallow: /

User-agent: Simple
Disallow: /

User-agent: sohu
Disallow: /

User-agent: Fetch
Disallow: /

User-agent: ichiro
Disallow: /

User-agent: production
Disallow: /

User-agent: libwww
Disallow: /

User-agent: Zehuti
Disallow: /

User-agent: robot
Disallow: /

User-agent: httrack
Disallow: /

User-agent: Simpy
Disallow: /

User-agent: kinjabot
Disallow: /

User-agent: livedoorCheckers
Disallow: /

User-agent: Lite Bot
Disallow: /

User-agent: MFC
Disallow: /

User-agent: UltraWombat
Disallow: /

User-agent: Hatena
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: grub
Disallow: /

User-agent: php
Disallow: /

User-agent: naver
Disallow: /

User-agent: loopimprovements
Disallow: /

User-agent: zao
Disallow: /

User-agent: links
Disallow: /

User-agent: Downloader
Disallow: /

User-agent: Cache Content
Disallow: /

User-agent: IAArchiver
Disallow: /

User-agent: UrlDispatcher
Disallow: /

User-agent: Exabot
Disallow: /

User-agent: Java
Disallow: /

User-agent: deepindex.com
Disallow: /

User-agent: WebZIP
Disallow: /

User-agent: UltraUptime
Disallow: /

User-agent: avantgo
Disallow: /

User-agent: TMCrawler
Disallow: /

User-agent: QihooBot
Disallow: /

User-agent: Indy Library
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /forums/report.php
Disallow: /forums/search.php
Disallow: /forums/newreply.php
Disallow: /forums/editpost.php
Disallow: /forums/memberlist.php
Disallow: /forums/profile.php
Disallow: /launch/
Disallow: /search/
Disallow: /voucher/424/
Disallow: /email/
Disallow: /feedback/
Disallow: /contact?reason=articlesuggest
Disallow: /linktothis/
Disallow: /popup/
Disallow: /forums/archive/

Performancing Logo

# $Id: robots.txt,v 1.7.2.1 2007/03/23 18:57:07 drumm Exp $
#
# robots.txt
#
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these "robots" where not to go on your site,
# you save bandwidth and server resources.
#
# This file will be ignored unless it is at the root of your host:
# Used: http://example.com/robots.txt
# Ignored: http://example.com/site/robots.txt
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/wc/robots.html
#
# For syntax checking, see:
# http://www.sxw.org.uk/computing/robots/check.html

User-agent: *
Crawl-delay: 10
# Directories
Disallow: /database/
Disallow: /includes/
Disallow: /misc/
Disallow: /modules/
Disallow: /sites/
Disallow: /themes/
Disallow: /scripts/
Disallow: /updates/
Disallow: /profiles/
# Files
Disallow: /xmlrpc.php
Disallow: /cron.php
Disallow: /update.php
Disallow: /install.php
Disallow: /INSTALL.txt
Disallow: /INSTALL.mysql.txt
Disallow: /INSTALL.pgsql.txt
Disallow: /CHANGELOG.txt
Disallow: /MAINTAINERS.txt
Disallow: /LICENSE.txt
Disallow: /UPGRADE.txt
# Paths (clean URLs)
Disallow: /admin/
Disallow: /aggregator/
Disallow: /comment/reply/
Disallow: /contact/
Disallow: /logout/
Disallow: /node/add/
Disallow: /search/
Disallow: /user/

# Paths (no clean URLs)
Disallow: /?q=admin/
Disallow: /?q=aggregator/
Disallow: /?q=comment/reply/
Disallow: /?q=contact/
Disallow: /?q=logout/
Disallow: /?q=node/add/
Disallow: /?q=search/
Disallow: /?q=user/password/
Disallow: /?q=user/register/
Disallow: /?q=user/login/

Dosh Dosh Logo

User-agent: *
Disallow: /wp-content/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /cgi-bin/
Sitemap: http://www.doshdosh.com/sitemap.xml

Copyblogger Logo

User-agent: *
Disallow: /*/feed/
Disallow: /*/trackback/

Search Engine Journal Logo

# BEGIN XML-SITEMAP-PLUGIN
Sitemap: http://www.searchenginejournal.com/sitemap.xml.gz
# END XML-SITEMAP-PLUGIN

Marketing Pilgrim Logo

User-agent: *
Disallow:

Daily Blog Tips Logo

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /feed/
Disallow: /trackback/
Disallow: /cgi-bin/
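
Disallow rules like these are matched against the start of the URL path. The sketch below feeds the file above to Python's urllib.robotparser, which does plain prefix matching, and lists which paths end up blocked. The function name blocked_paths and the sample paths are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# The file shown above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /feed/
Disallow: /trackback/
Disallow: /cgi-bin/
"""


def blocked_paths(robots_txt: str, agent: str, paths: list) -> list:
    """Return the subset of `paths` that `agent` is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(agent, path)]


if __name__ == "__main__":
    sample = [  # hypothetical paths, not taken from the real site
        "/wp-admin/options.php",
        "/feed/",
        "/some-post/",
        "/some-post/feed/",
    ]
    print(blocked_paths(ROBOTS_TXT, "Googlebot", sample))
    # ['/wp-admin/options.php', '/feed/']
```

Notice that /some-post/feed/ is not blocked, because "Disallow: /feed/" only matches paths that begin with /feed/. For crawlers that support wildcards (Google documents support for *), a pattern such as the /*/feed/ rule in the Copyblogger file above covers the per-post feeds as well.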

Lorelle on WordPress Logo

Sitemap: http://lorelle.wordpress.com/sitemap.xml

User-agent: IRLbot
Crawl-delay: 3600

User-agent: *
Disallow: /next/

User-agent: *
Disallow:

SEO Blackhat Logo

User-agent: *
Disallow: /private/
Disallow: /forum/calendar.php

User-agent: http://www.almaden.ibm.com/cs/crawler
Disallow: /

User-agent: NPBot
Disallow: /

User-agent: TurnitinBot
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: CopyRightCheck
Disallow: /

User-agent: Black Hole
Disallow: /

User-agent: Titan
Disallow: /

User-agent: NetMechanic
Disallow: /

User-agent: CherryPicker
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: WebBandit
Disallow: /

User-agent: Crescent
Disallow: /

User-agent: NICErsPRO
Disallow: /

User-agent: SiteSnagger
Disallow: /

User-agent: ProWebWalker
Disallow: /

User-agent: CheeseBot
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: ia_archiver/1.6
Disallow: /

User-agent: Teleport
Disallow: /

User-agent: TeleportPro
Disallow: /

User-agent: Wget
Disallow: /

User-agent: MIIxpc
Disallow: /

User-agent: Telesoft
Disallow: /

User-agent: Website Quester
Disallow: /

User-agent: WebZip
Disallow: /

User-agent: moget/2.1
Disallow: /

User-agent: WebZip/4.0
Disallow: /

User-agent: Mister PiX
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: WebSauger
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: NetAnts
Disallow: /

User-agent: WebAuto
Disallow: /

User-agent: TheNomad
Disallow: /

User-agent: WWW-Collector-E
Disallow: /

User-agent: RMA
Disallow: /

User-agent: libWeb/clsHTTP
Disallow: /

User-agent: asterias
Disallow: /

User-agent: httplib
Disallow: /

User-agent: turingos
Disallow: /

User-agent: spanner
Disallow: /

User-agent: InfoNaviRobot
Disallow: /

User-agent: Harvest/1.5
Disallow: /

User-agent: Bullseye/1.0
Disallow: /

User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
Disallow: /

User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
Disallow: /

User-agent: CherryPickerSE/1.0
Disallow: /

User-agent: CherryPickerElite/1.0
Disallow: /

User-agent: WebBandit/3.50
Disallow: /

User-agent: DittoSpyder
Disallow: /

User-agent: SpankBot
Disallow: /

User-agent: BotALot
Disallow: /

User-agent: lwp-trivial/1.34
Disallow: /

User-agent: lwp-trivial
Disallow: /

User-agent: Wget/1.6
Disallow: /

User-agent: BunnySlippers
Disallow: /

User-agent: URLy Warning
Disallow: /

User-agent: Wget/1.5.3
Disallow: /

User-agent: LinkWalker
Disallow: /

User-agent: cosmos
Disallow: /

User-agent: moget
Disallow: /

User-agent: hloader
Disallow: /

User-agent: humanlinks
Disallow: /

User-agent: LinkextractorPro
Disallow: /

User-agent: Offline Explorer
Disallow: /

User-agent: Mata Hari
Disallow: /

User-agent: LexiBot
Disallow: /

User-agent: Web Image Collector
Disallow: /

User-agent: The Intraformant
Disallow: /

User-agent: True_Robot/1.0
Disallow: /

User-agent: True_Robot
Disallow: /

User-agent: BlowFish/1.0
Disallow: /

User-agent: JennyBot
Disallow: /

User-agent: MIIxpc/4.2
Disallow: /

User-agent: BuiltBotTough
Disallow: /

User-agent: ProPowerBot/2.14
Disallow: /

User-agent: BackDoorBot/1.0
Disallow: /

User-agent: toCrawl/UrlDispatcher
Disallow: /

User-agent: WebEnhancer
Disallow: /

User-agent: TightTwatBot
Disallow: /

User-agent: suzuran
Disallow: /

User-agent: VCI WebViewer VCI WebViewer Win32
Disallow: /

User-agent: VCI
Disallow: /

User-agent: Szukacz/1.4
Disallow: /

User-agent: QueryN Metasearch
Disallow: /

User-agent: Openfind data gathere
Disallow: /

User-agent: Openfind
Disallow: /

User-agent: Xenu’s Link Sleuth 1.1c
Disallow: /

User-agent: Xenu’s
Disallow: /

User-agent: Zeus
Disallow: /

User-agent: RepoMonkey Bait & Tackle/v1.01
Disallow: /

User-agent: RepoMonkey
Disallow: /

User-agent: Zeus 32297 Webster Pro V2.9 Win32
Disallow: /

User-agent: Webster Pro
Disallow: /

User-agent: EroCrawler
Disallow: /

User-agent: LinkScan/8.1a Unix
Disallow: /

User-agent: Kenjin Spider
Disallow: /

User-agent: Keyword Density/0.9
Disallow: /

User-agent: Cegbfeieh
Disallow: /

Google Operating System Logo

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Noindex: /feedReaderJson

Sitemap: http://googlesystem.blogspot.com/feeds/posts/default?orderby=updated

V7N Logo

User-agent: *
Disallow:

Marketing Vox Logo

User-agent: msnbot
Disallow: /categories
Crawl-delay: 120

User-agent: Slurp
Crawl-delay: 60

User-agent: *
Disallow: /ad_inventory
Disallow: /ads
Disallow: /backup
Disallow: /dev
Disallow: /discuss
Disallow: /images
Disallow: /inventory
Disallow: /mt
Disallow: /private
Disallow: /refer
Disallow: /templates
Disallow: /templates_c
Disallow: /test
Disallow: /utilities
Disallow: /new
Disallow: /wp
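
Crawl-delay is a non-standard extension that asks a crawler to wait that many seconds between requests, and not every crawler honors it. The sketch below reads the per-crawler delays from an excerpt of the file above with Python's urllib.robotparser. The function name crawl_delays is an assumption for the example, and crawl_delay() requires Python 3.6 or newer.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the file above: both named groups plus the first two
# rules of the catch-all group.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /categories
Crawl-delay: 120

User-agent: Slurp
Crawl-delay: 60

User-agent: *
Disallow: /ad_inventory
Disallow: /ads
"""


def crawl_delays(robots_txt: str, agents: list) -> dict:
    """Map each user agent to its Crawl-delay in seconds (None if unset)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.crawl_delay(agent) for agent in agents}


if __name__ == "__main__":
    print(crawl_delays(ROBOTS_TXT, ["msnbot", "Slurp", "Googlebot"]))
    # {'msnbot': 120, 'Slurp': 60, 'Googlebot': None}
```

Googlebot falls through to the "User-agent: *" group, which sets no delay, so it gets None.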

Hitwise Logo

robots.txt file is empty

e-Consultancy Logo

# robots.txt for http://www.e-consultancy.com
#
# e-consultancy #
###########################################
# last modified 15 February 2001 01:53:57 #
# author: matthew o'riordan #
###########################################

User-agent: *
Disallow: /js/
Disallow: /includes/
Disallow: /styles/
Disallow: /templates/
Disallow: /xml/

Graywolf Logo

User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-content/plugins/
Disallow: /images/
Disallow: /noindex/
Disallow: /privacy-policy/
Disallow: /about/
Disallow: /company-biographies/
Disallow: /press-media-room/
Disallow: /newsletter/
Disallow: /contact-us/
Disallow: /terms-of-service/
Disallow: /terms-of-service/
Disallow: /information/comment-policy/
Disallow: /faq/
Disallow: /contact-form/
Disallow: /advertising/
Disallow: /information/licensing-information/
Disallow: /2004/
Disallow: /2005/
Disallow: /2006/
Disallow: /2007/
Disallow: /2008/
Disallow: /2009/
Disallow: /2010/
Disallow: /2011/
Disallow: /2012/
Disallow: /2013/
Disallow: /2014/
Disallow: /2015/
Disallow: /2016/
Disallow: /2017/
Disallow: /2018/
Disallow: /2019/
Disallow: /*?*
Disallow: /page/
Disallow: /iframes/

# BEGIN XML-SITEMAP-PLUGIN
Sitemap: http://www.wolf-howl.com/sitemap.xml.gz
# END XML-SITEMAP-PLUGIN
