The Website robots.txt File Explained

robots.txt is a very simple text file: all you need to do is state "who may not access which links".
On the first line of the file, write:
User-Agent: YodaoBot
This tells crawlers that the rules below apply to the crawler named YodaoBot. You can also write:
User-Agent: *
This means the rules below apply to all crawlers. Note that a robots.txt file may contain only one "User-Agent: *" record.
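As a rough illustration of how a crawler picks the record that applies to it, the sketch below feeds a two-record file to Python's standard-library parser, urllib.robotparser. This is one parser's interpretation, not YodaoBot's own code; the paths /private and /tmp come from this article's examples, and "OtherBot" is a made-up crawler name.

```python
from urllib.robotparser import RobotFileParser

# Two records: one only for YodaoBot, and a default "*" record for every other crawler.
ROBOTS_TXT = """\
User-Agent: YodaoBot
Disallow: /private

User-Agent: *
Disallow: /tmp
"""


def can_fetch(agent: str, path: str) -> bool:
    """Parse ROBOTS_TXT from a string (no network) and ask whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


print(can_fetch("YodaoBot", "/private/a.html"))  # False: YodaoBot's own record applies
print(can_fetch("YodaoBot", "/tmp/a.html"))      # True: the "*" record is not used for YodaoBot
print(can_fetch("OtherBot", "/tmp/a.html"))      # False: falls back to the "*" record
print(can_fetch("OtherBot", "/private/a.html"))  # True: only YodaoBot is kept out of /private
```

A crawler that has its own record ignores the "*" record entirely, which is why YodaoBot may still fetch /tmp in this example.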

Next, list the link prefixes you do not want crawled. For example:
Disallow: /private
This tells crawlers not to fetch any link that begins with "/private", including /private.html, /private/some.html, and /private/some/haha.html. If you write:
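Because a Disallow value is a plain prefix, matching is done on raw characters, not on directory names. The sketch below checks this with Python's urllib.robotparser (one parser's behavior, assumed to be representative of simple crawlers); blocked_paths is a hypothetical helper, and /private-notes.html and /public/index.html are made-up paths added for contrast.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(rules: str, agent: str, paths: list[str]) -> list[str]:
    """Hypothetical helper: return the entries of `paths` that `agent` may NOT fetch."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]


rules = "User-Agent: *\nDisallow: /private\n"
paths = [
    "/private.html",
    "/private/some.html",
    "/private/some/haha.html",
    "/private-notes.html",  # made-up path: it also starts with "/private"
    "/public/index.html",   # made-up path: does not start with "/private"
]
print(blocked_paths(rules, "YodaoBot", paths))
# ['/private.html', '/private/some.html', '/private/some/haha.html', '/private-notes.html']
```

If only the directory should be blocked, a trailing slash ("Disallow: /private/") narrows the prefix so that /private.html stays crawlable.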
Disallow: /
then the entire site is off-limits to crawlers. You can also list several prefixes to block, one per line, for example:
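"Disallow: /" and an empty "Disallow:" line are easy to confuse: every path starts with "/", so the first blocks everything, while by long-standing robots.txt convention an empty value disallows nothing. The sketch below shows both with Python's urllib.robotparser; make_parser is a hypothetical helper and YodaoBot is used only as a sample agent name.

```python
from urllib.robotparser import RobotFileParser


def make_parser(rules: str) -> RobotFileParser:
    """Hypothetical helper: parse robots.txt text held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


root_rule = make_parser("User-Agent: *\nDisallow: /\n")  # every path starts with "/"
empty_rule = make_parser("User-Agent: *\nDisallow:\n")   # empty value: nothing is disallowed

for path in ["/", "/index.html", "/a/b/c.html"]:
    # Columns: path, allowed under "Disallow: /", allowed under "Disallow:"
    print(path, root_rule.can_fetch("YodaoBot", path), empty_rule.can_fetch("YodaoBot", path))
```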
Disallow: /tmp
Disallow: /disallow
Then no link beginning with "/tmp" or "/disallow" will be crawled.
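With several Disallow lines the rule is simply "blocked if the path starts with any listed prefix". The hypothetical helper is_disallowed below spells that out without any library; it deliberately ignores Allow lines, wildcards and percent-encoding, which real crawlers may also handle.

```python
def is_disallowed(path: str, prefixes: list[str]) -> bool:
    """Hypothetical helper: True if `path` starts with any non-empty Disallow prefix."""
    # An empty Disallow value blocks nothing, so empty prefixes are skipped.
    return any(prefix and path.startswith(prefix) for prefix in prefixes)


prefixes = ["/tmp", "/disallow"]
for path in ["/tmp/cache.html", "/disallow.html", "/index.html"]:
    print(path, is_disallowed(path, prefixes))
```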

The resulting robots.txt file looks like this:
User-Agent: YodaoBot
Disallow: /tmp
Disallow: /private
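If you keep the rules in code, a small generator keeps the format consistent. build_robots_txt below is a hypothetical helper (not part of any library) that renders the file above from a mapping of crawler name to blocked prefixes; the round-trip check uses Python's urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(records: dict[str, list[str]]) -> str:
    """Hypothetical helper: render {user_agent: [disallowed prefixes]} as robots.txt text."""
    blocks = []
    for agent, prefixes in records.items():
        lines = [f"User-Agent: {agent}"]
        lines += [f"Disallow: {prefix}" for prefix in prefixes]
        blocks.append("\n".join(lines))
    # A blank line separates one record from the next.
    return "\n\n".join(blocks) + "\n"


text = build_robots_txt({"YodaoBot": ["/tmp", "/private"]})
print(text, end="")

# Round-trip: parse the generated text and check one blocked and one allowed path.
parser = RobotFileParser()
parser.parse(text.splitlines())
print(parser.can_fetch("YodaoBot", "/tmp/a.html"), parser.can_fetch("YodaoBot", "/index.html"))
```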

Below is the robots.txt file of a blog:

User-agent: *
Allow:/
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /category
Disallow: /author
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
Disallow: /alipay
Disallow: /archives
Disallow: /bbs/read.php
Disallow: /bbs/forumcp.php
Disallow: /bbs/u.php
Disallow: /bbs/search.php
Disallow: /bbs/apps.php
Disallow: /bbs/admin.php
Disallow: /bbs/message.php
Disallow: /bbs/profile.php
Disallow: /bbs/login.php
Disallow: /bbs/new.php
Disallow: /bbs/job.php
Disallow: /bbs/simple
Disallow: /bbs/wap
Disallow: /bbs/admin
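One thing worth checking in a file like this is where "Allow:/" sits. Crawlers resolve conflicting Allow and Disallow lines differently: RFC 9309 and Google use the longest (most specific) matching path, while Python's urllib.robotparser evaluates a record's lines in file order and stops at the first match. The sketch below uses a short excerpt of the blog file above to show the difference under urllib.robotparser; "AnyBot" is a made-up agent name.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(rules: str, path: str, agent: str = "AnyBot") -> bool:
    """Parse `rules` from a string and ask whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


# Excerpt of the blog file above, keeping its original order ("Allow:/" first).
allow_first = """\
User-agent: *
Allow:/
Disallow: /wp-admin
Disallow: /bbs/login.php
"""

# The same rules with the catch-all Allow moved to the end.
allow_last = """\
User-agent: *
Disallow: /wp-admin
Disallow: /bbs/login.php
Allow: /
"""

print(can_fetch(allow_first, "/wp-admin/index.php"))  # True: "Allow:/" is the first match
print(can_fetch(allow_last, "/wp-admin/index.php"))   # False: "/wp-admin" matches first
print(can_fetch(allow_last, "/index.html"))           # True: only "Allow: /" matches
```

For longest-match crawlers such as Googlebot both orderings block /wp-admin, so moving "Allow: /" to the end (or dropping it, since anything not disallowed is allowed anyway) gives the same result under both interpretations. urllib.robotparser also does not understand wildcard patterns such as /*?, so those lines only take effect for crawlers that support them.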

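A badly configured robots.txt file has cost many websites their chance to be indexed by search engines. There is an online tool, hosted abroad, that checks whether a robots.txt file is valid: http://tool.motoricerca.info/robots-checker.phtml. Note, however, that before using this tool you should remove any Chinese comments from robots.txt: the tool cannot read Chinese characters, and if any are present it will not recognize the file as a robots.txt file at all.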
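Before pasting a file into that validator, you can find and strip Chinese comments locally. The two functions below are hypothetical helpers written for this purpose (str.isascii requires Python 3.7 or later); they treat everything after "#" as a comment and leave non-ASCII characters inside directives untouched, so such lines will still be reported.

```python
def non_ascii_lines(text: str) -> list[int]:
    """Hypothetical helper: 1-based numbers of lines containing any non-ASCII character."""
    return [number for number, line in enumerate(text.splitlines(), start=1)
            if not line.isascii()]


def strip_comments(text: str) -> str:
    """Hypothetical helper: remove '#' comments, drop comment-only lines, keep blank separators."""
    kept = []
    for line in text.splitlines():
        directive = line.split("#", 1)[0].rstrip()
        # Keep real directives, and keep genuinely blank lines (they separate records).
        if directive or not line.strip():
            kept.append(directive)
    return "\n".join(kept) + "\n"


# Sample file with Chinese comments ("this site's robots.txt", "admin directory").
sample = "# 本站的robots.txt\nUser-agent: *\nDisallow: /inc/  # 后台目录\n"
print(non_ascii_lines(sample))   # [1, 3]
cleaned = strip_comments(sample)
print(non_ascii_lines(cleaned))  # []
```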

If the file passes validation, the tool usually displays something like the following:
Analyzing file http://www.XXX.org/robots.txt
No errors found in this robots.txt file
Hide empty and comments lines:

The following block of code DISALLOWS the crawling of the following files and directories: /inc/ to all spiders/robots.
Line 1 # robots.txt for www.gz-kongtiao.cn
Line 2 User-agent: *
Line 3 Disallow: /inc/

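Google Webmaster Tools can also validate robots files online, and it does support Chinese comments. To analyze your site's robots.txt file, follow these steps:
(1) Sign in to Google Webmaster Tools with your Google account.
(2) On the Dashboard, click the URL of the site you want.
(3) Click Tools, then click Analyze robots.txt.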
