Crawlers and the Robots protocol

Rules for crawlers

Robots protocol

The Robots protocol is a specification that site developers publish for web crawlers. You are not forced to follow it, but ignoring it may carry legal risk, so try to comply.

Where the Robots protocol lives: the site's root URL + /robots.txt, for example www.baidu.com/robots.txt
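The location rule above can be sketched in Python. This is a minimal sketch using only the standard library; the helper name robots_url and the sample page URL are assumptions made for illustration, not part of the original post.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url.

    Hypothetical helper: keeps the scheme and host, replaces the rest
    of the URL with /robots.txt.
    """
    parts = urlsplit(page_url)
    # robots.txt sits directly under the site root, so the path, query
    # string, and fragment of the page URL are dropped.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.baidu.com/s?wd=python"))
# prints: https://www.baidu.com/robots.txt
```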

Basic syntax of the Robots protocol:

# * stands for all crawlers, / stands for the root directory
User-Agent: * # User-Agent identifies the source of the request (the crawler)
Allow: / # content that crawlers are allowed to fetch
Disallow: / # directories that must not be crawled; if nothing follows the /, the matching crawler may not fetch any content
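To see how these directives behave, the sketch below feeds a small rule set to Python's standard urllib.robotparser module. The rules, the example.com URLs, the agent name my-crawler, and the helper allowed are all hypothetical. Note that this parser applies the first matching rule in file order, while other crawlers may resolve conflicts differently; the rules here are ordered so that the result should not depend on that.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules written in the syntax described above.
RULES = """\
User-agent: *      # applies to every crawler
Allow: /public/    # this directory may be crawled
Disallow: /        # everything else is off limits
"""

parser = RobotFileParser()
# parse() takes the file as a list of lines; text after # is ignored.
parser.parse(RULES.splitlines())


def allowed(url: str, agent: str = "my-crawler") -> bool:
    """Return True if the rules let the given agent fetch the URL."""
    return parser.can_fetch(agent, url)


print(allowed("https://example.com/public/a.html"))  # True
print(allowed("https://example.com/admin"))          # False
```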

Not every site has a Robots protocol

If a site does not provide a Robots protocol, it places no restrictions on any crawler
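A crawler can encode this default explicitly. In the sketch below, the hypothetical function can_crawl takes the text of robots.txt, or None when the site does not publish one, and treats the missing file as "everything allowed". How the text is downloaded is left out, and the example.com URL is a placeholder.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser


def can_crawl(robots_text: Optional[str], url: str, agent: str = "*") -> bool:
    """Decide whether url may be fetched.

    robots_text is the body of the site's robots.txt, or None if the
    site does not provide one (an assumption of this sketch).
    """
    if robots_text is None:
        # No Robots protocol published: no crawler is restricted.
        return True
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, url)


print(can_crawl(None, "https://example.com/any/page"))                            # True
print(can_crawl("User-agent: *\nDisallow: /", "https://example.com/any/page"))  # False
```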

In some cases you may skip consulting the Robots protocol, for example when you write a small program that makes only a few requests and fetches little content, but the content must not be used for commercial purposes

Overall, please comply with the Robots protocol

 

Origin www.cnblogs.com/baohanblog/p/12664184.html