Rules for web crawlers
The Robots protocol
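The Robots protocol is a specification that website operators publish to tell web crawlers which parts of the site may be crawled. Nothing technically forces a crawler to follow it, but ignoring it may carry legal risk, so you should try to comply.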
Location: a robots.txt file in the site's root directory, i.e. the root URL + /robots.txt, for example www.baidu.com/robots.txt
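A minimal Python sketch of the location rule above, using only the standard library's urllib.parse. The helper name robots_url is hypothetical, and it assumes the page URL includes a scheme such as https:// (a bare www.baidu.com/... string would be parsed as a path, not a host).

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site hosting page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path with /robots.txt,
    # and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.baidu.com/s?wd=python"))  # https://www.baidu.com/robots.txt
```

Because only the scheme and host are kept, every page on a site maps to the same robots.txt, no matter how deep its path is.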
Basic syntax of the Robots protocol:
# * means "all"; / means the root directory
User-agent: *    # the crawler (user agent) the rules apply to; * means every crawler
Allow: /         # paths that crawlers may crawl
Disallow: /      # paths that crawlers may not crawl; "/" with nothing after it means that crawler may crawl nothing at all
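To see how these directives are interpreted, here is a sketch using Python's standard urllib.robotparser module. The robots.txt content, the crawler names BadBot and MyCrawler, and the example.com URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS = """\
User-agent: BadBot   # this group applies only to a crawler named BadBot
Disallow: /          # "/" alone: BadBot may crawl nothing

User-agent: *        # every other crawler
Disallow: /private/  # nothing under /private/ may be crawled
Allow: /             # everything else may be crawled
"""

rp = RobotFileParser()
# parse() takes the file's lines; comments after # are ignored.
rp.parse(SAMPLE_ROBOTS.splitlines())

print(rp.can_fetch("BadBot", "https://example.com/index.html"))           # False
print(rp.can_fetch("MyCrawler", "https://example.com/index.html"))        # True
print(rp.can_fetch("MyCrawler", "https://example.com/private/data.html")) # False
```

One caveat about this particular parser: urllib.robotparser applies the first rule in a group whose path matches, in file order, whereas RFC 9309 uses the most specific (longest) match. Listing the Disallow lines before a broad Allow: / keeps both interpretations in agreement.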
Not every website publishes a Robots protocol.
If a site does not provide a robots.txt, it places no restrictions on any crawler.
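A sketch of the "no robots.txt means no restrictions" rule, again with urllib.robotparser. The function is_allowed is hypothetical, and it assumes the caller has already tried to download robots.txt and passes None when the site does not have one.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser


def is_allowed(robots_text: Optional[str], user_agent: str, url: str) -> bool:
    """Decide whether user_agent may fetch url (hypothetical helper).

    robots_text is the body of the site's robots.txt, or None if the site
    does not publish one.
    """
    if robots_text is None:
        # No robots.txt: the site places no restrictions on any crawler.
        return True
    rp = RobotFileParser()
    rp.parse(robots_text.splitlines())
    return rp.can_fetch(user_agent, url)


print(is_allowed(None, "AnyBot", "https://example.com/anything"))                 # True
print(is_allowed("User-agent: *\nDisallow: /\n", "AnyBot", "https://example.com/"))  # False
```

For comparison, RobotFileParser.read(), which downloads the file itself, follows a similar policy: a 4xx response other than 401 or 403 is treated as "allow everything", while 401 and 403 are treated as "disallow everything".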
In some cases you may disregard the Robots protocol, for example a small program that makes only a few requests and fetches little content; even then, the content must not be used for commercial purposes.
Overall: obtain permission where needed, and comply with the Robots protocol.
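As an illustration of complying in code, here is a minimal sketch of a crawler that checks robots.txt before every request and skips disallowed URLs. It avoids the network on purpose: get_robots and fetch are hypothetical callables you would supply (for example, wrappers around urllib.request.urlopen), and the class name PoliteCrawler and the user-agent string are placeholders.

```python
from typing import Callable, Dict, Optional
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


class PoliteCrawler:
    """Fetch pages only when the site's robots.txt allows it (illustrative sketch)."""

    def __init__(
        self,
        get_robots: Callable[[str], Optional[str]],  # robots.txt URL -> body, or None if absent
        fetch: Callable[[str], str],                 # page URL -> page body
        user_agent: str = "MyStudyCrawler",          # hypothetical crawler name
    ) -> None:
        self.get_robots = get_robots
        self.fetch = fetch
        self.user_agent = user_agent
        self._parsers: Dict[str, RobotFileParser] = {}  # cached: one parser per site

    def _parser_for(self, url: str) -> RobotFileParser:
        parts = urlsplit(url)
        key = urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))
        if key not in self._parsers:
            text = self.get_robots(key)
            rp = RobotFileParser()
            # A missing robots.txt is parsed as an empty file, which allows everything.
            rp.parse(text.splitlines() if text else [])
            self._parsers[key] = rp
        return self._parsers[key]

    def crawl(self, url: str) -> Optional[str]:
        """Return the page body, or None if robots.txt disallows this URL."""
        if not self._parser_for(url).can_fetch(self.user_agent, url):
            return None
        return self.fetch(url)


# Usage with in-memory stand-ins instead of real HTTP requests.
robots = {"https://example.com/robots.txt": "User-agent: *\nDisallow: /private/\n"}
crawler = PoliteCrawler(robots.get, lambda u: "<html>" + u + "</html>")
print(crawler.crawl("https://example.com/index.html"))      # <html>https://example.com/index.html</html>
print(crawler.crawl("https://example.com/private/a.html"))  # None
```

Caching one parser per site means robots.txt is requested only once per host rather than before every page.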