Baidu Search: 10 questions about the Baiduspider

Morning!!!

Reference links:

http://help.baidu.com/question?prod_id=99&class=476&id=2996

https://ziyuan.baidu.com/college/articleinfo?id=1002

 

This is the robots.txt of Baidu's main site:

https://www.baidu.com/robots.txt

An excerpt follows:

User-agent: Googlebot

Disallow: /baidu

Disallow: /s?

Disallow: /shifen/

Disallow: /homepage/

Disallow: /cpro

Disallow: /ulink?

Disallow: /link?

Disallow: /home/news/data/

 

1- What is Baiduspider?

Baiduspider is the Baidu search engine's automated crawler. It visits web pages on the Internet and builds an index database so that users can find the pages of your site in Baidu search.

 

2- What is Baiduspider's user-agent?

Baidu uses a different user-agent for each of its products:

Web Search: Baiduspider

Mobile Search: Baiduspider

Image Search: Baiduspider-image

Video Search: Baiduspider-video

News Search: Baiduspider-news

Baidu Collection: Baiduspider-favo

Baidu Union: Baiduspider-cpro

Business Search: Baiduspider-ads

 

3- How do I distinguish the user-agents of Baidu PC search and mobile search?

Full UA for Baidu PC search: Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)

Full UA for Baidu mobile search: Mozilla/5.0 (Linux;u;Android 4.2.2;zh-cn;) AppleWebKit/534.46 (KHTML,like Gecko) Version/5.1 Mobile Safari/10600.6.3 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)

PC UA: the keyword Baiduspider/2.0 identifies the visitor as the Baidu crawler.

Mobile UA: the keywords Android and Mobile identify a crawl from the mobile side, and Baiduspider/2.0 identifies the visitor as the Baidu crawler.
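The keyword rules above can be sketched as a small Python function. This is only an illustration: the function name classify_baidu_ua and its return labels are made up for this example, and it expects the User-Agent header as sent on the wire (no extra spaces around the slashes).

```python
def classify_baidu_ua(user_agent: str) -> str:
    """Classify a User-Agent header as 'pc', 'mobile', or 'not-baidu'.

    Hypothetical helper: it only applies the keyword rules from the FAQ.
    """
    # Every Baidu web-search crawler UA carries this token.
    if "Baiduspider/2.0" not in user_agent:
        return "not-baidu"
    # The mobile crawler additionally carries both Android and Mobile.
    if "Android" in user_agent and "Mobile" in user_agent:
        return "mobile"
    return "pc"


PC_UA = (
    "Mozilla/5.0 (compatible; Baiduspider/2.0; "
    "+http://www.baidu.com/search/spider.html)"
)
MOBILE_UA = (
    "Mozilla/5.0 (Linux;u;Android 4.2.2;zh-cn;) AppleWebKit/534.46 "
    "(KHTML,like Gecko) Version/5.1 Mobile Safari/10600.6.3 "
    "(compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
)

print(classify_baidu_ua(PC_UA))
print(classify_baidu_ua(MOBILE_UA))
print(classify_baidu_ua("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```

A User-Agent header is trivial to forge, so this check only sorts requests by what they claim to be. Question 6 covers how to verify the source IP.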

 

4- How much access load does Baiduspider put on a web server?

To retrieve the target resources well, Baiduspider needs to maintain a certain crawl volume on your site. We try not to place an unreasonable burden on the site, and we adjust crawling based on server capacity, site quality, update frequency, and other factors. If you find any of Baiduspider's access behavior unreasonable, you can report it through the Baidu Feedback Center.

 

5- Why does Baiduspider keep crawling my site?

Baiduspider keeps crawling pages on your site that are newly created or continuously updated. You can also check your site's access log to confirm that the Baiduspider visits are genuine, in case someone is impersonating Baiduspider to crawl your site frequently. If you find Baiduspider crawling your site abnormally, please report it through the feedback platform, and provide the Baiduspider access log for your site if possible so that we can track down and handle the problem.
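One way to review the access log is to count requests that claim to be Baiduspider, grouped by client IP. The sketch below assumes the log uses the common Combined Log Format; the regular expression, the function name baiduspider_hits_by_ip, and the sample lines (which use documentation-only IP addresses) are illustrative and not taken from any Baidu tool.

```python
import re
from collections import Counter

# Assumed Combined Log Format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)


def baiduspider_hits_by_ip(lines):
    """Count requests whose User-Agent mentions Baiduspider, per client IP."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "baiduspider" in match.group(2).lower():
            hits[match.group(1)] += 1
    return hits


BAIDU_UA = (
    "Mozilla/5.0 (compatible; Baiduspider/2.0; "
    "+http://www.baidu.com/search/spider.html)"
)
sample = [
    f'203.0.113.7 - - [10/Jun/2019:10:00:00 +0800] "GET / HTTP/1.1" 200 512 "-" "{BAIDU_UA}"',
    f'203.0.113.7 - - [10/Jun/2019:10:00:02 +0800] "GET /a HTTP/1.1" 200 128 "-" "{BAIDU_UA}"',
    f'198.51.100.9 - - [10/Jun/2019:10:00:03 +0800] "GET /b HTTP/1.1" 200 64 "-" "{BAIDU_UA}"',
    '192.0.2.1 - - [10/Jun/2019:10:00:04 +0800] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

hits = baiduspider_hits_by_ip(sample)
for ip, count in hits.most_common():
    print(ip, count)
```

An IP with an unusually high count is a candidate for the DNS check described in question 6.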

 

6- How can I tell whether a crawler is impersonating Baiduspider?

We recommend a reverse DNS lookup to determine whether the crawling IP belongs to Baidu. The command depends on the platform:

On Linux, use host ip; on Windows, use nslookup ip; on macOS, use dig -x ip.

Check whether the returned hostname has the form *.baidu.com or *.baidu.jp. Any other hostname means the crawler is an impersonator.
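The same check can be scripted. The following is a minimal sketch using Python's standard socket module; the helper names (is_baidu_hostname, reverse_lookup, looks_like_baiduspider) and the example hostnames are assumptions made for illustration. The lookup function is passed in as a parameter so the suffix logic can be tried without network access.

```python
import socket

# Hostname suffixes named in the FAQ.
BAIDU_SUFFIXES = (".baidu.com", ".baidu.jp")


def is_baidu_hostname(hostname: str) -> bool:
    """True if the hostname matches *.baidu.com or *.baidu.jp."""
    return hostname.lower().rstrip(".").endswith(BAIDU_SUFFIXES)


def reverse_lookup(ip: str) -> str:
    """Return the reverse-DNS hostname for ip, or '' if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return ""


def looks_like_baiduspider(ip: str, lookup=reverse_lookup) -> bool:
    """Apply the FAQ's rule: the reverse hostname must be a Baidu one."""
    return is_baidu_hostname(lookup(ip))


# Offline demonstration with a fake lookup table (hypothetical hostnames).
fake_dns = {
    "203.0.113.7": "baiduspider-203-0-113-7.crawl.baidu.com",
    "198.51.100.9": "host9.example.net",
}
for ip in fake_dns:
    print(ip, looks_like_baiduspider(ip, lookup=lambda addr: fake_dns.get(addr, "")))
```

Calling looks_like_baiduspider(ip) without the lookup argument performs a real DNS query for that address.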

 

7- I don't want Baiduspider to access my site. What should I do?

Baiduspider follows the Internet robots protocol. You can use a robots.txt file to block Baiduspider from your whole site, or from some of the files on it. Note: if you block Baiduspider, the pages on your site can no longer be found in the Baidu search engine or in any search engine whose search service is provided by Baidu. For how to write robots.txt, please refer to our introduction: robots.txt writing methods

You can set different crawl rules for each product's user-agent. If you want to block all Baidu products, set the ban directly on Baiduspider.

The following robots.txt blocks all crawling by Baidu:

User-agent: Baiduspider
Disallow: /
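To sanity-check a rule set like this before deploying it, one option is Python's standard urllib.robotparser. This is a sketch only: the parser is Python's interpretation of robots rules, which is not guaranteed to match Baidu's own matching, and example.com is a placeholder domain.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Baiduspider
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Python's parser matches user-agent names by substring, so the
# Baiduspider record also applies to Baiduspider-image here.
baidu_allowed = parser.can_fetch("Baiduspider", "http://example.com/page.html")
image_allowed = parser.can_fetch("Baiduspider-image", "http://example.com/image/a.png")
other_allowed = parser.can_fetch("Googlebot", "http://example.com/page.html")

print("Baiduspider:", baidu_allowed)
print("Baiduspider-image:", image_allowed)
print("Googlebot:", other_allowed)
```

Crawlers that no record names remain allowed, since the file contains no rule for them.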

The following robots.txt blocks all crawling by Baidu but allows Baidu Image Search to crawl the /image/ directory:

User-agent: Baiduspider
Disallow: /

User-agent: Baiduspider-image
Allow: /image/

Note: Pages crawled by Baiduspider-cpro are not added to the index. It only performs operations agreed with the customer, so it does not follow the robots protocol. If Baiduspider-cpro causes you trouble, please contact [email protected]. Likewise, pages crawled by Baiduspider-ads are not added to the index. It only performs operations agreed with the customer, so it does not follow the robots protocol. If Baiduspider-ads causes you trouble, please contact your customer service specialist.

 

8- Why does my site still show up in Baidu search after I added a robots.txt?

Because updating a search engine's index database takes time. Even after Baiduspider has stopped accessing the pages on your site, the index entries already built in the Baidu search engine database may take several months to clear. Please also check that your robots.txt is configured correctly. If your removal request is urgent, you can also submit it through the feedback platform.


9- I want my site's content to be indexed by Baidu but not saved as a snapshot. What should I do?

Baiduspider follows the Internet meta robots protocol. You can use page meta settings so that Baidu indexes the page but does not show a snapshot of it in the search results.

As with robots.txt updates, updating the search engine's index database takes time. Even after your meta tag forbids Baidu from showing a snapshot in the search results, if the page's index entry already exists in the Baidu search engine database, the change may take two to four weeks to take effect online.
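The article does not quote the meta tag itself. The sketch below assumes the noarchive directive that Baidu's help pages describe for this purpose, either `<meta name="Baiduspider" content="noarchive">` for Baidu alone or `<meta name="robots" content="noarchive">` for all engines. It uses Python's standard html.parser to check whether a page carries such a tag; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class SnapshotMetaFinder(HTMLParser):
    """Record whether a page carries a noarchive meta tag (assumed directive)."""

    def __init__(self):
        super().__init__()
        self.noarchive = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").lower()
        directives = [d.strip().lower() for d in attr.get("content", "").split(",")]
        # "robots" addresses every crawler, "baiduspider" only Baidu's.
        if name in ("robots", "baiduspider") and "noarchive" in directives:
            self.noarchive = True


def blocks_baidu_snapshot(html: str) -> bool:
    finder = SnapshotMetaFinder()
    finder.feed(html)
    return finder.noarchive


page = (
    '<html><head><meta name="Baiduspider" content="noarchive">'
    "</head><body>hello</body></html>"
)
print(blocks_baidu_snapshot(page))
print(blocks_baidu_snapshot("<html><head><title>x</title></head></html>"))
```

Running a check like this over your templates confirms that the tag is actually present in the served HTML before you wait out the index refresh.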

10- Can Baiduspider crawling cause bandwidth congestion?

Normal Baiduspider crawling does not congest your bandwidth. If congestion occurs, someone may be impersonating Baiduspider to crawl your site maliciously. If you find a crawler whose agent is named Baiduspider congesting your bandwidth, please contact us as soon as possible. You can report it through the feedback platform, and providing your site's access log for the period in question will help our analysis.

 

Origin www.cnblogs.com/landesk/p/10984380.html