How to Improve the Efficiency of a Web Crawler

Many developers have found that their crawlers run very slowly. Most sites now use anti-crawler measures and strictly limit how often a single IP address may access them. If you want a faster crawler, try the following methods.
1. Raise the crawler's request frequency. To do this you may have to deal with the verification that some sites impose, which usually takes the form of a CAPTCHA or a required user login.
2. Run the crawler with multiple threads, which requires a computer with enough memory. Also use proxy IPs, choosing ones that stay online reliably; this is a good way to raise efficiency.
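The multithreading idea in item 2 can be sketched with Python's standard library. This is a minimal illustration, not a complete crawler: the names `fetch`, `crawl`, and `fetch_one` are hypothetical, and the default of 8 worker threads is an arbitrary assumption that should be tuned to the machine's memory and to the target site's rate limits.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable
from urllib.request import urlopen


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Download one page using only the standard library."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def crawl(urls: Iterable[str],
          fetch_one: Callable[[str], bytes] = fetch,
          max_workers: int = 8) -> Dict[str, object]:
    """Fetch URLs concurrently.

    Returns a dict mapping each URL to its body, or to the exception
    raised while fetching it. max_workers=8 is an assumed default.
    """
    results: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every URL up front; the pool runs at most max_workers at once.
        futures = {pool.submit(fetch_one, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # One failed page should not stop the rest of the crawl.
                results[url] = exc
    return results
```

Passing the download function as `fetch_one` keeps the threading logic separate from the network code, so a proxy-aware fetcher can be swapped in without changing `crawl`.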
When crawling data, you can choose the methods that best fit your needs. Proxy IPs, however, are arguably essential. Flash Cloud Proxy offers a large pool of online IPs covering a wide range of cities, with simple IP switching and stable connections, which makes it a good helper for crawler work.
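Simple IP switching could look roughly like the following round-robin sketch. It assumes you already have a list of proxy URLs from your provider; the class `ProxyRotator`, the function `fetch_via_proxy`, and the `203.0.113.x` addresses in the usage note are placeholders for illustration, not part of any real proxy service's API.

```python
import itertools
import threading
from urllib.request import ProxyHandler, build_opener


class ProxyRotator:
    """Hand out proxies from a fixed list in round-robin order (thread-safe)."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(list(proxies))
        self._lock = threading.Lock()

    def next_proxy(self) -> str:
        # The lock lets several crawler threads share one rotator safely.
        with self._lock:
            return next(self._cycle)


def fetch_via_proxy(url: str, rotator: ProxyRotator, timeout: float = 10.0) -> bytes:
    """Download url through the next proxy in the rotation."""
    proxy = rotator.next_proxy()
    opener = build_opener(ProxyHandler({"http": proxy, "https": proxy}))
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

For example, `rotator = ProxyRotator(["http://203.0.113.10:8080", "http://203.0.113.11:8080"])` followed by repeated calls to `fetch_via_proxy(url, rotator)` would alternate between the two placeholder addresses. Round-robin is only one possible policy; dropping proxies that fail repeatedly is a natural refinement.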

Origin blog.51cto.com/14338698/2400674