Python crawler in practice (4): parsing Sina news data with BeautifulSoup

This post only demonstrates how to use the BeautifulSoup library to parse pages in a real project. Sina news loads its data via AJAX, so here we only demonstrate parsing part of the data (the specific packet-capture mechanism is not analyzed).
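As a rough illustration of the parsing step, here is a minimal sketch using BeautifulSoup on an inline HTML string. The markup, the class names (`news-list`, `news-item`, `time`), the `example.com` URLs, and the helper `parse_news` are all assumptions made up for this example. They are not Sina's real page structure and not the code in the repository, so the selectors would need adjusting to the actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a news list page; the real Sina
# page structure and class names are not shown in this post.
SAMPLE_HTML = """
<div class="news-list">
  <div class="news-item">
    <h2><a href="https://example.com/a.html">First headline</a></h2>
    <span class="time">2019-06-01 10:00</span>
  </div>
  <div class="news-item">
    <h2><a href="https://example.com/b.html">Second headline</a></h2>
    <span class="time">2019-06-01 11:30</span>
  </div>
</div>
"""


def parse_news(html):
    """Extract title, URL and publish time from each news item (hypothetical helper)."""
    # "html.parser" is the parser from the standard library, so no extra
    # dependency such as lxml is needed.
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for node in soup.select("div.news-item"):
        link = node.select_one("h2 a")
        time_tag = node.select_one("span.time")
        if link is None:
            # Skip blocks that do not contain a headline link.
            continue
        items.append({
            "title": link.get_text(strip=True),
            "url": link.get("href"),
            "time": time_tag.get_text(strip=True) if time_tag else None,
        })
    return items


if __name__ == "__main__":
    for item in parse_news(SAMPLE_HTML):
        print(item)
```

In a real crawl the HTML string would come from an HTTP response rather than a constant, and for AJAX-loaded data the response may be JSON that has to be decoded before any HTML fragment inside it is passed to BeautifulSoup.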

Code address: https://gitee.com/dwyui/BeautifulSoup_xinlang.git

The crawler posts on this blog are drawing on more and more techniques, and I will keep writing them. Upcoming posts will probably cover multi-threaded crawling (to improve efficiency) and how to crawl data more effectively (working out the requests through packet capture).

I also plan to cover using Redis to manage multi-threading and proxy IPs, and later I will write a post about non-relational databases, so stay tuned.


Origin www.cnblogs.com/cxiaocai/p/10963021.html