Crawling web page information with Python

1. Crawling Jingdong (JD.com) information
2. Crawling page information
Many sites restrict crawling, and the restriction is often not obvious. Such a site inspects the headers of each incoming request and rejects the request if it appears to come from a crawler.
Check the header information that your program sends with the request: if the headers identify the client as a crawler, the request may be declined.
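To see why a request can be recognized as coming from a crawler, it helps to look at the User-Agent that a Python client sends by default. The following is a minimal sketch that assumes the standard-library urllib module (the post itself does not name the HTTP library it uses). Nothing is sent over the network here.

```python
import urllib.request

# Build a default opener; no request is sent at this point.
opener = urllib.request.build_opener()

# addheaders lists the headers urllib attaches to every request by default.
default_headers = dict(opener.addheaders)

# The default value starts with "Python-urllib/", which tells the server
# that the request comes from a script rather than from a browser.
print(default_headers["User-agent"])
```

Other HTTP libraries announce themselves in a similar way, which is what a site can use to filter crawler traffic.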
So we build a key-value pair that replaces the User-Agent header, and send it together with the request for the url:
kv = {'User-Agent': 'Mozilla/5.0'}
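One way the kv dictionary might be attached to a request is sketched below, assuming the standard-library urllib module rather than whichever library the post used. The function names build_request and fetch_html and the example.com address are hypothetical and serve only as illustration.

```python
import urllib.request

# Header override from the text: present the client as a browser.
kv = {'User-Agent': 'Mozilla/5.0'}


def build_request(url, headers=kv):
    """Return a Request object carrying the custom headers (nothing is sent yet)."""
    return urllib.request.Request(url, headers=headers)


def fetch_html(url, timeout=30):
    """Send the request and return the decoded page text.

    Raises urllib.error.URLError (or its subclass HTTPError) on failure.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or 'utf-8'
        return resp.read().decode(charset, errors='replace')


# Placeholder address for illustration only; replace it with the page to crawl.
req = build_request('http://example.com/')
# urllib stores header names in capitalized form, hence 'User-agent'.
print(req.get_header('User-agent'))
```

Calling fetch_html(url) on a page that rejected the default headers would then send the request with the browser-like User-Agent. Whether the site accepts it depends on what else the site checks.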

3. Submitting keyword searches to Baidu / 360
Baidu keyword search interface:
http://www.baidu.com/s?wd=keyword
360 keyword search interface:
http://www.so.com/s?q=keyword
So for any keyword we can construct the corresponding url and submit the search.
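The url construction can be sketched as follows, using urlencode from the standard library to encode the keyword. The SEARCH_ENGINES table and the build_search_url helper are hypothetical names introduced for this example. The two base addresses and the wd and q parameters are the ones listed above.

```python
from urllib.parse import urlencode

# Search endpoints and the query parameter each one expects (from the text above).
SEARCH_ENGINES = {
    'baidu': ('http://www.baidu.com/s', 'wd'),
    '360': ('http://www.so.com/s', 'q'),
}


def build_search_url(engine, keyword):
    """Return the search url for the given engine and keyword."""
    base, param = SEARCH_ENGINES[engine]
    # urlencode percent-encodes the keyword, so spaces and non-ASCII text are safe.
    return base + '?' + urlencode({param: keyword})


print(build_search_url('baidu', 'Python'))
print(build_search_url('360', 'Python'))
```

The resulting url can then be fetched like any other page, with the User-Agent header set as in section 2 if the site rejects the default one.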


Origin blog.csdn.net/ysy_1_2/article/details/104973187