Python crawler + Baidu API

Web crawler: also known as a web spider or web robot, this is a program or script that automatically fetches information from the web according to a set of rules.



What it is for: A crawler can collect large volumes of data from the Internet quickly and efficiently, which provides the raw material for big data analysis.
Basic knowledge: To crawl the data you want, you need a general understanding of common web standards and data formats (HTML, JSON, XPath, etc.) and of the HTTP protocol (the request process, request methods, cookie-based state management, etc.).



Baidu Map API

Baidu Map API: Everyone has used Baidu Map, but what is its API? The Baidu Map API gives developers a set of HTTP/HTTPS interfaces: the developer sends a search request over HTTP or HTTPS and receives the results in JSON or XML format.
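To make the "JSON in, data out" idea concrete, here is a minimal sketch of decoding such a reply with the standard json module. The sample text and its field names (status, message, results, name, location, address) are assumptions made for illustration, so compare them with a real response before relying on them.

```python
import json

# Sample reply text. The field names below are assumed for illustration
# and should be checked against what the service actually returns.
SAMPLE_RESPONSE = """
{
  "status": 0,
  "message": "ok",
  "results": [
    {"name": "Example Bank Branch A",
     "location": {"lat": 39.915, "lng": 116.404},
     "address": "Example Road 1"},
    {"name": "Example Bank Branch B",
     "location": {"lat": 39.92, "lng": 116.41},
     "address": "Example Road 2"}
  ]
}
"""


def parse_places(text):
    """Decode a JSON reply into a list of (name, lat, lng, address) tuples."""
    data = json.loads(text)
    # A non-zero status is treated as an error (assumed convention).
    if data.get("status") != 0:
        raise ValueError(f"API returned an error: {data.get('message')}")
    places = []
    for item in data.get("results", []):
        loc = item.get("location", {})
        places.append(
            (item.get("name"), loc.get("lat"), loc.get("lng"), item.get("address"))
        )
    return places


places = parse_places(SAMPLE_RESPONSE)
for place in places:
    print(place)
```

Using .get() with defaults keeps the parser from crashing when a record lacks a field, which is common in crawled data.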



Create an API key: http://lbsyun.baidu.com/apiconsole/key. Open the link, log in with your Baidu account, click "create application", and enter * in the Referer whitelist. Once the application is created, the API key is shown under "view application".



Location retrieval service: Provides POI (point of interest) retrieval for several scenarios. Developers can obtain basic or detailed geographic information about a place through the interface. The service offers three kinds of search: administrative division search, circular area search, and rectangular area search.


Administrative division area search: Developers can use this function to search for location information within an administrative division (currently down to the city level).

http://api.map.baidu.com/place/v2/search?query=Bank&region=Beijing&output=json&ak=your API key
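Typing such a URL by hand is error-prone, especially once the keyword contains spaces or Chinese characters. A small sketch of building it with the standard library follows; the function name region_search_url and the placeholder key YOUR_API_KEY are made up for this example.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.map.baidu.com/place/v2/search"


def region_search_url(query, region, ak, output="json"):
    """Build an administrative division search URL.

    urlencode() escapes each value, so spaces and non-ASCII
    keywords end up correctly encoded in the query string.
    """
    params = {"query": query, "region": region, "output": output, "ak": ak}
    return BASE_URL + "?" + urlencode(params)


# YOUR_API_KEY is a placeholder, not a real key.
url = region_search_url("Bank", "Beijing", "YOUR_API_KEY")
print(url)
```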


Circular area search: Developers set a center point and a radius to search for location information within that circle (typically used for "nearby" searches).


http://api.map.baidu.com/place/v2/search?query=Bank&location=39.915,116.404&radius=2000&output=xml&ak=your API key


Rectangular area search: Developers set the coordinates of the lower-left and upper-right corners of the search area, and the service returns location information inside that rectangle.


http://api.map.baidu.com/place/v2/search?query=Bank&bounds=39.915,116.404,39.975,116.414&output=xml&ak=your API key
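The two corners of the rectangle can be packed into a single comma-separated value. The sketch below assumes the parameter is called bounds and takes lower-left lat,lng followed by upper-right lat,lng; verify both the name and the order against the official documentation. The helper name bounds_param is hypothetical.

```python
def bounds_param(lower_left, upper_right):
    """Format two (lat, lng) corners as 'lat1,lng1,lat2,lng2'.

    Assumed order: lower-left corner first, then upper-right corner.
    """
    lat1, lng1 = lower_left
    lat2, lng2 = upper_right
    # Catch swapped corners early instead of sending a bad request.
    if lat1 >= lat2 or lng1 >= lng2:
        raise ValueError("lower_left must be south-west of upper_right")
    return f"{lat1},{lng1},{lat2},{lng2}"


bounds = bounds_param((39.915, 116.404), (39.975, 116.414))
print(bounds)
```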



Example: To retrieve scenic area information for Yiyang City with the administrative division search, enter the following URL in the browser:

http://api.map.baidu.com/place/v2/search?query=Scenic area&region=Yiyang&output=json&page_size=5&ak=your API key


The returned information is in JSON format (this can be changed to XML with the output parameter).

A single query can return at most 400 records in total. This limit cannot be changed.

Each page returns at most 20 records; results beyond the first 20 are retrieved by requesting further pages with the page_num parameter.
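The two limits above determine how many pages are worth requesting for a query. A rough sketch of that arithmetic is shown below; it assumes page_num starts at 0 for the first page, which should be checked against the API documentation. The function name page_numbers is made up for this example.

```python
import math

MAX_RESULTS = 400   # cap on records per query, as stated above
MAX_PAGE_SIZE = 20  # cap on records per page, as stated above


def page_numbers(total, page_size=MAX_PAGE_SIZE):
    """Return the page_num values needed to read all reachable records.

    Assumes page numbering starts at 0.
    """
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError("page_size must be between 1 and 20")
    # Records beyond the 400 cap cannot be fetched by paging.
    reachable = min(total, MAX_RESULTS)
    return list(range(math.ceil(reachable / page_size)))


print(page_numbers(45))          # 45 records at 20 per page
print(len(page_numbers(1000)))   # total above the cap
```

When the reported total exceeds 400, paging alone cannot reach every record, so the query has to be narrowed (a more specific keyword or a smaller area).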



Python crawler + Baidu API



Using a Python crawler together with the Baidu Map API, we can collect place information (residential communities, scenic spots, schools, commercial plazas, etc.) for the whole city of Yiyang. Two modules are mainly used: requests and json.
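The original code for this step is not reproduced here, so what follows is only a minimal sketch of how such a crawler might be put together with requests and json. The function names, the keyword list, the response field names (status, results, name, location, address), and zero-based page_num are all assumptions made for illustration.

```python
import json

API_URL = "http://api.map.baidu.com/place/v2/search"
# Example keywords; adjust to the categories you want to collect.
KEYWORDS = ["community", "scenic spot", "school", "commercial plaza"]


def fetch_page(query, region, page_num, ak):
    """Request one page of results and return the decoded JSON."""
    # Third-party module, imported here so the helpers below can be
    # used (and tested) without requests or network access.
    import requests

    params = {
        "query": query,
        "region": region,
        "output": "json",
        "page_size": 20,
        "page_num": page_num,
        "ak": ak,
    }
    resp = requests.get(API_URL, params=params, timeout=10)
    resp.raise_for_status()
    return resp.json()


def crawl(region, keywords, ak, fetch=fetch_page, max_pages=20):
    """Collect places for every keyword, page by page."""
    records = []
    for keyword in keywords:
        for page_num in range(max_pages):
            data = fetch(keyword, region, page_num, ak)
            results = data.get("results", [])
            # Stop on an error status or when a page comes back empty.
            if data.get("status") != 0 or not results:
                break
            for item in results:
                loc = item.get("location", {})
                records.append({
                    "keyword": keyword,
                    "name": item.get("name"),
                    "lat": loc.get("lat"),
                    "lng": loc.get("lng"),
                    "address": item.get("address"),
                })
    return records


def save(records, path):
    """Write the collected records to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
```

A run might look like records = crawl("Yiyang", KEYWORDS, ak="YOUR_API_KEY") followed by save(records, "yiyang_places.json"), where YOUR_API_KEY stands for the key created earlier. Passing fetch as a parameter makes it easy to swap in a stub when trying the logic offline.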



Quite practical, isn't it? In fact, the above only covers the location search function. The API offers many more services, such as real-time traffic queries and IP positioning.




Origin blog.51cto.com/15069490/2578647