Web crawler: A web crawler, also known as a web spider or web robot, is a program or script that automatically retrieves information from the web according to a set of rules.
Basic knowledge: To use crawler technology to collect the data you want from the Internet, you need a general understanding of common web standards and data formats (HTML, JSON, XPath, etc.) and of the HTTP protocol (the request process, request methods, cookie-based state management, etc.).
Baidu Map API
Baidu Map API: Everyone has used Baidu Map, but what is its API? The Baidu Map API provides developers with HTTP/HTTPS interfaces: developers send search requests over HTTP/HTTPS and receive the search results in JSON or XML format.
Administrative division area search: Developers can use this function to search for location information within an administrative division (currently down to the city level).
http://api.map.baidu.com/place/v2/search?query=Bank&region=Beijing&output=json&ak=application key
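The same request can be assembled in Python so that the query text and region name are percent-encoded correctly. This is a minimal sketch using only the standard library; the helper name build_region_search_url is hypothetical, and YOUR_AK stands in for your own application key.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.map.baidu.com/place/v2/search"


def build_region_search_url(query, region, ak, output="json"):
    """Return the URL for an administrative division search."""
    params = {"query": query, "region": region, "output": output, "ak": ak}
    # urlencode percent-encodes spaces and non-ASCII characters in the values
    return BASE_URL + "?" + urlencode(params)


print(build_region_search_url("Bank", "Beijing", "YOUR_AK"))
```

Building the query string from a dictionary avoids hand-typed separators, which is where mistakes in the "&region=" part of the URL tend to creep in.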
Circular area search: Developers can set a center point and a radius to search for location information within the circular area (typically used for nearby-search scenarios).
http://api.map.baidu.com/place/v2/search?query=Bank&location=39.915,116.404&radius=2000&output=xml&ak=application key
Rectangular area search: Developers can set the coordinates of the lower-left and upper-right corners of the search area to search for location information within that rectangle.
http://api.map.baidu.com/place/v2/search?query=Bank&bounds=39.915,116.404,39.975,116.414&output=json&ak=application key
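A rectangular search area is described by its lower-left and upper-right corners. The sketch below formats two such corners as a single comma-separated value, assuming the service expects them in a bounds parameter as "lat,lng,lat,lng"; the helper name format_bounds and the sample coordinates are illustrative only.

```python
def format_bounds(lower_left, upper_right):
    """Format two (lat, lng) corners as 'lat,lng,lat,lng'."""
    (lat1, lng1), (lat2, lng2) = lower_left, upper_right
    # The first corner must lie south-west of the second one
    if lat1 >= lat2 or lng1 >= lng2:
        raise ValueError("lower-left corner must be south-west of upper-right corner")
    return f"{lat1},{lng1},{lat2},{lng2}"


print(format_bounds((39.915, 116.404), (39.975, 116.414)))
```

Checking the corner order before sending the request catches swapped coordinates early, rather than getting back an empty result set.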
Example: To retrieve scenic area information for Yiyang City using the administrative division search, enter the following URL in the browser:
http://api.map.baidu.com/place/v2/search?query=Scenic area&region=Yiyang&output=json&page_size=5&ak=application key
The returned information is in JSON format (set output=xml to receive XML instead).
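Once the JSON text is returned, Python's json module can turn it into dictionaries and lists. The sketch below parses a made-up response body; the field names (status, message, results, name, location, lat, lng) and the sample values are assumptions for illustration and should be checked against a real response.

```python
import json

# Hypothetical response body; field names and values are assumptions
sample = """
{"status": 0, "message": "ok",
 "results": [
   {"name": "Example Scenic Area",
    "location": {"lat": 28.55, "lng": 112.35},
    "address": "Example Road 1"}
 ]}
"""


def extract_places(body):
    """Return (name, lat, lng) tuples from a JSON response body."""
    data = json.loads(body)
    # Assumed convention: a status of 0 means the request succeeded
    if data.get("status") != 0:
        raise RuntimeError(f"request failed: {data.get('message')}")
    places = []
    for item in data.get("results", []):
        loc = item.get("location", {})
        places.append((item.get("name"), loc.get("lat"), loc.get("lng")))
    return places


print(extract_places(sample))
```

Using .get() rather than direct indexing keeps the parser from failing when an entry lacks a coordinate or a name.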
A single query can return at most 400 records in total; this limit cannot be changed.
Each page returns at most 20 records; results beyond the first 20 can be retrieved with the page_num parameter.
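These two limits together mean a single query spans at most 400 / 20 = 20 pages. The sketch below computes which page_num values are needed for a given number of matches, assuming pages are numbered from 0; the helper name page_numbers is hypothetical.

```python
import math

PAGE_SIZE = 20      # maximum records per page, as stated above
MAX_RESULTS = 400   # maximum records per query, as stated above


def page_numbers(total):
    """Return the page_num values needed to read `total` records."""
    # Records beyond the 400-record cap cannot be reached by paging
    reachable = min(total, MAX_RESULTS)
    return list(range(math.ceil(reachable / PAGE_SIZE)))


print(page_numbers(45))         # [0, 1, 2]
print(len(page_numbers(1000)))  # 20
```

When a keyword matches more than 400 places, paging alone is not enough; the search has to be narrowed, for example by using a more specific keyword or a smaller area.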
Using Python-based crawler technology together with the Baidu Map API, we can obtain all place information (residential communities, scenic spots, schools, commercial plazas, etc.) for the whole city of Yiyang. Two modules are mainly used: requests and json.
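A minimal sketch of such a crawler is shown below, not a definitive implementation. It assumes the response carries status and results fields, that page_num starts at 0, and that a page shorter than page_size marks the end of the data; the function names fetch_page and crawl are hypothetical, and YOUR_AK must be replaced with a real application key.

```python
import json

API_URL = "http://api.map.baidu.com/place/v2/search"


def fetch_page(params):
    """Send one search request and return the decoded JSON body."""
    # Third-party module; imported here so crawl() can be tried with a stub fetcher
    import requests
    response = requests.get(API_URL, params=params, timeout=10)
    response.raise_for_status()
    return json.loads(response.text)


def crawl(query, region, ak, fetch=fetch_page, page_size=20, max_pages=20):
    """Collect all records for one keyword by walking page_num upward."""
    places = []
    for page_num in range(max_pages):
        data = fetch({"query": query, "region": region, "output": "json",
                      "page_size": page_size, "page_num": page_num, "ak": ak})
        # Assumed convention: a status of 0 means the request succeeded
        if data.get("status") != 0:
            raise RuntimeError(f"request failed: {data.get('message')}")
        results = data.get("results", [])
        places.extend(results)
        if len(results) < page_size:  # a short page means no more data
            break
    return places
```

Calling crawl("scenic area", "Yiyang", "YOUR_AK") once per keyword (community, scenic area, school, commercial plaza, and so on) and merging the lists gives the city-wide data set. Passing the fetcher as a parameter lets the paging logic be exercised without network access.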