First, define a class, then define a run() method that lays out the main logic as an ordered list of steps. Next, implement each step as a small method of its own, and have run() call the corresponding method for each step in turn.
1. URL
- If the URL pattern and the total number of pages are known: build a list of URL addresses
- start_url: the first URL to request; from there, iterate by following some other rule
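Both ways of getting URLs can be sketched in a few lines. The function names, the `{}` page placeholder, and the example.com addresses below are assumptions made for illustration only; the "other rule" is modelled as a hypothetical callback that returns the next URL, or None when there is nothing left.

```python
def build_url_list(url_template, total_pages, first_page=1):
    """Known pattern and page count: fill each page number into the template."""
    last_page = first_page + total_pages
    return [url_template.format(page) for page in range(first_page, last_page)]


def iterate_from(start_url, next_url):
    """Unknown page count: start at start_url and keep applying a rule.

    next_url is a callable that takes the current URL and returns the
    following one, or None when there are no more pages.
    """
    url = start_url
    while url is not None:
        yield url
        url = next_url(url)


# Placeholder pattern; a real site will have its own.
url_list = build_url_list("https://example.com/list?page={}", 3)
```

In a real crawler the rule passed as next_url would usually need the response as well (for example, to read a "next page" link), so treat this signature as a simplification.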
2. Send the request and get the response
- requests.get()
- response.content.decode()
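A minimal sketch of this step, assuming the page is UTF-8 encoded (the default for decode()). The function name parse_url, the User-Agent value, and the 10-second timeout are choices made for this example, not part of the original notes.

```python
import requests

# Assumed header for illustration; some sites reject requests without a User-Agent.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def parse_url(url):
    """Send a GET request and return the response body as a str."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    # Raise an exception for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    # content is bytes; decode() turns it into text (UTF-8 unless told otherwise).
    return response.content.decode()
```

If a site uses another encoding, pass it explicitly, for example response.content.decode("gbk").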
3. Extract data
- If the response is a JSON string: use the json module
- If the response is an HTML string: use the lxml module and extract the data with XPath
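The two extraction cases might look like the sketch below. The "items" key and the //a/@href expression are assumptions chosen for the example; the real key and XPath depend entirely on the site being crawled.

```python
import json

from lxml import etree


def extract_from_json(json_str):
    """JSON response: parse the string, then index into the resulting dict."""
    data = json.loads(json_str)
    return data["items"]  # "items" is an assumed key


def extract_from_html(html_str):
    """HTML response: build an element tree, then query it with XPath."""
    html = etree.HTML(html_str)
    # Assumed query: the href attribute of every <a> element.
    return html.xpath("//a/@href")
```

For example, extract_from_html('<a href="/a">A</a>') gives a list containing "/a".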
4. Save
- with open("filename", "a", encoding="utf-8") as f: f.write()
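Putting the four steps together, the class-plus-run() structure described at the top could be sketched as follows. Everything specific here is hypothetical: the class name ExampleSpider, the URL pattern, the "items" key, and the output file name are placeholders, and the sketch assumes the site returns JSON.

```python
import json

import requests


class ExampleSpider:
    """Hypothetical spider: one small method per step, run() ties them together."""

    def __init__(self, total_pages, out_path="items.txt"):
        # Assumed URL pattern with a placeholder for the page number.
        self.url_temp = "https://example.com/api/list?page={}"
        self.total_pages = total_pages
        self.out_path = out_path

    def get_url_list(self):
        # Step 1: build the URL list from the known pattern and page count.
        return [self.url_temp.format(p) for p in range(1, self.total_pages + 1)]

    def parse_url(self, url):
        # Step 2: send the request and decode the response body.
        response = requests.get(url, timeout=10)
        return response.content.decode()

    def get_content_list(self, json_str):
        # Step 3: extract the data; assumes a JSON body with an "items" list.
        return json.loads(json_str)["items"]

    def save_content_list(self, content_list):
        # Step 4: append each item to the file as one JSON document per line.
        with open(self.out_path, "a", encoding="utf-8") as f:
            for content in content_list:
                f.write(json.dumps(content, ensure_ascii=False))
                f.write("\n")

    def run(self):
        # Main logic: the steps in order, each delegated to its own method.
        for url in self.get_url_list():
            json_str = self.parse_url(url)
            content_list = self.get_content_list(json_str)
            self.save_content_list(content_list)
```

Usage would be something like ExampleSpider(total_pages=5).run(). Because the file is opened in "a" mode, running the spider twice appends to the existing output rather than overwriting it.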