Day 1: Notes


A. Basic principles of web crawlers
1. What is a crawler?
A crawler is a program that crawls (fetches) data.

2. What is the Internet?
A collection of network devices that links individual computers together is called the Internet.

3. Why was the Internet established?
To transfer and share data.

4. What is data?
For example: product information on e-commerce platforms (Taobao, JD.com, Amazon), listings on rental platforms (Lianjia, Ziroom),
securities investment information (East Money, Xueqiu), and ticket information on 12306.

5. What happens when you go online?
Ordinary user:
open a browser ----> enter the URL
----> the browser sends a request to the target host
----> the target host returns the response data
----> the browser renders the data

Crawler:
simulate a browser ----> send a request to the target host
----> the target host returns the response data
----> parse and extract the valuable data
----> save the data (write to a local file or persist to a database)

6. The whole crawling process
① Send a request (request libraries: requests / Selenium)
② Get the response data
③ Parse the data (parsing library: BeautifulSoup4)
④ Save the data (storage: local file / MongoDB)
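
The four steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the method these notes go on to teach: so that it runs without installing anything, it assumes the standard library's urllib.request in place of requests, html.parser in place of BeautifulSoup4, and a JSON file in place of MongoDB. The function names (fetch, parse_links, save, crawl) and the User-Agent value are hypothetical choices made for the example.

```python
import json
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch(url):
    """Steps 1 and 2: send a request and return the response body as text."""
    # Browser-like User-Agent; the value is only an illustrative placeholder.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


class LinkParser(HTMLParser):
    """Step 3 helper: collect the href of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def parse_links(html):
    """Step 3: parse the HTML and extract the data we care about (links)."""
    parser = LinkParser()
    parser.feed(html)
    return parser.links


def save(links, path):
    """Step 4: persist the extracted data to a local JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(links, f, ensure_ascii=False, indent=2)


def crawl(url, path):
    """Run all four steps in order and return the extracted links."""
    html = fetch(url)
    links = parse_links(html)
    save(links, path)
    return links
```

Calling crawl("https://www.baidu.com", "links.json") would fetch the page, extract its links, and write them to links.json; the parse and save steps can also be tried on any HTML string without network access.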

Summary: the data on the Internet can be likened to buried treasure, and a crawler is the tool that digs it up.

B. The requests library
1. Installation and use
pip3 install requests
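
Once the package is installed, basic use might look like the sketch below. The helper name fetch_page and the 10-second timeout are assumptions made for illustration; requests.get, raise_for_status, and the text attribute are part of the requests library.

```python
import requests


def fetch_page(url, timeout=10):
    """Send a GET request and return the decoded response body."""
    response = requests.get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of returning an error page.
    response.raise_for_status()
    return response.text
```

For example, fetch_page("https://www.baidu.com") would return the HTML of the Baidu home page as a string, provided the host is reachable.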

2. Simulating a browser (analyzing the request process)
Baidu:
① Request URL
② Request method (GET, POST)
③ Response status code
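
The three items above can be inspected from Python as well as from the browser's developer tools. The sketch below builds a request without sending it, so the URL and method can be examined offline, and sends it in a separate step to read the status code. The search path /s with the wd parameter is an assumption about Baidu's URL scheme, and the function names and User-Agent value are hypothetical.

```python
import requests


def build_search_request(keyword):
    """Build (but do not send) a GET request for a Baidu search."""
    request = requests.Request(
        method="GET",
        url="https://www.baidu.com/s",          # assumed search endpoint
        params={"wd": keyword},                 # assumed query parameter name
        headers={"User-Agent": "Mozilla/5.0"},  # placeholder browser-like header
    )
    return request.prepare()


def send_and_get_status(prepared):
    """Send a prepared request and return the response status code."""
    with requests.Session() as session:
        response = session.send(prepared, timeout=10)
    return response.status_code


prepared = build_search_request("python")
print(prepared.url)     # ① the request URL, including the query string
print(prepared.method)  # ② the request method
# ③ send_and_get_status(prepared) returns the status code; 200 (requests.codes.ok) means success.
```

Building the request separately from sending it makes each part visible on its own; in everyday code, requests.get(url, params=..., headers=...) does both in one call.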

Source: www.cnblogs.com/Auraro997/p/11119971.html