[Crawler] Crawlers are classified by usage scenario as follows:
1. General crawler
An important part of a crawling system; it crawls the data of an entire page.
2. Focused crawler
Built on top of a general crawler; it grabs only specific parts of the content on a page.
3. Incremental crawler
Detects data updates on a website and crawls only the most recently updated data.
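The three crawler types can be sketched in a few lines of Python. This is only an illustrative sketch, not a production crawler: the page contents `PAGE_V1` and `PAGE_V2`, the `class="item"` markup, and all function names are hypothetical, and no network request is made. The incremental step assumes that hashing each extracted item is an acceptable way to detect new data.

```python
import hashlib
from html.parser import HTMLParser

# Hypothetical page bodies standing in for two fetches of the same URL.
PAGE_V1 = "<html><body><h1>News</h1><p class='item'>First post</p></body></html>"
PAGE_V2 = (
    "<html><body><h1>News</h1>"
    "<p class='item'>First post</p><p class='item'>Second post</p>"
    "</body></html>"
)


def general_crawl(html):
    """General crawler: keep the data of the whole page unchanged."""
    return html


class ItemParser(HTMLParser):
    """Collects only the text inside <p class="item"> elements."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False

    def handle_starttag(self, tag, attrs):
        if tag == "p" and ("class", "item") in attrs:
            self._in_item = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_item = False

    def handle_data(self, data):
        if self._in_item:
            self.items.append(data.strip())


def focused_crawl(html):
    """Focused crawler: built on the general crawl, keeps only specific content."""
    parser = ItemParser()
    parser.feed(general_crawl(html))
    parser.close()
    return parser.items


def incremental_crawl(html, seen):
    """Incremental crawler: return only items that were not seen in earlier runs."""
    fresh = []
    for item in focused_crawl(html):
        digest = hashlib.sha256(item.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            fresh.append(item)
    return fresh
```

Calling `incremental_crawl` twice with the same `seen` set, first on `PAGE_V1` and then on `PAGE_V2`, returns only the newly added item on the second call.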
HTTP protocol
- Concept: a form of data exchange between a server and a client
- Common request header information
User-Agent: the identity of the request carrier
Connection: whether to close the connection or keep it alive after the request is completed
- Common response header information
Content-Type: the type of data the server sends back to the client
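To make the headers concrete, here is a minimal sketch that builds the raw text of a GET request carrying User-Agent and Connection, and reads Content-Type out of a response. The helper names, the host `example.com`, the agent string `my-crawler/0.1`, and `SAMPLE_RESPONSE` are all made up for illustration; a real crawler would normally let an HTTP library handle this.

```python
def build_request(host, path, user_agent, keep_alive=True):
    """Return the raw text of a simple HTTP/1.1 GET request."""
    connection = "keep-alive" if keep_alive else "close"
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"User-Agent: {user_agent}",    # identity of the request carrier
        f"Connection: {connection}",    # keep or close the connection afterwards
        "",
        "",                             # blank line ends the header section
    ]
    return "\r\n".join(lines)


def parse_response_headers(raw_response):
    """Return response headers as a dict with lower-cased header names."""
    head = raw_response.split("\r\n\r\n", 1)[0]
    header_lines = head.split("\r\n")[1:]  # skip the status line
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


# Hypothetical response text, standing in for what a server might send back.
SAMPLE_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "Connection: keep-alive\r\n"
    "\r\n"
    "<html></html>"
)
```

Header names are lower-cased when parsing because HTTP header names are case-insensitive, so the Content-Type value is looked up as `parse_response_headers(SAMPLE_RESPONSE)["content-type"]`.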
HTTPS protocol
- Concept: the secure version of the Hypertext Transfer Protocol
- Encryption methods (general understanding)
Symmetric key encryption
Asymmetric key encryption
However, the public key can still be hijacked, and this method is inefficient
Certificate key encryption (https)
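On the client side, the certificate-based approach shows up as certificate verification. The sketch below uses Python's standard `ssl` module to create a context that checks the server's certificate and hostname. The function name `make_verifying_context` is hypothetical, and the snippet opens no connection; it only shows the default verification settings.

```python
import ssl


def make_verifying_context():
    """Return a TLS context that verifies the server certificate and hostname."""
    # create_default_context() loads the system's trusted CA certificates
    # and turns on certificate and hostname verification.
    return ssl.create_default_context()


ctx = make_verifying_context()
```

With these defaults, `ctx.verify_mode` is `ssl.CERT_REQUIRED` and `ctx.check_hostname` is `True`, so a connection made with this context fails if the server's certificate cannot be validated against a trusted certificate authority.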