First, crawlers (spiders)
A crawler is a program that requests websites and extracts web content automatically. Once the HTML source has been acquired, the required data has to be extracted from that text.
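The extraction step can be sketched with Python's standard html.parser module. This is only a minimal illustration: the class name LinkExtractor, the helper extract, and the SAMPLE_HTML string are hypothetical, and a real crawler would get the HTML from an HTTP response instead of a literal string.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the page title and the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text between <title> and </title> belongs to the title
        if self._in_title:
            self.title += data


def extract(html):
    """Return (title, links) found in an HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.links


# Hypothetical page source standing in for a downloaded page.
SAMPLE_HTML = """
<html><head><title>Example page</title></head>
<body><a href="/news">News</a> <a href="https://example.com/about">About</a></body></html>
"""

title, links = extract(SAMPLE_HTML)
print(title)
print(links)
```

Third-party parsers are often used for this step in practice; the standard-library parser is chosen here only so the sketch has no dependencies.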
HTTP (Hypertext Transfer Protocol): the most widely used network protocol on the Internet. It is a request/response standard between a client and a server, carried over TCP, and is used to transfer hypertext from a WWW server to the local browser. It is designed to make browsing more efficient and to reduce network traffic.
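To make the request/response idea concrete, the sketch below builds the text of a minimal HTTP/1.1 GET request and splits a raw response into status code, headers, and body. The function names build_get_request and parse_response and the sample response bytes are assumptions made for illustration; no network connection is opened.

```python
def build_get_request(host, path="/"):
    """Return the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, path, version
        f"Host: {host}",          # Host header is required in HTTP/1.1
        "Connection: close",      # ask the server to close after responding
        "",                       # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw):
    """Split a raw HTTP response into (status_code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, *_reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


# Hypothetical response bytes, as a server might send them over TCP.
SAMPLE_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)

request = build_get_request("example.com", "/index.html")
status, headers, body = parse_response(SAMPLE_RESPONSE)
print(request)
print(status, headers, body)
```

In a real crawler a library such as urllib.request handles this framing; writing it out by hand only shows what travels over the TCP connection.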
HTTPS: HTTP over a secure channel; put simply, the secure version of HTTP. It adds an SSL layer beneath HTTP, so the security of HTTPS rests on SSL and the details of the encryption are those of SSL.
SSL (Secure Sockets Layer): a security protocol that provides security and data integrity for network communication. SSL encrypts the network connection at the transport layer.
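The layering described above (plain TCP connection, then an encrypted layer, then HTTP on top) can be sketched with Python's standard socket and ssl modules. The function names make_tls_context and fetch_over_https are hypothetical, and the sketch assumes the server closes the connection after responding. Note that Python's ssl module negotiates TLS, the successor to SSL, so this illustrates the same layering, not the original SSL protocol itself.

```python
import socket
import ssl


def make_tls_context():
    """Create a client context that verifies certificates and host names."""
    return ssl.create_default_context()


def fetch_over_https(host, path="/", port=443, timeout=10):
    """Send one GET request through an encrypted connection and return the raw response."""
    context = make_tls_context()
    request = (
        f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    ).encode("ascii")
    # Step 1: ordinary TCP connection.
    with socket.create_connection((host, port), timeout=timeout) as tcp_sock:
        # Step 2: wrap the TCP socket; the handshake and encryption happen here.
        with context.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:
            # Step 3: the HTTP request itself is unchanged, only the channel differs.
            tls_sock.sendall(request)
            chunks = []
            while True:
                chunk = tls_sock.recv(4096)
                if not chunk:
                    break
                chunks.append(chunk)
    return b"".join(chunks)
```

Calling fetch_over_https("example.com") would need network access; the host name is only a placeholder.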
The public platform's interfaces no longer support calls over HTTP: after December 30, 2017, all sites must call them over HTTPS.
URL (Uniform Resource Locator) basic format:
scheme://host[:port#]/path/.../[?query-string][#anchor]
scheme: the protocol, such as http, https, or ftp
host: the server's IP address or domain name, such as 192.168.0.11
port#: the server port (HTTP defaults to port 80, HTTPS to port 443)
path: the path of the resource being accessed
query-string: the parameters sent to the HTTP server
anchor: the anchor (jumps to a specified position within the page)
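The parts listed above can be pulled out of a URL with Python's standard urllib.parse module. The helper describe_url, the DEFAULT_PORTS table, and the sample URL are hypothetical; the sketch falls back to the default port when the URL does not name one.

```python
from urllib.parse import urlsplit, parse_qs

# Default ports used when the URL does not specify one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe_url(url):
    """Break a URL into the parts named in the basic format."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,
        "query": parse_qs(parts.query),   # each parameter maps to a list of values
        "anchor": parts.fragment,
    }


info = describe_url("http://192.168.0.11:8080/docs/index.html?lang=en&page=2#section3")
for name, value in info.items():
    print(name, "=", value)
```

The anchor is handled by the browser and is normally not sent to the server, so a crawler can usually ignore it when requesting a page.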