Doraemon: the difference between HTTP and HTTPS in Python web crawlers

1. HTTP and HTTPS

  • HTTP:

    • Concept: a form of data exchange between a client and a server

  • Common headers:

    • User-Agent: identifies the client (browser or program) that is making the request

    • Connection: close (close the connection once the request has been served)

    • Content-Type: the media type of the data carried in the message body

  • HTTPS:

    • Concept: the secure version of the HTTP protocol

    • Certificates and encryption schemes

      • Symmetric key encryption

        • The client encrypts the data locally with a key, then sends the ciphertext to the server together with the key needed to decrypt it. Anyone who intercepts the key can read the data.

      • Asymmetric key encryption

        • The server provides a public key; the client encrypts the data with it and sends the ciphertext to the server, which decrypts it with its private key

      • Certificate key encryption

        • The server sends its public key to a certificate authority, which signs it; the signed public key (the certificate) is then sent to the client
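
The headers and certificate handling above can be sketched with the Python standard library. This is a minimal illustration rather than a complete crawler: the URL, the User-Agent string, and the helper names `build_request` and `build_tls_context` are assumptions made for this example. Nothing is sent over the network until `urlopen` is called.

```python
import ssl
import urllib.request

# Hypothetical target URL and User-Agent string, used only for illustration.
URL = "https://example.com/"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (compatible; demo-crawler/0.1)",
    "Connection": "close",
}


def build_request(url, headers):
    """Attach the request headers to a Request object (no network access)."""
    return urllib.request.Request(url, headers=headers)


def build_tls_context():
    """Return a TLS context that verifies the server certificate.

    create_default_context() loads the system's trusted certificate
    authorities and turns on certificate and hostname checking.
    """
    return ssl.create_default_context()


request = build_request(URL, HEADERS)
context = build_tls_context()

# To actually send the request (requires network access):
# with urllib.request.urlopen(request, context=context, timeout=10) as response:
#     print(response.status, response.headers.get("Content-Type"))
```

With the default context, `urlopen` raises an error when the server's certificate is not signed by a trusted certificate authority, which corresponds to the certificate scheme described above.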

Overview of web crawlers

  • Crawler concept:

    • A program that simulates a browser going online, then crawls the Internet and captures data from it

      • Simulation: the browser itself is the original, natural crawling tool

  • Crawler categories:

    • General crawler: captures the data of an entire page; this is the kind of crawler used in a full crawling system.

    • Focused crawler: captures a specific part of a page; it must be built on top of a general crawler.

    • Incremental crawler: monitors a site for updates so that only the newly updated data is crawled.

  • Risk analysis

    • Use crawlers responsibly

    • Where crawler risk arises:

      • The crawler interferes with the normal operation of the site being accessed;

      • The crawler captures certain types of data or information that are protected by law.

    • Avoiding risk:

      • Strictly comply with the robots protocol set by the site;

      • While working around anti-crawling measures, optimize your code so that it does not interfere with the normal operation of the site being accessed;

      • When using or disseminating crawled information, review the crawled content; if it contains users' personal or private information, or the trade secrets of others, stop promptly and delete it.

  • Anti-crawling mechanisms

  • Counter-anti-crawling strategies

  • robots.txt protocol: a plain-text protocol in which a site states which of its data may be crawled and which may not.
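
A crawler can check the robots.txt rules before fetching a page. The sketch below uses `urllib.robotparser` from the Python standard library. The robots.txt text, the crawler name `demo-crawler`, the example URLs, and the helper `build_parser` are all made up for illustration; a real crawler would download the file from the target site instead of parsing an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# can_fetch() reports whether the named crawler may request the URL.
allowed = parser.can_fetch("demo-crawler", "https://example.com/index.html")
blocked = parser.can_fetch("demo-crawler", "https://example.com/private/data.html")

# crawl_delay() returns the requested pause in seconds between requests,
# or None when the file does not set one.
delay = parser.crawl_delay("demo-crawler")
```

Here `allowed` is `True`, `blocked` is `False`, and `delay` is `2`. Sleeping for `delay` seconds between requests is one simple way to avoid interfering with the normal operation of the site.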

Origin www.cnblogs.com/doraemon548542/p/11964356.html