Crawler notes 1: basic Jupyter usage && crawler concepts

1. Jupyter basic usage

Cells have two modes: code and markdown.

(1) Code cells: write Python code directly

(2) Markdown cells: write markdown styling directly

(3) Double-click a rendered cell to re-edit it

(4) Shortcut summary:

Insert cell: a (above) / b (below)
Delete cell: x
Switch cell mode: m (markdown) / y (code)
Execute cell: Shift + Enter
Tab: auto-complete
Shift + Tab: open the help/docstring popup

(5) Cells in an .ipynb file act like a cache: once a cell has run, its results stay in memory, so cells can be executed in any order. This is the notebook's caching mechanism.

 

2. A second way to open Jupyter from Anaconda:

(1) See Figure 1

(2) See Figure 2

(3) See Figure 3: the lower two paths in the figure also open the same content in the browser

Opening it from the top entry means you do not need to configure environment variables.

2. Basic concepts: HTTP review

1. What is a crawler?

The one we use most often is the browser itself.

The concept: a program written to simulate a browser surfing the Internet, automating the process of fetching data from the Internet.
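The "simulate a browser, fetch data" idea can be sketched in a few lines. This is a minimal illustration using only the standard library; the User-Agent string and timeout are example values, not anything prescribed by these notes.

```python
# Minimal crawler sketch: send a browser-like request and read the page back.
from urllib.request import Request, urlopen

def fetch(url: str) -> str:
    # Disguise the program as a browser via the User-Agent request header
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

Calling `fetch("http://example.com")` returns the page's HTML as a string, exactly what a browser receives before rendering.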

2. Classification of crawlers

(1) General-purpose crawler: fetches whole pages of data, e.g. the crawl systems behind search engines such as Baidu, 360, and Sogou

(2) Focused crawler: fetches specified local data from a page according to specified requirements

(3) Incremental crawler: monitors a site for data updates and crawls only the latest updated data

(4) Distributed crawler: covered later, after scrapy is explained
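To make "focused" concrete: instead of keeping the whole page, a focused crawler extracts only the part it needs. A toy sketch with the standard-library HTML parser, pulling just the `<title>` out of a page (the HTML here is an invented example):

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    # Focused crawling in miniature: ignore the page, keep only the <title>
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

p = TitleParser()
p.feed("<html><head><title>Demo</title></head><body>...</body></html>")
print(p.title)  # Demo
```

A general-purpose crawler would store the whole document; the focused one throws everything but the target field away.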

3. Anti-crawling

Anti-crawl mechanism: technical means or strategies a website adopts to block crawler programs from scraping its data.

Anti-anti-crawl strategy: techniques a crawler uses to crack the site's anti-crawl mechanisms and still obtain the data.

4. Protocols

(1) robots protocol (compliance is not enforced): an anti-crawl agreement specifying which data may be crawled and which may not; in principle both sides should abide by it.

  It stops gentlemen but not villains: it is only a gentleman's agreement.

  https://www.taobao.com/robots.txt
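Python ships a parser for exactly this file. A sketch using an inline example robots.txt (the rules shown are invented; for a real site you would point it at the site's `/robots.txt`):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Normally: rp.set_url("https://example.com/robots.txt"); rp.read()
# Here we parse example rules inline instead.
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])
print(rp.can_fetch("*", "https://example.com/public/page"))   # True
print(rp.can_fetch("*", "https://example.com/private/page"))  # False
```

A well-behaved crawler calls `can_fetch` before every request; nothing technically stops a crawler that ignores it, which is why it is only a gentleman's agreement.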

(2) http protocol (Hypertext Transfer Protocol): an agreed form of data exchange between client and server (be sure to keep summarizing this)

https protocol: secure http

In essence it is just like the process of data exchange between people.

- Common header information:
  request headers:
  - User-Agent: the identity of the request carrier (a browser or a crawler; crawlers disguise this)
    For example, if we have Google Chrome installed and visit Baidu, the request carrier is "Google Chrome"
  - Connection: keep-alive or close
    close: after the request succeeds, the corresponding connection is disconnected immediately
    keep-alive: after the request succeeds, the corresponding connection will be disconnected, but not immediately
  response headers:
  - Content-Type: can be json, text, js, etc.; it tells the client the format or type of the data the server sends back
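The request headers above are what a crawler sets explicitly. A small sketch of attaching them with `urllib` (header values are examples; note that `urllib` stores header names capitalized, e.g. `"User-agent"`):

```python
from urllib.request import Request

# Attach request headers the way a crawler would (values are examples)
req = Request(
    "https://example.com",
    headers={"User-Agent": "Mozilla/5.0", "Connection": "keep-alive"},
)
# urllib normalizes header names with str.capitalize()
print(req.get_header("User-agent"))  # Mozilla/5.0
print(req.get_header("Connection"))  # keep-alive
```

On the response side, the same idea applies in reverse: `resp.headers["Content-Type"]` tells you whether to parse the body as JSON, HTML text, and so on.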

5. https

https: secure http protocol

How do certificates and keys provide the encryption?

Before understanding the encryption above, we first need to understand "symmetric key encryption" and "asymmetric key encryption".

A preliminary understanding:

Three protection modes: certificate key encryption, symmetric key encryption, asymmetric key encryption.

(1) SSL encryption:

The encryption technique SSL employs is called "shared key", also called "symmetric key encryption".

Cons: once the key exchange is intercepted, the attacker holds the key and can decrypt all the ciphertext.
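The weakness is easy to see with a toy shared-key cipher (XOR here is purely illustrative; real SSL uses proper ciphers):

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    # Toy "shared key" cipher: the SAME key both encrypts and decrypts,
    # so anyone who intercepts the key can read everything.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = b"sharedkey"
ciphertext = xor_cipher(b"card number 1234", key)
# Applying the same key again recovers the plaintext
print(xor_cipher(ciphertext, key))  # b'card number 1234'
```

Whoever snoops the key during its delivery can run the same second call; that is exactly the interception problem the text describes.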

(2) Asymmetric key encryption

Disadvantages: (1) efficiency is relatively low; (2) the client cannot verify that the public key it receives was actually sent by the server.

(3) Certificate key encryption: solves the public-key interception and trust problem of asymmetric encryption.

A trusted third party is introduced: the Certificate Authority (CA).
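The CA's role can be modeled in miniature: the CA issues a "certificate" binding the server's public key, and the client, which trusts the CA, checks it before using the key. This is a heavily simplified toy (real CAs use digital signatures over X.509 certificates, not an HMAC, and the key values below are made up):

```python
import hashlib
import hmac

CA_KEY = b"trusted-ca-secret"  # toy stand-in for the CA's signing key

def ca_sign(server_pubkey: bytes) -> bytes:
    # The CA issues a "certificate" vouching for the server's public key
    return hmac.new(CA_KEY, server_pubkey, hashlib.sha256).digest()

def client_verify(server_pubkey: bytes, certificate: bytes) -> bool:
    # The client trusts the CA, so a valid certificate proves the
    # public key really belongs to the server, not an interceptor
    return hmac.compare_digest(ca_sign(server_pubkey), certificate)

pub = b"server-public-key"
cert = ca_sign(pub)
print(client_verify(pub, cert))              # True
print(client_verify(b"attacker-key", cert))  # False
```

An attacker who swaps in their own public key cannot produce a matching certificate, which is precisely the gap in plain asymmetric encryption that certificates close.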

 

 

 Reference blog: https://www.cnblogs.com/bobo-zhang/p/9645715.html

 


Origin www.cnblogs.com/studybrother/p/10932034.html