Using the requests module (part 2)

1, Cookies and sessions

1.1 What are cookies and sessions?

A cookie is data that a website stores on the user's local machine in order to identify the user and track the session.
A session is a series of operations and messages that has a beginning and an end.

On the web, a session is mainly used to store, on the server, the information needed for a particular user's session.

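1.2, Why cookies and sessions exist:

HTTP is a stateless protocol: each request is independent of the ones before it. When an operation needs information to be remembered between requests (for example, that a user has logged in), something has to hold that state, and cookies and sessions were introduced for this purpose.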

1.3, How cookies work:

A cookie is generated by the server. On the browser's first request, the server sends the cookie to the client, which saves it. On later requests the browser includes the cookie information in the Cookie request header field, so the server can tell who is visiting.
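
The exchange described above can be sketched with the standard library alone. This is only an illustration under stated assumptions: the Set-Cookie value and the cookie name sessionid are made up for the example, and a real browser applies many more rules (domain, path, expiry) before sending a cookie back.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie header value, as a server might send it on the first response.
set_cookie_header = "sessionid=abc123; Path=/; Max-Age=1800; HttpOnly"

# The client parses and stores the cookie together with its attributes.
jar = SimpleCookie()
jar.load(set_cookie_header)

# On later requests only the name=value pairs go back, in the Cookie request header.
cookie_request_header = "; ".join(
    f"{name}={morsel.value}" for name, morsel in jar.items()
)

print(cookie_request_header)        # sessionid=abc123
print(jar["sessionid"]["path"])     # /
print(jar["sessionid"]["max-age"])  # 1800
```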

Cookies have drawbacks, however:
1, They are insecure: they are stored locally and are easy to tamper with.
2, Their size is limited: a single cookie is at most about 4 KB.

Cookies meet the need to 'keep state' to a certain extent, but we would like a technology that overcomes these drawbacks. That technology is the session.

1.4, Sessions

A session is stored on the server, which addresses the security problem.
This raises a question: the sessions live on the server, so when a client sends a request, how does the server know which of session_a, session_b, and so on belongs to that request?
To solve this, the cookie acts as the bridge. The cookie carries a sessionid field, which tells the server which session the request corresponds to.
If cookies are disabled, sessions normally cannot be used. In that special case, URL rewriting can be used to keep the session.
URL rewriting: the sessionid is appended to the URL.
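
As a rough sketch of URL rewriting, the helper below appends a session id to a URL as a query parameter. The function name add_session_id, the parameter name sessionid, and the example.com URL are assumptions made for illustration; real frameworks each have their own convention for where the id goes.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_session_id(url, sessionid, param="sessionid"):
    """Return url with the session id carried in the query string."""
    parts = urlsplit(url)
    # Drop any stale session id, keep every other query parameter.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, sessionid))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(add_session_id("https://example.com/cart?item=42", "abc123"))
# https://example.com/cart?item=42&sessionid=abc123
```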

Session life cycle: the server creates the session, it stays valid until it expires (sites usually set this to about 30 minutes), and then it is deleted.
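
The life cycle above can be illustrated with a toy in-memory store. This is a sketch under stated assumptions, not how any real web framework implements sessions: the class name SessionStore, its methods, and the fixed 30-minute lifetime are all invented for the example.

```python
import secrets
import time

SESSION_TTL_SECONDS = 30 * 60  # assumed lifetime of about 30 minutes


class SessionStore:
    """Toy server-side session store keyed by session id."""

    def __init__(self, ttl=SESSION_TTL_SECONDS):
        self.ttl = ttl
        self._sessions = {}  # sessionid -> (expires_at, data)

    def create(self, data, now=None):
        """Create a session and return the id that would go into the cookie."""
        now = time.time() if now is None else now
        sessionid = secrets.token_hex(16)
        self._sessions[sessionid] = (now + self.ttl, data)
        return sessionid

    def get(self, sessionid, now=None):
        """Return the session data, or None if it is unknown or has expired."""
        now = time.time() if now is None else now
        entry = self._sessions.get(sessionid)
        if entry is None:
            return None
        expires_at, data = entry
        if now >= expires_at:
            del self._sessions[sessionid]  # expired sessions are deleted
            return None
        return data


store = SessionStore()
sid = store.create({"user": "alice"}, now=0)
print(store.get(sid, now=60))       # {'user': 'alice'}
print(store.get(sid, now=31 * 60))  # None
```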

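1.5, Common cookie fields
    "domain": ".baidu.com",   # host name
    "expirationDate": 1607430402.780818, # expiry time (Unix timestamp, equivalent to Tue Dec  8 20:26:42 2020 in UTC+8)
    "name": "BAIDUID", # field name
    "path": "/", # path on the host for which the cookie is set
    "secure": false, # whether the cookie is sent only over a secure connection
    "value": "FC48EB3E32989D00F7BFCA9AF76D009A:FG=1", # field value
    "id": 1 # cookie id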
1.6, Session cookies and persistent cookies

Session cookies:

If "expirationDate" is negative, the cookie stops being valid when the browser is closed. A session cookie is stored in memory.

Persistent cookies:

With "expirationDate": 1607430402.780818, the cookie becomes invalid once the time given by expirationDate is reached. A persistent cookie is stored on the hard disk.

Persistence: writing data from memory to the hard disk, in practice by saving it to a file or a database.
Memory exists mainly so that applications can start and run quickly; each running program is allocated some memory as its working space.
When power is lost, memory is cleared.
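
To make the distinction concrete, the sketch below converts the expirationDate timestamp to a readable time and classifies a cookie dictionary as a session cookie or a persistent one. The function name cookie_kind is invented for this example, treating a missing expirationDate like a negative one is an assumption, and UTC+8 is assumed only because it reproduces the time quoted in section 1.5.

```python
import time
from datetime import datetime, timedelta, timezone

expiration_date = 1607430402.780818  # the "expirationDate" value used above

utc_time = datetime.fromtimestamp(expiration_date, tz=timezone.utc)
local_time = utc_time.astimezone(timezone(timedelta(hours=8)))  # assumed UTC+8
print(utc_time.strftime("%Y-%m-%d %H:%M:%S"))    # 2020-12-08 12:26:42
print(local_time.strftime("%Y-%m-%d %H:%M:%S"))  # 2020-12-08 20:26:42


def cookie_kind(cookie, now=None):
    """Classify a cookie dict shaped like the fields in section 1.5."""
    now = time.time() if now is None else now
    expires = cookie.get("expirationDate")
    # Assumption: a missing or negative expirationDate marks a session cookie.
    if expires is None or expires < 0:
        return "session"
    return "persistent" if expires > now else "expired"


print(cookie_kind({"name": "tmp", "expirationDate": -1}))  # session
print(cookie_kind({"name": "BAIDUID", "expirationDate": expiration_date}, now=1600000000))  # persistent
print(cookie_kind({"name": "BAIDUID", "expirationDate": expiration_date}, now=1700000000))  # expired
```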

1.7, Logging in to GitHub with cookies
Method 1 for using cookies in requests: put the cookies in the headers
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36',
    # Paste the full Cookie header copied from your own logged-in browser session.
    # The very long experiment:homepage_signup_flow and _gh_sess values are elided here.
    'Cookie': '_octo=GH1.1.90071fdfg1199.1575898561; _device_id=78c6d6dc75b0d87dg7afb73742a9679aa9; _ga=GA1.2.1521343976.1575898582; has_recent_activity=1; tz=Asia%2FShanghai; experiment:homepage_signup_flow=...; user_session=jk2HfEj_4gWfU8V_4dvps7naVFFB_R0aUn4_mVrdxnQWy9u7; __Host-user_session_same_site=jk2HfEj_4gWfU8V_4dvps7naVFFB_R0aUn4_mVrdxnQWy9u7; logged_in=yes; dotcom_user=2829507692; _gh_sess=...'
}
url = 'https://github.com/'

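res = requests.get(url, headers=headers).content.decode('utf-8')
# Replace 'username' with your own GitHub user name; the check assumes it appears in the page only when logged in.
if 'username' in res:
    print('login True')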
Method 2 for using cookies in requests: pass the cookies through the cookies parameter
import requests

# Paste the full cookie string copied from your own logged-in browser session.
# The very long experiment:homepage_signup_flow and _gh_sess values are elided here.
cookies = '_octo=GH1.1.900711199.1575898561; _device_id=78c6d6dc75b0d877afb73742a9679aa9; _ga=GA1.2.1521343976.1575898582; has_recent_activity=1; tz=Asia%2FShanghai; experiment:homepage_signup_flow=...; user_session=jk2HfEj_4gWfU8V_4dvps7naVFFB_R0aUn4_mVrdxnQWy9u7; __Host-user_session_same_site=jk2HfEj_4gWfU8V_4dvps7naVFFB_R0aUn4_mVrdxnQWy9u7; logged_in=yes; dotcom_user=2829507692; _gh_sess=...'
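# Split each "name=value" pair on the first '=' only, because a value may itself contain '='.
cookies_dic = {item.split('=', 1)[0]: item.split('=', 1)[1] for item in cookies.split('; ')}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36',
}
url = 'https://github.com/'

res = requests.get(url, headers=headers, cookies=cookies_dic).content.decode('utf-8')
# Replace 'username' with your own GitHub user name; the check assumes it appears in the page only when logged in.
if 'username' in res:
    print('login True')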
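
The conversion from a raw cookie string to a dictionary is worth isolating, since it is easy to get wrong when a value contains '=' or the string has stray whitespace. The helper below is a minimal sketch; the name parse_cookie_header and the sample string are made up for the example.

```python
def parse_cookie_header(cookie_header):
    """Turn 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    cookies = {}
    for item in cookie_header.split(";"):
        item = item.strip()
        if not item:
            continue  # skip empty pieces such as a trailing ';'
        # partition splits on the first '=' only, so '=' inside a value survives.
        name, _, value = item.partition("=")
        cookies[name] = value
    return cookies


sample = "logged_in=yes; tz=Asia%2FShanghai; token=abc==; "
print(parse_cookie_header(sample))
# {'logged_in': 'yes', 'tz': 'Asia%2FShanghai', 'token': 'abc=='}
```

The resulting dictionary can be passed to requests.get through the cookies parameter, as in the second method above.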
1.9, Logging in to GitHub with a Session object and a form-data dictionary
1.9.1, Analyzing the login URL ( https://github.com/session ) shows that the POST request needs to send the following parameters.
commit: Sign in
authenticity_token: 9Ek9mN5qMK7Mr3DS3goFGj1zJcsywLXMA6eLMoZ3fLaW8z4SXuc6Qugv6H2/gyvfjAVOaqxpxHKvCVkUQa1SZg==
login: username
password: password
webauthn-support: supported
webauthn-iuvpaa-support: supported
return_to: 
required_field_6c85: 
timestamp: 1585111598578
timestamp_secret: 79735a7473e5c621225663e9f306bc3f1ea61c3926869e33200415096b3350c6

However, if we use a Session object and visit the login page ( https://github.com/login ) first, the Session object saves part of this data as cookies, so we only need to send the following few parameters.

commit: Sign in
authenticity_token: 91zJcsywLXMA6eLMoZ3fLaW8z4SXuc6Qugv6H2/gyvfjAVOaqxpxHKvCVkUQa1SZg==
login: username
password: password

The authenticity_token parameter can be obtained from the login page itself: request the login page, extract the token from the HTML, and then build the data dictionary to complete the login.

1.9.2, The code is as follows
import re

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36',
    'Referer': 'https://github.com/',
}
base_url = 'https://github.com/login'
# The Session object keeps the cookies set by the login page for the later POST.
session = requests.Session()
res = session.get(base_url, headers=headers).content.decode('utf-8')
data = {
    'commit': 'Sign in',
    # The token is embedded in the login page as a hidden form field.
    'authenticity_token': re.search(r'<input type="hidden" name="authenticity_token" value="(.*?)" />', res, re.S).group(1),
    'login': 'username',  # replace with your GitHub user name
    'password': 'password',  # replace with your password
}

login_url = 'https://github.com/session'
res = session.post(login_url, data=data)
# with open('./1.html', 'w', encoding='utf-8') as file:  # optionally save the page and open it to check whether the login succeeded
#     file.write(res.text)
if 'username' in res.text:  # alternatively, check for a keyword to see whether the login succeeded
    print('login True')
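
The regular expression above depends on the exact attribute order and spacing of the input tag. As a hedged alternative, the standard library's html.parser can collect hidden form fields regardless of attribute order. The HTML below is a made-up sample, not GitHub's actual markup, and HiddenInputParser is a name invented for this sketch.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        if attributes.get("type") == "hidden" and "name" in attributes:
            self.fields[attributes["name"]] = attributes.get("value") or ""


# Hypothetical login form used only to exercise the parser.
sample_html = """
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="TOKEN123==" />
  <input value="" name="return_to" type="hidden">
  <input type="text" name="login">
</form>
"""

parser = HiddenInputParser()
parser.feed(sample_html)
print(parser.fields)
# {'authenticity_token': 'TOKEN123==', 'return_to': ''}
```

In the login flow, parser.fields['authenticity_token'] would take the place of the re.search(...).group(1) expression.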

2, Proxies

2.1, What proxies are for
  • Getting around restrictions on your own IP: reaching sites that are not normally accessible, such as some blocked foreign websites, and even the dark web.
  • Accessing the internal resources of some organizations or groups: for example, with a free proxy server in the education network's address range, you can use the FTP upload and download services open to the education network, as well as its various shared-information query services.
  • Improving access speed: a proxy server usually has a large disk cache. When outside information passes through, it is also saved to the cache; when other users request the same information, it is served directly from the cache, which speeds up access.
  • Hiding the real IP: users can hide their IP this way and avoid some trouble. For crawlers, we use proxies precisely to hide our own IP and keep it from being blocked.
2.2, Proxy classification
2.2.1, By protocol
  • FTP proxy: mainly used to access FTP servers; generally supports upload, download, and caching; the port is usually 21, 2121, and so on.
  • HTTP proxy: mainly used to access web pages; generally supports content filtering and caching; the port is usually 80, 8080, 3128, and so on.
  • SSL/TLS proxy: mainly used to access encrypted sites; generally supports SSL or TLS encryption; the port is usually 443.
  • RTSP proxy: mainly used to access Real streaming media servers; generally supports caching; the port is usually 554.
  • Telnet proxy: mainly used for telnet remote control (often used by attackers to hide their identity when breaking into computers); the port is usually 23.
  • POP3/SMTP proxy: mainly used for sending and receiving mail over POP3/SMTP; generally supports caching; the ports are usually 110 and 25.
  • SOCKS proxy: simply relays packets without caring about the specific protocol or usage, so it is much faster; generally supports caching; the port is usually 1080. The SOCKS protocol comes in two versions, SOCKS4 and SOCKS5: the former supports only TCP, while the latter supports TCP and UDP as well as various authentication mechanisms and server-side domain name resolution. Simply put, SOCKS5 can do everything SOCKS4 can, but SOCKS4 cannot do everything SOCKS5 can.
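
For the SOCKS case, requests accepts SOCKS proxy URLs when its optional SOCKS support is installed (pip install requests[socks]). The sketch below only builds the proxies dictionary; the helper name socks5_proxies and the address 127.0.0.1 are assumptions for illustration. The socks5h scheme asks the proxy to resolve host names, while socks5 resolves them on the client.

```python
def socks5_proxies(host, port=1080, remote_dns=True):
    """Build a requests-style proxies dict for a SOCKS5 proxy."""
    # socks5h: the proxy resolves domain names; socks5: the client does.
    scheme = "socks5h" if remote_dns else "socks5"
    proxy_url = f"{scheme}://{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


proxies = socks5_proxies("127.0.0.1")
print(proxies)
# {'http': 'socks5h://127.0.0.1:1080', 'https': 'socks5h://127.0.0.1:1080'}
```

The dictionary would then be passed as requests.get(url, proxies=proxies).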
2.2.2, By degree of anonymity
  • High-anonymity proxy: forwards packets unmodified, and the server sees the proxy server's IP as if it were the real client IP. This gives the highest security. The request looks like this:

    • REMOTE_ADDR = proxy IP
      HTTP_VIA = empty or not shown
      HTTP_X_FORWARDED_FOR = empty or not shown

  • Anonymous proxy: the server knows you are using a proxy, but does not know your real IP. The request looks like this:

    • REMOTE_ADDR = IP of the last proxy server
      HTTP_VIA = proxy server IP
      HTTP_X_FORWARDED_FOR = IPs of the proxies passed through

  • Transparent proxy: the server knows you are using a proxy, and also knows your real IP. The request looks like this:

    • REMOTE_ADDR = IP of the last proxy server
      HTTP_VIA = proxy server IP
      HTTP_X_FORWARDED_FOR = real IP, IPs of the proxies passed through
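
The three levels can be told apart from the server's point of view using exactly these variables. The function below is a simplified sketch: the name classify_proxy is invented, the IP addresses are documentation-range examples, and a real server cannot know the client's real IP in advance, so real_ip is passed in here only to make the comparison explicit.

```python
def classify_proxy(environ, real_ip):
    """Classify how a request arrived, based on the variables listed above."""
    remote_addr = environ.get("REMOTE_ADDR")
    via = environ.get("HTTP_VIA")
    forwarded = environ.get("HTTP_X_FORWARDED_FOR") or ""
    forwarded_ips = [ip.strip() for ip in forwarded.split(",") if ip.strip()]

    if remote_addr == real_ip:
        return "no proxy"
    if real_ip in forwarded_ips:
        return "transparent"      # the real IP leaks through X-Forwarded-For
    if via or forwarded_ips:
        return "anonymous"        # proxy use is visible, the real IP is not
    return "high anonymity"       # nothing reveals that a proxy is in use


real = "203.0.113.7"    # example client address
proxy = "198.51.100.20"  # example proxy address

print(classify_proxy({"REMOTE_ADDR": proxy}, real))
# high anonymity
print(classify_proxy({"REMOTE_ADDR": proxy, "HTTP_VIA": proxy,
                      "HTTP_X_FORWARDED_FOR": proxy}, real))
# anonymous
print(classify_proxy({"REMOTE_ADDR": proxy, "HTTP_VIA": proxy,
                      "HTTP_X_FORWARDED_FOR": f"{real}, {proxy}"}, real))
# transparent
```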
      
2.3, Obtaining proxies
  • 1, Buy them from a proxy provider. Providers usually offer an API, so the proxies are easy to use and efficient.
  • 2, Crawl them yourself from the Internet. These are free, so they tend to be inefficient.
2.4, Using proxies

Here I copied a few from the Xici proxy site, https://www.xicidaili.com/nn/ .

1, Using a proxy that needs no user name or password

import requests

proxies = {
  "http": 'http://218.75.158.153:3128',
  "https": "http://116.113.27.170:47849",
}
url = 'http://httpbin.org/get'
res = requests.get(url, proxies=proxies)

print(res.text)

The response looks like this:

{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Cache-Control": "max-age=259200", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.22.0", 
    "X-Amzn-Trace-Id": "Root=1-5e7c389a-3cc713099f9a75d856987e3c"
  }, 
  "origin": "218.75.158.153",  # here we can see the proxy address used above
  "url": "http://httpbin.org/get"
}
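
Rather than reading the output by eye, the origin field can be compared with the proxy address in code. The sketch below works on a saved response body so it runs without a network connection; the function name proxy_in_use is invented for the example, and the sample body is a trimmed copy of the output above.

```python
import json
from urllib.parse import urlsplit


def proxy_in_use(response_text, proxy_url):
    """Return True if the httpbin 'origin' field contains the proxy's host."""
    origin = json.loads(response_text)["origin"]
    proxy_host = urlsplit(proxy_url).hostname
    # 'origin' may list several comma-separated addresses.
    origins = [ip.strip() for ip in origin.split(",")]
    return proxy_host in origins


sample_body = '{"args": {}, "origin": "218.75.158.153", "url": "http://httpbin.org/get"}'

print(proxy_in_use(sample_body, "http://218.75.158.153:3128"))   # True
print(proxy_in_use(sample_body, "http://116.113.27.170:47849"))  # False
```

With a live request, res.text from the example above would be passed in place of sample_body.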

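2, Using a proxy that requires a user name and password works much the same way.

# Add the user name and password to the proxy URL, in this format
# (user, password, host and port are placeholders for your own proxy's values)
proxies = {
    "http": "http://user:password@host:port/",
}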
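
If the user name or password contains characters such as '@' or ':', they need to be percent-encoded before being placed in the proxy URL. The helper below is a minimal sketch of that step; the name build_proxies, the address 10.0.0.1, and the credentials are all made up for the example.

```python
from urllib.parse import quote


def build_proxies(host, port, user=None, password=None, scheme="http"):
    """Build a requests-style proxies dict, encoding credentials if given."""
    auth = ""
    if user is not None:
        auth = quote(user, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    proxy_url = f"{scheme}://{auth}{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


proxies = build_proxies("10.0.0.1", 3128, user="user", password="p@ss:word")
print(proxies["http"])
# http://user:p%40ss%3Aword@10.0.0.1:3128
```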

Origin www.cnblogs.com/hjnzs/p/12574456.html