[Python crawler road, day 2]: crawling through an IP proxy with ProxyHandler && cookies, with usage examples

ProxyHandler proxy
When writing crawlers, you often need a proxy IP to get around anti-crawler measures.
Commonly used proxy IP sources are:
Xici free proxies: xicidaili.com/nt/
Kuaidaili: http://kuaidaili.com/
Dailiyun: http://dailiyun.com/
To view the IP your requests come from: http://www.httpbin.org/ip
The site http://www.httpbin.org/ can also show other parameters of an HTTP request.

# Check the current ip

from urllib import request,parse
url="http://httpbin.org/ip"
resp=request.urlopen(url)
print(resp.read())
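
Since httpbin answers with JSON, the response can be decoded instead of printed as raw bytes. A minimal sketch, assuming the response has the shape shown by http://httpbin.org/ip; `parse_origin` and `current_ip` are hypothetical helper names, not part of urllib:

```python
import json
from urllib import request


def parse_origin(body: bytes) -> str:
    """Extract the "origin" field from an httpbin /ip response body."""
    return json.loads(body.decode("utf-8"))["origin"]


def current_ip(url: str = "http://httpbin.org/ip", timeout: float = 10) -> str:
    """Fetch the URL and return the address (or addresses) httpbin saw."""
    with request.urlopen(url, timeout=timeout) as resp:
        return parse_origin(resp.read())


# Usage (needs network access):
# print(current_ip())
```

The timeout keeps the script from hanging if the site is unreachable.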

How a proxy works: the request first goes to the proxy server, the proxy server accesses the target site on our behalf, and then it returns the result to us.
Steps:
1. Create a handler with ProxyHandler({"type": "IP:port"}), where "type" is the protocol, such as "http".
2. Create an opener from that handler.
3. Send the request with the opener. Under the hood, urlopen works the same way.
The code is as follows:

handler=request.ProxyHandler({"http":"112.95.205.49:8888"})
opener=request.build_opener(handler)
resp=opener.open(url)
print(resp.read())
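
Free proxies fail often, so it may help to wrap the three steps in a function with a timeout and basic error handling. This is only a sketch: `build_proxy_opener` and `fetch_via_proxy` are hypothetical names, and using the same proxy address for both http and https is an assumption that will not hold for every proxy.

```python
from urllib import request


def build_proxy_opener(proxy_addr: str) -> request.OpenerDirector:
    """Return an opener that sends http and https requests through proxy_addr."""
    handler = request.ProxyHandler({
        "http": proxy_addr,
        "https": proxy_addr,  # assumption: the proxy also accepts https traffic
    })
    return request.build_opener(handler)


def fetch_via_proxy(url: str, proxy_addr: str, timeout: float = 10):
    """Fetch url through the proxy; return the body, or None if the request fails."""
    opener = build_proxy_opener(proxy_addr)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.read()
    except OSError as exc:  # urllib.error.URLError is a subclass of OSError
        print(f"proxy {proxy_addr} failed: {exc}")
        return None


# Usage (needs network access and a working proxy):
# print(fetch_via_proxy("http://httpbin.org/ip", "112.95.205.49:8888"))
```

Returning None on failure makes it easy to try the next address from a list of proxies.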

Results:
b'{\n  "origin": "60.222.112.195"\n}\n'  # the original IP
b'{\n  "origin": "60.222.112.195, 112.95.204.217"\n}\n'  # proxy IP
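
The second result lists two comma-separated addresses. To check from code whether a proxy changed what the site sees, the two response bodies can be compared. A small sketch using the two results above as sample data; `origin_ips` and `proxy_changed_origin` are hypothetical helper names:

```python
import json


def origin_ips(body: bytes) -> list:
    """Return the addresses listed in the "origin" field of an httpbin /ip body."""
    origin = json.loads(body.decode("utf-8"))["origin"]
    return [ip.strip() for ip in origin.split(",")]


def proxy_changed_origin(direct_body: bytes, proxied_body: bytes) -> bool:
    """True if the proxied response reports different addresses than the direct one."""
    return origin_ips(direct_body) != origin_ips(proxied_body)


direct = b'{\n  "origin": "60.222.112.195"\n}\n'
proxied = b'{\n  "origin": "60.222.112.195, 112.95.204.217"\n}\n'
print(origin_ips(proxied))                    # ['60.222.112.195', '112.95.204.217']
print(proxy_changed_origin(direct, proxied))  # True
```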

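cookie
Websites often require authentication before the server can be used. After the first visit, the server returns a cookie so that the second visit needs no authentication. A cookie is generally no larger than 4 KB.
The code is as follows; with a cookie you can access a logged-in account.
Method 1: add the page's cookie information to the headers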

aji_url="http://www.renren.com/973687886/profile"
# The Cookie value is copied from a logged-in browser session; part of it was removed before publishing
headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36",
"Cookie": "anonymid=k6hu8cnocon7sq; [part removed]39c126ca7%7C1581428091545%7C1%7C1581428091771; jebecookies=6f157d36-8a56-4d80-b00e-5b56897c858e|||||; t=af9ce0986e484e427bb7eb4c8e9e3ed56; societyguester=af9ce0986e484e427bb7eb4c8e9e3ed56; xnsid=c90db889; loginfrom=null; wp_fold=0"
}
req=request.Request(url=aji_url,headers=headers)
resp=request.urlopen(req)
print(resp.read().decode("utf-8"))
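
A long Cookie string pasted from the browser is hard to read and edit. One option is to split it into a dict, change what you need, and join it back into a header value. This is a sketch with hypothetical helper names (`parse_cookie_string`, `build_cookie_header`), using three of the cookie pairs from Method 1 as sample data:

```python
from urllib import request


def parse_cookie_string(cookie_string: str) -> dict:
    """Split "name=value; name2=value2" into a dict of names to values."""
    cookies = {}
    for pair in cookie_string.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        name, _, value = pair.partition("=")
        cookies[name] = value
    return cookies


def build_cookie_header(cookies: dict) -> str:
    """Join a dict of cookies back into a Cookie header value."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


cookies = parse_cookie_string("xnsid=c90db889; loginfrom=null; wp_fold=0")
req = request.Request(
    "http://www.renren.com/973687886/profile",
    headers={"Cookie": build_cookie_header(cookies)},
)
print(cookies)                   # {'xnsid': 'c90db889', 'loginfrom': 'null', 'wp_fold': '0'}
print(req.get_header("Cookie"))  # xnsid=c90db889; loginfrom=null; wp_fold=0
```

Building the Request does not send anything; the request only goes out when it is passed to urlopen or an opener.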

Method 2: log in through an opener whose CookieJar stores the cookie automatically

from urllib import request, parse
from http.cookiejar import CookieJar

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36"
}

def get_opener():
    # Create a CookieJar
    cookiejar = CookieJar()
    # Use the CookieJar to create an HTTPCookieProcessor object
    handler = request.HTTPCookieProcessor(cookiejar)
    # Use the handler to create an opener
    opener = request.build_opener(handler)
    return opener

def login_renren(opener):
    # Log in to Renren
    data = {"email": "13537703610",
            "password": "510548134ys"}
    login_url = "http://www.renren.com/SysHome.do"
    req = request.Request(url=login_url, data=parse.urlencode(data).encode("utf-8"), headers=headers)
    opener.open(req)

def visit_renren(opener):
    # Visit the profile page
    aji_url = "http://www.renren.com/973687886/profile"
    # Use the opener created earlier, which already holds the login information
    req = request.Request(aji_url, headers=headers)
    resp = opener.open(req)
    with open(r"C:\python38\new project\mydi\ren.txt", "w", encoding="utf-8") as fp:
        fp.write(resp.read().decode("utf-8"))

if __name__ == '__main__':
    opener = get_opener()
    login_renren(opener)
    visit_renren(opener)
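
To see which cookies the login actually stored, keep a reference to the CookieJar instead of returning only the opener. A minimal sketch; `get_opener_with_jar` and `cookies_as_dict` are hypothetical names, and the variant simply returns the jar alongside the opener:

```python
from http.cookiejar import CookieJar
from urllib import request


def get_opener_with_jar():
    """Build an opener and also return the CookieJar that backs it."""
    cookiejar = CookieJar()
    handler = request.HTTPCookieProcessor(cookiejar)
    return request.build_opener(handler), cookiejar


def cookies_as_dict(cookiejar) -> dict:
    """Map cookie names to values; a CookieJar is iterable over its cookies."""
    return {cookie.name: cookie.value for cookie in cookiejar}


opener, cookiejar = get_opener_with_jar()
# Before any request the jar is empty; after opener.open(...) it holds
# whatever cookies the server set in its responses.
print(cookies_as_dict(cookiejar))  # {}
```

If the dict is still empty after the login request, the server did not set a cookie, and the later visit to the profile page will not be authenticated.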
Saving cookies: saving the cookies to a local file makes it easy to look at them again later.
from urllib import request
from http.cookiejar import MozillaCookieJar

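# Cookies will be written to cookie.txt in the current directory
cookiejar = MozillaCookieJar("cookie.txt")
handler = request.HTTPCookieProcessor(cookiejar)
opener = request.build_opener(handler)

resp = opener.open("https://www.baidu.com/")
# Write the cookies collected so far to the file
cookiejar.save()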
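
A saved file can be read back with load(), so a later run can reuse the cookies. Note that by default save() and load() skip cookies marked to be discarded at the end of the session; passing ignore_discard=True keeps them. A sketch, where `load_opener` is a hypothetical helper name:

```python
from http.cookiejar import MozillaCookieJar
from urllib import request


def load_opener(cookie_file: str):
    """Build an opener preloaded with the cookies stored in cookie_file."""
    cookiejar = MozillaCookieJar(cookie_file)
    # Also load session cookies and cookies whose expiry time has passed
    cookiejar.load(ignore_discard=True, ignore_expires=True)
    handler = request.HTTPCookieProcessor(cookiejar)
    return request.build_opener(handler), cookiejar


# Usage (assumes cookie.txt was written earlier with
# cookiejar.save(ignore_discard=True)):
# opener, cookiejar = load_opener("cookie.txt")
# for cookie in cookiejar:
#     print(cookie.name, cookie.value)
```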

Origin blog.csdn.net/dinnersize/article/details/104260972