1. What is cookie information?
Some websites use cookies to identify users; certain pages can only be accessed after logging in.
How login state is kept: the server performs session tracking, and the related user information is saved as cookies in the local browser.
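As a concrete illustration (the Set-Cookie header below is made up for the demo), the standard library's http.cookies module can parse the cookie string a server sends back after login:

```python
from http.cookies import SimpleCookie

# A hypothetical Set-Cookie header a server might send after login
cookie = SimpleCookie()
cookie.load('sessionid=abc123; Path=/; HttpOnly')

# The browser stores this name/value pair and sends it back
# on every later request to the same site
print(cookie['sessionid'].value)    # abc123
print(cookie['sessionid']['path'])  # /
```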
from collections.abc import Iterable
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor
from http import cookiejar
from urllib import request
# **************************1. Save cookie information to a variable**********************
# # Class hierarchy: CookieJar ---> FileCookieJar ---> MozillaCookieJar
# # 1). Declare a CookieJar object to hold the cookie information in a variable;
# cookie = cookiejar.CookieJar()
#
# # 2). Create a cookie handler with urllib.request's HTTPCookieProcessor;
# handler = HTTPCookieProcessor(cookie)
#
# # 3). Build an opener from the handler; (used like urlopen)
# opener = request.build_opener(handler)
#
# # 4). Open the url page;
# response = opener.open('http://www.baidu.com')
#
# # print(cookie)
# print(isinstance(cookie, Iterable))
# for item in cookie:
#     print("Name=" + item.name, end='\t\t')
#     print("Value=" + item.value)
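The commented-out steps above can be exercised without a network connection by constructing a Cookie object by hand (every field value below is made up for the demo) and adding it to the jar:

```python
from http import cookiejar

jar = cookiejar.CookieJar()

# Build a fake cookie by hand (all values are made up);
# normally opener.open() fills the jar from the server's Set-Cookie headers
fake = cookiejar.Cookie(
    version=0, name='BAIDUID', value='demo-value',
    port=None, port_specified=False,
    domain='.baidu.com', domain_specified=True, domain_initial_dot=True,
    path='/', path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={})
jar.set_cookie(fake)

# A CookieJar is iterable, just like the loop in the comments above
for item in jar:
    print("Name=" + item.name, "Value=" + item.value)
```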
# **************************2. Save cookie information to a local file**********************
# # 1). Specify where the cookie file lives;
# cookieFileName = 'doc/cookie.txt'
#
# # 2). Declare a MozillaCookieJar object, which can save cookies to a file;
# cookie = cookiejar.MozillaCookieJar(filename=cookieFileName)
#
# # 3). Create a cookie handler with urllib.request's HTTPCookieProcessor;
# handler = HTTPCookieProcessor(cookie)
#
# # 4). Build an opener from the handler; (used like urlopen)
# opener = request.build_opener(handler)
#
# response = opener.open('http://www.baidu.com')
# print(response.read().decode('utf-8'))
# # Save the cookies to the local file;
# cookie.save(cookieFileName)
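The save step can also be tried offline. This sketch (cookie values and file path are made up) writes a Netscape-format cookies.txt like the one MozillaCookieJar produces above:

```python
import os
import tempfile
from http import cookiejar

jar = cookiejar.MozillaCookieJar()
# Positional Cookie args: version, name, value, port, port_specified,
# domain, domain_specified, domain_initial_dot, path, path_specified,
# secure, expires, discard, comment, comment_url, rest
jar.set_cookie(cookiejar.Cookie(
    0, 'sessionid', 'demo', None, False,
    'example.com', True, False, '/', True,
    False, None, False, None, None, {}))

path = os.path.join(tempfile.mkdtemp(), 'cookie.txt')
jar.save(path)
print(open(path).read())  # Netscape-format text containing the cookie
```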
# **********************************3. Load cookies from a file and use them for a request********************************
# 1). Specify where the cookie file lives;
cookieFileName = 'doc/cookie.txt'
# 2). Declare a MozillaCookieJar object;
cookie = cookiejar.MozillaCookieJar()
# *****Extra step: load the cookie information from the file
cookie.load(cookieFileName)
# 3). Create a cookie handler with urllib.request's HTTPCookieProcessor;
handler = HTTPCookieProcessor(cookie)
# 4). Build an opener from the handler; (used like urlopen)
opener = request.build_opener(handler)
response = opener.open('http://www.baidu.com')
print(response.read().decode('utf-8'))
# **********************************4. Using cookies to simulate logging in to a website**********************************
# ******************* Simulate the login and save the cookie information;
cookieFileName = 'cookie01.txt'
cookie = cookiejar.MozillaCookieJar(filename=cookieFileName)
handler = HTTPCookieProcessor(cookie)
opener = request.build_opener(handler)
# This url is the login url of the academic-affairs site;
loginUrl = 'xxxxxxxxxxxxxx'
# Note: POST data must be bytes, so encode the urlencoded string;
postData = urlencode({
    'stuid': '1302100122',
    'pwd': 'xxxxxx'
}).encode('utf-8')
response = opener.open(loginUrl, data=postData)
cookie.save(cookieFileName)
# bs4
# ******************* Use the saved cookie information to fetch other pages, e.g. grades/course selection
gradeUrl = ''
response = opener.open(gradeUrl)
print(response.read())
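The key detail in the login step is that opener.open() requires the POST body as bytes, not str; a quick offline check (the form field names are placeholders, not a real login form):

```python
from urllib.parse import urlencode

# Placeholder form fields; a real login form defines the actual names
postData = urlencode({'stuid': '1302100122', 'pwd': 'xxxxxx'})
print(type(postData))  # <class 'str'> -- not yet usable as a POST body

body = postData.encode('utf-8')
print(type(body))      # <class 'bytes'> -- what opener.open() expects
print(body)            # b'stuid=1302100122&pwd=xxxxxx'
```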
Example
from http import cookiejar
from urllib import request
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor
cookieFileName = 'doc/chinaUnixCookie.txt'
cookie = cookiejar.MozillaCookieJar(filename=cookieFileName)
handler = HTTPCookieProcessor(cookie)
opener = request.build_opener(handler)
# This url is the chinaunix login url;
loginUrl = 'http://bbs.chinaunix.net/member.php?mod=logging&action=login&loginsubmit=yes&loginhash=La2A2'
# Common pitfall: POST data should be bytes, an iterable of bytes, or a file object.
postData = urlencode({
    'username': 'LVah',
    'password': 'gf132590'
}).encode('utf-8')
print(type(postData))
response = opener.open(loginUrl, data=postData)
print(response.code)
with open('doc/chinaunix.html', 'wb') as f:
    f.write(response.read())
# cookie.save(cookieFileName)
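One gotcha with the commented-out cookie.save above: session cookies (those without an expiry, marked discard) are silently skipped unless ignore_discard=True is passed. A minimal offline demonstration (all values made up):

```python
import os
import tempfile
from http import cookiejar

jar = cookiejar.MozillaCookieJar()
# discard=True mimics a session cookie, the way servers usually send them
jar.set_cookie(cookiejar.Cookie(
    0, 'sid', 'xyz', None, False,
    'example.com', True, False, '/', True,
    False, None, True, None, None, {}))

p1 = os.path.join(tempfile.mkdtemp(), 'a.txt')
p2 = os.path.join(tempfile.mkdtemp(), 'b.txt')
jar.save(p1)                       # session cookie silently dropped
jar.save(p2, ignore_discard=True)  # session cookie kept
print('sid' in open(p1).read())    # False
print('sid' in open(p2).read())    # True
```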
Exceptions in the urllib module
In Python 3, the methods from urllib2 were folded into urllib.request;
https://docs.python.org/3/library/urllib.html
Common HTTP status code classes:
- 2xx: success
- 3xx: redirection
- 4xx: client error
- 5xx: server error
For example:
- 404: page not found
- 403: access forbidden
- 200: successful request
1. Informational
- 100 Continue
- 101 Switching Protocols
- 102 Processing
2. Success
- 200 OK
- 201 Created
- 202 Accepted
- 203 Non-Authoritative Information
- 204 No Content
- 205 Reset Content
- 206 Partial Content
- 207 Multi-Status
3. Redirection
- 300 Multiple Choices
- 301 Moved Permanently
- 302 Found (Moved Temporarily)
- 303 See Other
- 304 Not Modified
- 305 Use Proxy
- 306 Switch Proxy
- 307 Temporary Redirect
4. Client Error
- 400 Bad Request
- 401 Unauthorized
- 402 Payment Required
- 403 Forbidden
- 404 Not Found
- 405 Method Not Allowed
- 406 Not Acceptable
- 407 Proxy Authentication Required
- 408 Request Timeout
- 409 Conflict
- 410 Gone
- 411 Length Required
- 412 Precondition Failed
- 413 Request Entity Too Large
- 414 Request-URI Too Long
- 415 Unsupported Media Type
- 416 Requested Range Not Satisfiable
- 417 Expectation Failed
- 421 Too Many Connections
- 422 Unprocessable Entity
- 423 Locked
- 424 Failed Dependency
- 425 Unordered Collection
- 426 Upgrade Required
- 449 Retry With
- 451 Unavailable For Legal Reasons
5. Server Error
- 500 Internal Server Error
- 501 Not Implemented
- 502 Bad Gateway
- 503 Service Unavailable
- 504 Gateway Timeout
- 505 HTTP Version Not Supported(http/1.1)
- 506 Variant Also Negotiates
- 507 Insufficient Storage
- 509 Bandwidth Limit Exceeded
- 510 Not Extended
- 600 Unparseable Response Headers
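The status-code table above is also available programmatically through the standard library's http.HTTPStatus enum:

```python
from http import HTTPStatus

# Each member compares equal to its numeric code and carries its phrase
print(HTTPStatus.NOT_FOUND == 404)     # True
print(HTTPStatus.NOT_FOUND.phrase)     # Not Found
print(HTTPStatus.FORBIDDEN.value, HTTPStatus.FORBIDDEN.phrase)

# Classify a code by its first digit, as in the 2xx/3xx/4xx/5xx table
code = 502
classes = {2: 'success', 3: 'redirect', 4: 'client error', 5: 'server error'}
print(classes[code // 100])            # server error
```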
from urllib import request
from urllib import error
try:
    url = 'http://www.baidu.com/hello.html'
    response = request.urlopen(url, timeout=0.01)
except error.HTTPError as e:
    print(e.code, e.headers, e.reason)
except error.URLError as e:
    print(e.reason)
else:
    content = response.read().decode('utf-8')
    print(content[:5])
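The except order above matters: HTTPError is a subclass of URLError, so catching URLError first would swallow the more specific error. This can be verified without touching the network (the url and status below are made up):

```python
from urllib import error

# HTTPError specializes URLError, so it must be caught first
print(issubclass(error.HTTPError, error.URLError))  # True

# An HTTPError carries the status code and reason of the failed response
e = error.HTTPError('http://example.com/x', 404, 'Not Found', hdrs=None, fp=None)
print(e.code, e.reason)                             # 404 Not Found
```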
The url parsing module
from urllib.parse import urlencode
from urllib.parse import urlparse
# data = urlencode({
# 'name': 'fentiao',
# 'password':'12345'
# })
# print(data)
# https://movie.douban.com/subject/4864908/comments?sort=time&status=P
# https://movie.douban.com/subject/4864908/comments?sort=new_score&status=P
# # **************** Encode query parameters into a url
# data = urlencode({
# 'sort': 'time',
# 'status': 'P'
# })
# doubanUrl = 'https://movie.douban.com/subject/4864908/comments?' + data
# print(doubanUrl)
# # **************** Parse a url
doubanUrl = 'https://movie.douban.com/subject/4864908/comments?sort=new_score&status=P'
info = urlparse(doubanUrl)
print(info)
print(info.scheme)
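Beyond .scheme, the result of urlparse is a named tuple, and parse_qs can break the query string down further:

```python
from urllib.parse import urlparse, parse_qs

doubanUrl = 'https://movie.douban.com/subject/4864908/comments?sort=new_score&status=P'
info = urlparse(doubanUrl)
print(info.netloc)           # movie.douban.com
print(info.path)             # /subject/4864908/comments
print(info.query)            # sort=new_score&status=P

# parse_qs turns the query string into a dict mapping names to value lists
print(parse_qs(info.query))  # {'sort': ['new_score'], 'status': ['P']}
```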