Python - saving cookie information / exceptions in the urllib module / URL parsing module

1. What is cookie information?

Some websites use cookies to identify users: certain pages can only be accessed after logging in.
How the login state is kept: the site performs session tracking, and the relevant user information is saved in the local browser as cookies.
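
A minimal sketch of where cookies come from (assuming network access to www.baidu.com, which the examples below also use): the server issues cookies via Set-Cookie response headers, and the client sends them back on later requests.

from urllib import request

response = request.urlopen('http://www.baidu.com')
# Print the Set-Cookie headers the server sends; these are the values a CookieJar stores.
for name, value in response.getheaders():
    if name.lower() == 'set-cookie':
        print(value)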

from collections.abc import Iterable
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor
from http import cookiejar
from urllib import request

#  **************************1. Save cookie information to a variable**********************
# # Class hierarchy: CookieJar ------> FileCookieJar ---> MozillaCookieJar
# # 1). Create a CookieJar object to hold the cookie information in a variable;
# cookie = cookiejar.CookieJar()
#
# # 2). Build a cookie handler with urllib.request's HTTPCookieProcessor;
# handler = HTTPCookieProcessor(cookie)
#
# # 3). Build an opener from the handler; ==== urlopen
# opener = request.build_opener(handler)
#
# # 4). Open the url page
# response = opener.open('http://www.baidu.com')
#
# # print(cookie)
# print(isinstance(cookie, Iterable))
# for item in cookie:
#     print("Name=" + item.name, end='\t\t')
#     print("Value=" + item.value)


#  **************************2. Save cookie information to a local file**********************

# # 1). Specify where the cookie file is stored;
# cookieFileName = 'doc/cookie.txt'
#
# # 2). Create a MozillaCookieJar object, used to save cookies to a file;
# cookie = cookiejar.MozillaCookieJar(filename=cookieFileName)
#
# # 3). Build a cookie handler with urllib.request's HTTPCookieProcessor;
# handler = HTTPCookieProcessor(cookie)
#
# # 4). Build an opener from the handler; ==== urlopen
# opener = request.build_opener(handler)
#
# response = opener.open('http://www.baidu.com')
# print(response.read().decode('utf-8'))
# # Save to the local file (pass ignore_discard=True to also keep session cookies);
# cookie.save(cookieFileName)


#  **********************************3. Load cookies from a file and make a request********************************

# 1). Specify where the cookie file is stored;
cookieFileName = 'doc/cookie.txt'

# 2). Create a MozillaCookieJar object, used to hold the cookies;
cookie = cookiejar.MozillaCookieJar()


# ***** Extra step: load the cookie information from the file
cookie.load(cookieFileName)

# 3). Build a cookie handler with urllib.request's HTTPCookieProcessor;
handler = HTTPCookieProcessor(cookie)

# 4). Build an opener from the handler; ==== urlopen
opener = request.build_opener(handler)

response = opener.open('http://www.baidu.com')
print(response.read().decode('utf-8'))


# **********************************4. Steps to simulate logging in to a website with cookies**********************************


#  ******************* Simulate the login and save the cookie information;
cookieFileName = 'cookie01.txt'
cookie = cookiejar.MozillaCookieJar(filename=cookieFileName)
handler = HTTPCookieProcessor(cookie)
opener = request.build_opener(handler)
#  The url here is the login url of the university's academic-affairs site;
loginUrl = 'xxxxxxxxxxxxxx'
# POST data must be bytes, hence the .encode('utf-8');
postData = urlencode({
    'stuid': '1302100122',
    'pwd': 'xxxxxx'
}).encode('utf-8')
response = opener.open(loginUrl, data=postData)
cookie.save(cookieFileName)

# bs4
# ****************** Use the saved cookie information to fetch other pages, e.g. check grades / select courses
gradeUrl = ''
response = opener.open(gradeUrl)
print(response.read())

Example

from http import cookiejar
from urllib import request
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor
cookieFileName = 'doc/chinaUnixCookie.txt'
cookie = cookiejar.MozillaCookieJar(filename=cookieFileName)
handler = HTTPCookieProcessor(cookie)
opener = request.build_opener(handler)
#  The url here is the chinaunix login url;
loginUrl = 'http://bbs.chinaunix.net/member.php?mod=logging&action=login&loginsubmit=yes&loginhash=La2A2'

# Common pitfall: POST data should be bytes, an iterable of bytes, or a file object.
postData = urlencode({
    'username': 'LVah',
    'password': 'gf132590'
}).encode('utf-8')

print(type(postData))
response = opener.open(loginUrl, data=postData)
print(response.code)
with open('doc/chinaunix.html', 'wb') as f:
    f.write(response.read())
# cookie.save(cookieFileName)

Exceptions in the urllib module

In Python 3, the methods that lived in Python 2's urllib2 are folded into urllib.request;
https://docs.python.org/3/library/urllib.html
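
For example, a quick comparison sketch (the Python 2 lines are shown only as comments for reference); what urllib2 used to provide is now split between urllib.request and urllib.error:

# Python 2:  import urllib2; urllib2.urlopen(url); urllib2.URLError
# Python 3:
from urllib import request, error   # urlopen lives in request; URLError/HTTPError live in error

response = request.urlopen('http://www.baidu.com')
print(type(response))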

Common HTTP status code categories:

  • 2xx: success
  • 3xx: redirection
  • 4xx: client-side problem
  • 5xx: server-side problem

For example:

  • 404: page not found
  • 403: access forbidden
  • 200: request succeeded
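
A short sketch of where these codes surface in urllib (assuming www.baidu.com is reachable): a successful request exposes its code on the response object, while 4xx/5xx responses are raised as HTTPError, as the try/except example after the full list below also shows.

from urllib import request, error

response = request.urlopen('http://www.baidu.com')
print(response.getcode())    # 200 on success

try:
    request.urlopen('http://www.baidu.com/hello.html')
except error.HTTPError as e:
    print(e.code)            # e.g. 404 if the page does not exist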

1. Informational

  • 100 Continue
  • 101 Switching Protocols
  • 102 Processing

2. Success

  • 200 OK
  • 201 Created
  • 202 Accepted
  • 203 Non-Authoritative Information
  • 204 No Content
  • 205 Reset Content
  • 206 Partial Content
  • 207 Multi-Status

3. Redirection

  • 300 Multiple Choices
  • 301 Moved Permanently
  • 302 Found (Moved Temporarily)
  • 303 See Other
  • 304 Not Modified
  • 305 Use Proxy
  • 306 Switch Proxy
  • 307 Temporary Redirect

4. Client Error

  • 400 Bad Request
  • 401 Unauthorized
  • 402 Payment Required
  • 403 Forbidden
  • 404 Not Found
  • 405 Method Not Allowed
  • 406 Not Acceptable
  • 407 Proxy Authentication Required
  • 408 Request Timeout
  • 409 Conflict
  • 410 Gone
  • 411 Length Required
  • 412 Precondition Failed
  • 413 Request Entity Too Large
  • 414 Request-URI Too Long
  • 415 Unsupported Media Type
  • 416 Requested Range Not Satisfiable
  • 417 Expectation Failed
  • 421 Too Many Connections
  • 422 Unprocessable Entity
  • 423 Locked
  • 424 Failed Dependency
  • 425 Unordered Collection
  • 426 Upgrade Required
  • 449 Retry With
  • 451 Unavailable For Legal Reasons

5. Server Error

  • 500 Internal Server Error
  • 501 Not Implemented
  • 502 Bad Gateway
  • 503 Service Unavailable
  • 504 Gateway Timeout
  • 505 HTTP Version Not Supported (HTTP/1.1)
  • 506 Variant Also Negotiates
  • 507 Insufficient Storage
  • 509 Bandwidth Limit Exceeded
  • 510 Not Extended
  • 600 Unparseable Response Headers

from urllib import request
from urllib import error

try:
    url = 'http://www.baidu.com/hello.html'
    response = request.urlopen(url, timeout=0.01)
except error.HTTPError as e:
    # HTTPError is a subclass of URLError, so it must be caught first;
    # it carries the status code, the response headers and the reason phrase.
    print(e.code, e.headers, e.reason)
except error.URLError as e:
    # URLError covers lower-level failures, e.g. DNS errors or the timeout above.
    print(e.reason)
else:
    content = response.read().decode('utf-8')
    print(content[:5])

URL parsing module

from urllib.parse import urlencode
from urllib.parse import urlparse

# data = urlencode({
#     'name': 'fentiao',
#     'password':'12345'
# })
# print(data)

# https://movie.douban.com/subject/4864908/comments?sort=time&status=P
# https://movie.douban.com/subject/4864908/comments?sort=new_score&status=P

# #**************** Encode the url query parameters
# data = urlencode({
#     'sort': 'time',
#     'status': 'P'
# })
# doubanUrl = 'https://movie.douban.com/subject/4864908/comments?' + data
# print(doubanUrl)

# **************** Parse the url address
doubanUrl = 'https://movie.douban.com/subject/4864908/comments?sort=new_score&status=P'
info = urlparse(doubanUrl)
print(info)
print(info.scheme)
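
Besides the scheme, the ParseResult returned by urlparse exposes the other components as named attributes, and urllib.parse.parse_qs can turn the query string back into a dict; a short sketch complementing the urlencode example above:

from urllib.parse import urlparse, parse_qs

info = urlparse('https://movie.douban.com/subject/4864908/comments?sort=new_score&status=P')
print(info.netloc)             # movie.douban.com
print(info.path)               # /subject/4864908/comments
print(info.query)              # sort=new_score&status=P
print(parse_qs(info.query))    # {'sort': ['new_score'], 'status': ['P']}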

Reposted from blog.csdn.net/qq_43273590/article/details/87797844