I. Overview of requests

A "non-GMO" HTTP library for Python
Sends HTTP requests and retrieves the response data
Install: pip install requests (or pip3 install requests)

II. Sending GET requests

1. Sending a GET request with requests

import requests

url = "https://www.baidu.com"

# send a GET request
response = requests.get(url)


print(response.text)


2. The response object

response.text

Type: str
Decoding: requests makes an educated guess at the response's encoding from the HTTP headers and decodes the body with that guessed encoding

response.content

Type: bytes
Decoding: none; these are the raw bytes of the response body
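The "educated guess" comes from the Content-Type header. The sketch below calls `requests.utils.get_encoding_from_headers`, which is the utility requests uses to set `response.encoding` when it builds a response. It is an internal helper and not part of the documented top-level API. For a `text/*` type with no charset, requests falls back to ISO-8859-1. This explains why a Chinese page served as plain `text/html` can come out garbled in `response.text`. The Baidu response headers printed in 2.2 are an example of that case.

```python
from requests.utils import get_encoding_from_headers

# Content-Type with an explicit charset: requests uses it as-is
with_charset = get_encoding_from_headers({"content-type": "text/html; charset=utf-8"})

# text/* without a charset: requests falls back to ISO-8859-1
without_charset = get_encoding_from_headers({"content-type": "text/html"})

# no Content-Type at all: the headers give no guess
no_header = get_encoding_from_headers({})

print(with_charset, without_charset, no_header)
```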

2.1 Fixing garbled Chinese text

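Decode response.content yourself to fix garbled Chinese text; response.content.decode() uses utf-8 by default.
Alternatively, set response.encoding before reading response.text (setting it has no effect on response.content).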

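import requests

url = "https://www.baidu.com"

# send a GET request
response = requests.get(url)

# raw bytes, no decoding applied
print(response.content)

# Option 1: tell requests which encoding to use, then read .text
response.encoding = 'utf-8'
print(response.text)

# Option 2: decode the raw bytes yourself (decode() defaults to utf-8)
print(response.content.decode())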
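The garbling can be reproduced offline with only the standard library. In the sketch below, UTF-8 bytes are decoded with the wrong codec, which is what `response.text` does after requests guesses ISO-8859-1. Decoding the same bytes as UTF-8 gives the right result. The string "百度一下" is sample text only.

```python
# UTF-8 bytes, as the server would send them
raw = "百度一下".encode("utf-8")

# what response.text shows when requests guesses ISO-8859-1
garbled = raw.decode("iso-8859-1")

# what response.content.decode() (UTF-8 by default) gives
correct = raw.decode()

# garbled text can be repaired by undoing the wrong decode
repaired = garbled.encode("iso-8859-1").decode("utf-8")

print(garbled == correct, repaired == correct)  # False True
```

ISO-8859-1 maps every byte to a character, so the wrong decode never raises an error. It can also be reversed losslessly, which is why the repair step works.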


2.2 Attributes and methods of the response object

import requests

url = "https://www.baidu.com"

# send a GET request
response = requests.get(url)
response.encoding = 'utf8'

# URL of the response
print(response.url)
# https://www.baidu.com/

# status code
print(response.status_code) # 200

# headers of the request that was actually sent
print(response.request.headers)
# {'User-Agent': 'python-requests/2.25.1',
#  'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*',
#  'Connection': 'keep-alive'}

# response headers
print(response.headers)

# {'Cache-Control': 'private, no-cache, no-store,
#  proxy-revalidate, no-transform', 'Connection': 'keep-alive',
# 'Content-Encoding': 'gzip', 'Content-Type': 'text/html',
# 'Date': 'Mon, 01 Nov 2021 13:51:59 GMT',
# 'Last-Modified': 'Mon, 23 Jan 2017 13:24:18 GMT',
# 'Pragma': 'no-cache', 'Server': 'bfe/1.0.8.18',
# 'Set-Cookie': 'BDORZ=27315; max-age=86400;
# domain=.baidu.com; path=/', 'Transfer-Encoding': 'chunked'}


# print the cookies set by the response
print(response.cookies)
# <RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
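The request headers printed above are the defaults that requests adds when you do not override them. `requests.utils.default_headers()` returns that same set without sending anything. It shows why a server can tell the client is a script: the User-Agent has the form `python-requests/<version>`. The exact `Accept-Encoding` value depends on the installed version and is not assumed here.

```python
from requests.utils import default_headers

# the headers requests sends unless you override them
headers = default_headers()
for name, value in headers.items():
    print(f"{name}: {value}")
```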

3. Sending custom request headers with requests

requests.get(url, headers=headers)

The headers parameter takes the request headers as a dict:
keys are header field names, values are the field values

import requests

url = 'https://www.baidu.com'

# custom request-header dict
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36 Edg/95.0.1020.40'
}
# send the request disguised as a browser (a wolf in sheep's clothing)
response = requests.get(url, headers=headers)
print(response.request.headers)
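You can check which headers would be sent without touching the network by preparing the request instead of sending it. `requests.get` builds its request through a `Session`, and `Session.prepare_request` performs the same header merge. The sketch below assumes a shortened, illustrative User-Agent string. It shows that the custom value replaces the default `python-requests/...` User-Agent while the other default headers remain.

```python
import requests

url = "https://www.baidu.com"
# illustrative browser User-Agent (shortened)
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

# build the request the way a Session would, but don't send it
session = requests.Session()
prepared = session.prepare_request(requests.Request("GET", url, headers=headers))

# our User-Agent replaces the default; Accept, Connection, etc. are kept
for name, value in prepared.headers.items():
    print(f"{name}: {value}")
```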

III. Cookies

1. Sending a request that carries cookies

If the page title contains ".github", the login did not succeed.

import requests

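url = 'https://github.com/M1kaelson'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:93.0) Gecko/20100101 Firefox/93.0'
}
# paste the Cookie header value copied from the browser here (left empty in this post)
temp = ''
# header values are sent as latin-1: re-read the UTF-8 bytes as latin-1
# so non-ASCII cookie values don't raise UnicodeEncodeError
temp = temp.encode("utf-8").decode("latin-1")
cookie_list = temp.split(';')
cookies = {}

for cookie in cookie_list:
    # cookies maps each cookie name to its value; split on the first '=' only
    # so values that themselves contain '=' are kept whole
    name, _, value = cookie.strip().partition('=')
    if name:
        cookies[name] = value
print(cookies)

response = requests.get(url, headers=headers, cookies=cookies, timeout=60)

with open("github_with_cookies_.html", "wb") as f:
    f.write(response.content)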

(It failed at first only because of the campus network... over a 4G connection it worked right away.)
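The `cookies=` dict does not travel as a separate field. requests serializes it into a single `Cookie` request header, which is why pasting the raw string into `headers['Cookie']` also works. The sketch below prepares a request without sending it so you can see the header. The cookie names and values are hypothetical.

```python
import requests

# hypothetical cookie values, for illustration only
cookies = {"user_session": "abc123", "theme": "dark"}

# prepare (but don't send) a GET request that carries the cookies
prepared = requests.Request("GET", "https://github.com/", cookies=cookies).prepare()

# the dict becomes one "name=value; name=value" header
print(prepared.headers["Cookie"])
```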

2. The cookiejar object

Build a cookies dict:
cookies = {"name":"value"}

import requests

url = ''

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:93.0) Gecko/20100101 Firefox/93.0'
}
# paste the Cookie header value copied from the browser here (left empty in this post)
temp = ''

cookie_list = temp.split(';')
cookies = {}

for cookie in cookie_list:
    # split "name=value" on the first '=' only; skip empty pieces
    name, _, value = cookie.strip().partition('=')
    if name:
        cookies[name] = value
print(cookies)

response = requests.get(url, headers=headers, cookies=cookies)

with open("github_with_cookies_.html", "wb") as f:
    f.write(response.content)
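`response.cookies` is a `RequestsCookieJar`, not a plain dict. The helpers `requests.utils.dict_from_cookiejar` and `requests.utils.cookiejar_from_dict` convert between the two forms. The sketch below rebuilds a jar like the BDORZ one printed in 2.2 by hand, so it runs offline.

```python
from requests.cookies import RequestsCookieJar
from requests.utils import cookiejar_from_dict, dict_from_cookiejar

# a jar like the one printed as response.cookies in section 2.2
jar = RequestsCookieJar()
jar.set("BDORZ", "27315", domain=".baidu.com", path="/")

# CookieJar -> plain dict (domain and path information is dropped)
as_dict = dict_from_cookiejar(jar)

# plain dict -> CookieJar, usable as cookies=... or as session.cookies
new_jar = cookiejar_from_dict({"name": "value"})

print(as_dict)
print(new_jar.get("name"))
```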

IV. Parameters

1. timeout

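response = requests.get(url, timeout=3)
# raises requests.exceptions.Timeout if connecting, or waiting for the server to send data, takes longer than 3 seconds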
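`timeout` can also be a `(connect, read)` tuple. Both `ConnectTimeout` and `ReadTimeout` are subclasses of `requests.exceptions.Timeout`, so one `except` clause catches either. The sketch below wraps this in a hypothetical helper, `fetch_or_none`. For a self-contained demo, it points the helper at a local socket that accepts the connection but never replies, so the read times out.

```python
import socket

import requests


def fetch_or_none(url, timeout=(3, 10)):
    """Hypothetical helper: GET url, returning None instead of raising on a timeout."""
    try:
        # timeout=(connect, read): separate limits for connecting and for waiting on data
        return requests.get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        # catches both ConnectTimeout and ReadTimeout
        return None


# local demo: a socket that accepts connections but never sends a response
silent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
silent.bind(("127.0.0.1", 0))
silent.listen(1)
port = silent.getsockname()[1]

result = fetch_or_none(f"http://127.0.0.1:{port}/", timeout=(1, 0.5))
print(result)
silent.close()
```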

2. proxy


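Forward proxy: acts on behalf of the client (the proxy you configure in the browser or in requests)
Reverse proxy (e.g. nginx): acts on behalf of the server; the browser does not know the server's real address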

3. Proxy protocols

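http, https
socks

A SOCKS proxy simply relays packets and does not care about the application-layer protocol
A SOCKS proxy usually has less overhead than an http/https proxy
A SOCKS proxy can forward both http and https requests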

response = requests.get(url, proxies=proxies)
# proxies is a dict mapping a URL scheme to a proxy URL

import requests

url = 'http://www.google.com'
proxies = {
    # placeholder: replace ip:port with a real proxy; this entry applies to http:// URLs only
    'http': 'http://ip:port'
}

response = requests.get(url,proxies=proxies)
print(response.text)
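The dict key is matched against the scheme of the target URL. A proxies dict with only an `'http'` key therefore does not proxy `https://` URLs, unless an `HTTPS_PROXY` environment variable supplies a proxy. The sketch below uses `requests.utils.select_proxy`, the internal helper requests uses for this lookup, to show the matching offline. The proxy address `127.0.0.1:8080` is a placeholder.

```python
from requests.utils import select_proxy

# placeholder proxy address; only the dictionary shape matters here
http_only = {"http": "http://127.0.0.1:8080"}
both = {
    "http": "http://127.0.0.1:8080",
    "https": "http://127.0.0.1:8080",
    # a SOCKS proxy would look like "socks5h://127.0.0.1:1080"
    # (needs the extra: pip install requests[socks])
}

print(select_proxy("http://www.google.com", http_only))   # the proxy is used
print(select_proxy("https://www.google.com", http_only))  # None: https goes direct
print(select_proxy("https://www.google.com", both))       # the proxy is used
```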

4. verify

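Use verify to skip CA certificate checks
When a site's certificate cannot be verified, pass verify=False so the request still succeeds in code; requests then skips CA certificate verification (and urllib3 emits an InsecureRequestWarning)

response = requests.get(url, verify=False)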

V. Sending POST requests

response = requests.post(url, data=data)
The data parameter takes a dict
The other parameters are the same as for get
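`data=` sends the dict as an HTML form (`application/x-www-form-urlencoded`). The related `json=` parameter sends it as a JSON body instead. Which one to use depends on what the captured browser request shows. The sketch below prepares both variants offline against a hypothetical endpoint, `https://example.com/translate`, so the bodies and Content-Type headers can be inspected.

```python
import json
from urllib.parse import parse_qs

import requests

url = "https://example.com/translate"  # hypothetical endpoint
payload = {"from": "zh", "to": "en", "q": "字典"}

# data=dict: sent as a URL-encoded form body
form = requests.Request("POST", url, data=payload).prepare()
print(form.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form.body)

# json=dict: sent as a JSON body instead
as_json = requests.Request("POST", url, json=payload).prepare()
print(as_json.headers["Content-Type"])  # application/json
print(json.loads(as_json.body))
```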

1. Sending a POST request: translating a word with Kingsoft iCIBA

# url
# headers
# data dict
# send the request, get the response
# parse the data

1.1 Capture the traffic to find the request URL


1.2 Determine the request parameters

(The Youdao Dictionary request I tried first seems to be protected against scraping... I couldn't get it to work.)

1.3 Locate the data in the response

1.4 Simulate the browser to get the data

It finally worked.

# coding: utf-8
import requests


class King(object):

    def __init__(self, word):
        # url
        self.url = "http://ifanyi.iciba.com/index.php?c=trans&m=fy&client=6&auth_user=key_ciba&sign=37218aa29f55fdcc"
        # headers
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:93.0) Gecko/20100101 Firefox/93.0'
        }
        # data: translate the word from Chinese to English
        self.data = {
            "from": "zh",
            "to": "en",
            "q": word
        }

    def get_data(self):
        response = requests.post(self.url, data=self.data, headers=self.headers)
        # assumes an ASCII body with \uXXXX escapes; unicode-escape turns them back into readable text
        return response.content.decode('unicode-escape')

    def run(self):
        response = self.get_data()
        print(response)


if __name__ == '__main__':
    # use a separate name for the instance so the class King is not shadowed
    king = King('字典')
    king.run()

    # with open("fanyi.html", "wb") as f:
    #     f.write(Youdao.run())
2. Where the request data comes from

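Fixed values: capture several requests and compare which values never change
Input values: what the user types
Preset values: found in static files
Preset values: returned by another request
Generated on the client: analyze the JS and reproduce how the value is generated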
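Client-generated values are often a millisecond timestamp, like the 13-digit `timestamp` field in the GitHub login form below. Another common one is a hash-based signature computed in JavaScript. The sketch below reproduces both kinds in Python. The signing formula `md5(secret + word)` and the secret are entirely hypothetical. The real formula has to be read out of the site's own JS.

```python
import hashlib
import time


def make_sign(word, secret="hypothetical-secret"):
    """Hypothetical signing scheme: hex MD5 of secret + word.
    The real formula must be taken from the site's JavaScript."""
    return hashlib.md5((secret + word).encode("utf-8")).hexdigest()


# millisecond timestamp, equivalent to Date.now() in JavaScript
timestamp = str(int(time.time() * 1000))
sign = make_sign("字典")
print(timestamp, sign)
```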

3. requests.session

Handles cookies automatically
The next request carries the cookies from the previous one
Useful for several consecutive requests

session = requests.session()
response = session.get(url, headers=headers)   # plus any other get() parameters
response = session.post(url, data=data)        # plus any other post() parameters
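The sketch below shows what the session carries forward, without any network traffic. Headers set on `session.headers` and cookies stored in `session.cookies` are merged into every request the session prepares. Here the cookie is set by hand to stand in for one that an earlier response's `Set-Cookie` would have stored. The cookie name, value, and User-Agent are hypothetical.

```python
import requests

session = requests.Session()
# set once on the session, reused by every later request
session.headers.update({"User-Agent": "Mozilla/5.0 (hypothetical)"})
# stand-in for a cookie an earlier response's Set-Cookie would have stored
session.cookies.set("sid", "abc123")

# prepare (without sending) the next request the session would make
prepared = session.prepare_request(requests.Request("GET", "https://github.com/login"))
print(prepared.headers["User-Agent"])
print(prepared.headers["Cookie"])
```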

4. Logging in to GitHub with a session

Plan

# session

# headers

# url1 - get the token
    # send the request, get the response
    # extract the token with a regex

# url2 - log in
    # build the form data
    # send the login request

import re

import requests


def login():
    # session: keeps cookies between requests
    session = requests.session()

    # headers
    session.headers.update({
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36 Edg/95.0.1020.40'
    })

    # url1 - get the token
    #     send the request, get the response
    #     extract the token with a regex
    url1 = 'https://github.com/login'
    res_1 = session.get(url1).content.decode()

    # get the token
    token = re.findall('name="authenticity_token" value="(.*?)" />', res_1)[0]
    print(token)

    # url2 - log in
    #     build the form data
    #     send the login request
    url2 = 'https://github.com/session'
    data = {
        'commit': 'Sign in',
        'authenticity_token': token,
        'login': 'M1kaelson',
        'password': 'xxxxxxx',
        # 'trusted_device': '',
        'webauthn-support': 'supported',
        # 'webauthn-iuvpaa-support': 'unsupported',
        # 'return_to': 'https://github.com/signup?ref_cta=Sign+up&ref_loc=header+logged+out&ref_page=%2F&source=header-home',
        'allow_signup': '',
        # 'client_id': '',
        # 'integration': '',
        # 'required_field_771d': '',
        # 'timestamp': '1636085932515',
        # 'timestamp_secret': '262637e02cba4e7372cde23e07019cee2b70afd16525d44e160b38b60bb4fc8f',
    }
    print(data)
    session.post(url2, data=data)

    # url3 - verify: the session now carries the login cookies
    url3 = 'https://github.com/M1kaelson'
    response = session.get(url3)
    with open('github.html', 'wb') as f:
        f.write(response.content)


if __name__ == '__main__':
    login()
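The token regex above depends on the exact spacing of the tag, and `[0]` raises `IndexError` when the field is missing, for example when GitHub changes the page. A slightly more tolerant extraction can be sketched as a hypothetical helper, `extract_token`, tested on a made-up HTML snippet and token value. It allows any whitespace between the two attributes, does not depend on how the tag ends, and returns `None` instead of crashing. It still assumes `name` comes before `value`.

```python
import re

# made-up sample of the hidden form field on a login page
html = '<input type="hidden" name="authenticity_token" value="AbC123+/xyz==" />'


def extract_token(page):
    """Hypothetical helper: return the authenticity_token value, or None if absent."""
    # any whitespace between name and value; the tag ending is not matched at all
    match = re.search(r'name="authenticity_token"\s+value="([^"]*)"', page)
    return match.group(1) if match else None


print(extract_token(html))                         # AbC123+/xyz==
print(extract_token("<html>no form here</html>"))  # None
```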

[python] Web scraping 02: the requests module
