Basics of fetching data with GET and POST // Basic usage of requests for Python web scraping

Reference: https://www.cnblogs.com/lei0213/p/6957508.html
1. Encodings of the data returned by a GET request

import requests

response = requests.get("https://www.baidu.com")
print(type(response))        # <class 'requests.models.Response'>
print(response.status_code)  # 200
print(type(response.text))   # <class 'str'>
print(response.text)         # a long string... the Chinese text is garbled
response.encoding = "utf-8"
print(response.text)         # a long string... the Chinese text now displays correctly
print(response.cookies)      # <RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
print(response.content)      # a long string of bytes... Chinese appears as escape sequences
print(response.content.decode("utf-8"))  # a long string... Chinese displays correctly (decoded as utf-8)
Note: response.text is a str decoded with the encoding requests guesses from the response headers, which is often wrong for Chinese pages, so you usually need to switch to utf-8 or the text will be garbled. response.content is the raw bytes, which is what you want for downloading things like videos; to read it as text you have to decode it, e.g. as utf-8. Either response.content.decode("utf-8") or response.encoding = "utf-8" avoids the garbling problem.
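As a small aside beyond the note above, requests also exposes response.apparent_encoding, which guesses the encoding from the response body itself. A minimal sketch of using it to avoid garbled text:

import requests

response = requests.get("https://www.baidu.com")
# apparent_encoding is requests' guess based on the response body itself,
# which is usually more reliable for Chinese pages than the header-based default
response.encoding = response.apparent_encoding
print(response.text)  # the Chinese text should now display correctly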

2. A basic GET request

import requests
url = 'https://www.baidu.com/'
response = requests.get(url)
print(response.text)
Result:
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/bdorz/baidu.min.css><title>...... <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>

3. GET request with parameters:
  To query http://httpbin.org/get with specific parameters, you need to add them to the url; for example, to check for a Host=httpbin.org entry, the url would look like http://httpbin.org/get?Host=httpbin.org.
  The code below sends the entries of data to that address as query parameters.
  Difference between a GET with parameters and a POST with parameters:
  My first take: the parameters of a GET add conditions to the url that the input must satisfy, whereas the parameters of a POST pick out particular fields from the url's output.
  Correction of that first take: from the examples that follow, 1) GET and POST parameters both end up reflected in the response in the same way, the difference being that one request goes to the .../post url and the other to the .../get url; and 2) POST passes the dictionary via data=, while GET passes it via params= (see the side-by-side sketch after this list).
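A minimal side-by-side sketch of the two calls (same dictionary, only the endpoint and the keyword argument change; the URLs are the httpbin test endpoints used throughout this post):

import requests

payload = {'name': 'zhangsan', 'age': '25'}

# GET: params= puts the entries into the query string of the url
r_get = requests.get('http://httpbin.org/get', params=payload)
print(r_get.url)   # http://httpbin.org/get?name=zhangsan&age=25

# POST: data= sends the entries in the request body as form data
r_post = requests.post('http://httpbin.org/post', data=payload)
print(r_post.url)  # http://httpbin.org/post  (no query string)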

import requests

url = 'http://httpbin.org/get'
data = {
    'name':'zhangsan',
    'age':'25'
}
response = requests.get(url,params=data)
print(response.url)
print(response.text)
Output (note the returned url has the parameters appended as a query string):
{
  "args": {
    "age": "25",
    "name": "zhangsan"
  },
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
    "Connection": "close",
    "Host": "httpbin.org",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0"
  },
  "origin": "113.250.250.18",
  "url": "http://httpbin.org/get?age=25&name=zhangsan"
}

(4) JSON data:
From the output below we can see that:
1. response.json() in requests is equivalent to json.loads(response.text).

import requests
import json
response = requests.get("http://httpbin.org/get")
print(type(response.text))
print(response.json())
print(json.loads(response.text))
print(type(response.json()))
Output:
<class 'str'>
{'headers': {'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'User-Agent': 'python-requests/2.19.1', 'Host': 'httpbin.org', 'Connection': 'close'}, 'url': 'http://httpbin.org/get', 'origin': '113.250.250.18', 'args': {}}
{'headers': {'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'User-Agent': 'python-requests/2.19.1', 'Host': 'httpbin.org', 'Connection': 'close'}, 'url': 'http://httpbin.org/get', 'origin': '113.250.250.18', 'args': {}}
<class 'dict'>
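One caveat not covered above: if the response body is not valid JSON, response.json() raises an exception rather than returning a dict, so when the content type is uncertain it can be wrapped in a try/except. A minimal sketch:

import requests

response = requests.get("https://www.baidu.com")  # returns HTML, not JSON
try:
    data = response.json()
except ValueError:  # the JSON decode errors raised here are ValueError subclasses
    data = None
print(data)  # None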

Fetching binary data
As mentioned above, response.content gives the raw binary data; the same approach can be used to download images and video files, as in the sketch below.
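A minimal sketch of saving an image with response.content (the image URL here is only a placeholder; any direct link to an image or video file works the same way):

import requests

# hypothetical image URL, used only for illustration
img_url = 'https://www.baidu.com/img/bd_logo1.png'
response = requests.get(img_url)

# response.content is raw bytes, so open the file in binary mode
with open('logo.png', 'wb') as f:
    f.write(response.content)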

(5) Adding headers
First of all, why add headers at all? For example, below we try to request Zhihu's login page (naturally, without logging in you cannot see the content behind it); let's see what error comes back when no header information is sent.

import requests

url = 'https://www.zhihu.com/'
response = requests.get(url)
response.encoding = "utf-8"
print(response.text)
Result:
**The server rejects the request with a 400 Bad Request error (in other words, you cannot even download the html of Zhihu's login page).**
The response body looks like this:
<html>
<head><title>400 Bad Request</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<hr><center>openresty</center>
</body>
</html>
**To access the page, you have to add headers.**
 ![Getting the request headers from the browser](https://img-blog.csdn.net/20180629225118700?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3NpbmF0XzI2NTY2MTM3/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
import requests
url = 'https://www.zhihu.com/'
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36'
}
response = requests.get(url,headers=headers)
print(response.text)

(6) A basic POST request:
 POST submits the data to the url; the parameters are passed as a dictionary and come back in the form field of the response.
 

import requests

url = 'http://httpbin.org/post'  # compare with GET: url = 'http://httpbin.org/get'
data = {
    'name':'jack',
    'age':'23'
}
response = requests.post(url,data=data)  # compare with GET: requests.get(url,params=data)
print(response.text)

Output:
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "age": "23",
    "name": "jack"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Content-Length": "16",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.13.0"
  },
  "json": null,
  "origin": "118.144.137.95",
  "url": "http://httpbin.org/post"
}
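Note that the "json" field in the output above is null because the data was sent as form data (data=). As an aside beyond the original example, requests can also send the body as JSON with the json= argument, in which case httpbin echoes it back in the "json" field instead of "form". A minimal sketch:

import requests

url = 'http://httpbin.org/post'
payload = {'name': 'jack', 'age': '23'}

# json= serializes the dict to a JSON body and sets Content-Type: application/json
response = requests.post(url, json=payload)
print(response.json()['json'])  # {'name': 'jack', 'age': '23'}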

Reposted from blog.csdn.net/sinat_26566137/article/details/80862800