Web scraping basics with requests

1. First, install the requests module: pip install requests

2. Provide the url and headers parameters:

3. In the browser, press F12 to open DevTools, switch to the Network tab, refresh the page, then click a request to see its Request Headers (this is where you can copy the User-Agent value).

The complete basic code is as follows:

import requests  # import the module
url = 'https://www.baidu.com/'  # target URL

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}  # request headers, used to mimic a browser
# response = requests.get(url=url, headers=headers).text  # returns plain text; may come out garbled

response = requests.get(url=url, headers=headers).content.decode("utf-8", "ignore")  # decode the raw bytes with an explicit encoding
# the "ignore" argument skips any bytes that cannot be decoded

with open("baidu.html", "w", encoding="utf-8") as f:  # write the page to a file
    f.write(response)
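The effect of the "ignore" error handler can be seen offline without making any request. This is a minimal sketch; the sample string and the deliberate truncation are illustrative, not part of the original post:

```python
# Deliberately corrupt a UTF-8 byte string by dropping its last byte
raw = "百度一下".encode("utf-8")[:-1]

# raw.decode("utf-8") would raise UnicodeDecodeError here;
# the "ignore" handler silently drops the undecodable tail instead
text = raw.decode("utf-8", "ignore")
print(text)  # the truncated final character is simply gone
```

This is why "ignore" is convenient for pages with a few broken bytes, at the cost of silently losing those characters.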

Complete code for a URL with parameters:

Method 1:

import requests
url = 'https://www.baidu.com/s?wd=哈士奇'
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}
response = requests.get(url=url, headers=headers).content.decode("utf-8")
with open("hashiqi.html", "w", encoding="utf-8") as f:
    f.write(response)
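When a Chinese keyword like 哈士奇 is embedded directly in the URL, requests percent-encodes it (its UTF-8 bytes become %XX escapes) before sending. The standard library's quote function shows the encoded form, with no network access needed:

```python
from urllib.parse import quote

# Each of the three characters is three UTF-8 bytes,
# so the result is nine %XX escapes
print(quote("哈士奇"))
```

Seeing the encoded form is handy when comparing your request URL against what the browser shows in the Network tab.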

Method 2:

import requests
url = 'https://www.baidu.com/s?'
params = {"wd": "边"}

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}
response = requests.get(url=url, params=params, headers=headers).content.decode("utf-8")

with open("bian1.html", "w", encoding="utf-8") as f:
    f.write(response)
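With the params argument, requests builds and percent-encodes the query string for you. You can preview the final URL offline with the standard library's urlencode, which does the equivalent encoding (this is a sketch of the idea, not something requests itself exposes here):

```python
from urllib.parse import urlencode

base = "https://www.baidu.com/s?"
# urlencode turns the dict into a percent-encoded query string,
# the same wd=... form that requests appends to the URL
query = urlencode({"wd": "哈士奇"})
print(base + query)
```

This is why Method 2 is usually preferred: you pass plain Python values and never worry about encoding them by hand.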


Reposted from blog.csdn.net/qq_40576301/article/details/99769801