Using the urllib Library for Web Scraping in Python 3

Fetching a page

from urllib.request import Request, urlopen

url = 'http://www.xx.com'   # placeholder URL
req = Request(url)          # build the request object
resp = urlopen(req)         # send it and receive the response

Reading the response

html = resp.read().decode()   # read the body as bytes and decode it (UTF-8 by default)
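
A minimal end-to-end sketch of the two steps above; the URL is a placeholder:

from urllib.request import Request, urlopen

resp = urlopen(Request('http://www.xx.com'))   # placeholder URL
print(resp.getcode())                          # HTTP status code, e.g. 200
print(resp.geturl())                           # final URL after any redirects
html = resp.read().decode()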

Adding request headers

from urllib.request import Request
from fake_useragent import UserAgent   # third-party package: pip install fake-useragent

headers = {
    'User-Agent': UserAgent().chrome   # a random Chrome User-Agent string
}
url = 'http://www.xx.com'
req = Request(url, headers=headers)
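
To check that the header is really sent, you can point the same kind of request at an echo service such as http://httpbin.org/get (also used in the Proxy section below), which returns the headers it received:

from urllib.request import Request, urlopen
from fake_useragent import UserAgent

req = Request('http://httpbin.org/get',
              headers={'User-Agent': UserAgent().chrome})
print(urlopen(req).read().decode())   # the JSON echo includes our User-Agent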

Accessing HTTPS pages

# Ignore certificate verification (insecure; for testing only)
import ssl
from urllib.request import urlopen

context = ssl._create_unverified_context()
response = urlopen(request, context=context)   # request built as shown above
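
A self-contained sketch, assuming a placeholder HTTPS URL; skipping certificate checks should only be done for testing:

import ssl
from urllib.request import Request, urlopen

context = ssl._create_unverified_context()   # disables certificate verification
req = Request('https://www.xx.com')          # placeholder URL
html = urlopen(req, context=context).read().decode()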

POST requests

from urllib.parse import urlencode

my_data = {
    'usr': '123',
    'pwd': '123456'
}
# urlencode() builds 'usr=123&pwd=123456'; encode() converts it to bytes
f_data = urlencode(my_data).encode()
request = Request(url, data=f_data, headers=headers)
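
Supplying the data argument is what turns the request into a POST. A short sketch against the httpbin echo service, which returns the form fields it received:

from urllib.parse import urlencode
from urllib.request import Request, urlopen

f_data = urlencode({'usr': '123', 'pwd': '123456'}).encode()
req = Request('http://httpbin.org/post', data=f_data)
print(urlopen(req).read().decode())   # the 'form' field echoes usr and pwd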

Using a proxy

# Test URL: http://httpbin.org/get (echoes the origin IP)
from urllib.request import build_opener, ProxyHandler

# usr:pwd@ip:port are placeholders for the proxy credentials and address
proxy = ProxyHandler({'http': 'http://usr:pwd@ip:port'})
opener = build_opener(proxy)
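
Requests made through this opener are routed via the proxy; fetching the httpbin URL from the comment shows which IP the server sees:

response = opener.open('http://httpbin.org/get')
print(response.read().decode())   # the 'origin' field should be the proxy's IP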

Cookie

# Method 1: set the Cookie header manually
headers = {
    'User-Agent': UserAgent().chrome,
    'Cookie': 'xxxx'   # paste the raw cookie string here
}
request = Request(url, headers=headers)

# Method 2: manage cookies with a cookie jar
from urllib.request import build_opener, HTTPCookieProcessor
from http.cookiejar import MozillaCookieJar

cookie_jar = MozillaCookieJar()
handler = HTTPCookieProcessor(cookie_jar)
opener = build_opener(handler)
response = opener.open(request)

# Save cookies; ignore_expires=True also saves expired cookies,
# ignore_discard=True also saves cookies marked to be discarded
cookie_jar.save('cookie.txt', ignore_expires=True, ignore_discard=True)

# Load previously saved cookies
cookie_jar.load('cookie.txt', ignore_expires=True, ignore_discard=True)
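
To reuse the saved cookies in a later session, load them into a fresh jar before building the opener; a minimal sketch (cookie.txt is the file saved above, the URL is a placeholder):

from urllib.request import build_opener, HTTPCookieProcessor
from http.cookiejar import MozillaCookieJar

cookie_jar = MozillaCookieJar()
cookie_jar.load('cookie.txt', ignore_expires=True, ignore_discard=True)
opener = build_opener(HTTPCookieProcessor(cookie_jar))
response = opener.open('http://www.xx.com')   # saved cookies are sent automatically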

URLError

from urllib.request import build_opener
from urllib.error import URLError

try:
    opener = build_opener()
    response = opener.open(request)   # request built as in the sections above
    info = response.read().decode()
except URLError as e:
    if e.args == ():              # HTTPError (a URLError subclass) has empty args
        print(e.code)             # HTTP status code, e.g. 404
    else:                         # a plain URLError wraps the underlying socket error
        print(e.args[0].errno)
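
The args check distinguishes the two cases indirectly; an arguably clearer equivalent is to catch HTTPError (a subclass of URLError) first:

from urllib.request import urlopen
from urllib.error import HTTPError, URLError

try:
    response = urlopen('http://www.xx.com')   # placeholder URL
except HTTPError as e:
    print(e.code)      # the server responded with an error status
except URLError as e:
    print(e.reason)    # the server could not be reached at all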

Reposted from blog.csdn.net/kkLeung/article/details/105423675