Python爬虫入门:几个常用方法

  • 将互联网上的东西下载到本地
import urllib.request
# urlretrieve: download a remote resource and save it straight to a local file.
source_url = "https://www.baidu.com"
save_path = "C:/Users/10167/Desktop/address.html"
urllib.request.urlretrieve(source_url, save_path)
  • 清除缓存用
# urlcleanup: clear the temporary files/cache left behind by earlier urlretrieve() calls.
urllib.request.urlcleanup()
  • 爬取的网页的简介信息
# info(): summary metadata (HTTP response headers) of the fetched page.
# NOTE: `data` is an open response object reused by the snippets below.
data = urllib.request.urlopen("https://blog.csdn.net/qq_40666620/article/details/102834104")
print(data.info())
  • 状态码,就可以找失效的连接什么的
# getcode(): the HTTP status code — handy for spotting dead links.
print(data.getcode())
  • 获取当前爬取的url地址
# geturl(): the URL of the resource actually retrieved.
print(data.geturl())
  • timeout超时设置
# timeout demo: give up on any request that takes longer than `timeout` seconds.
# Repeats 100 times to show how often a 0.1 s budget succeeds vs. times out.
import urllib.error

for _ in range(100):
    try:
        data = urllib.request.urlopen(
            "https://blog.csdn.net/qq_40666620/article/details/102834104",
            timeout=0.1,
        ).read()
        print("success")
    except (urllib.error.URLError, TimeoutError) as error:
        # Timeouts surface as URLError (wrapping socket.timeout) from urlopen,
        # or as TimeoutError during read(); catch only those, not every Exception.
        print(error)
  • 自动模拟http请求
import re
# Simulating an HTTP GET request: search Baidu for a keyword and scrape result titles.
keyword = "python"
# Percent-encode the keyword so non-ASCII search terms survive inside the URL.
keyword = urllib.request.quote(keyword)
url = "http://www.baidu.com/s?wd=" + keyword
target = 'title":"(.*?)"'
# Compile the pattern once, outside the loop, instead of recompiling per page.
pattern = re.compile(target)
for pn in range(10):
    # 9*pn: Baidu now shows 9 entries per page, so `pn` here is an item offset,
    # not a page number.
    data = urllib.request.urlopen(url + "&pn=" + str(9 * pn)).read().decode("utf-8")
    # Print every captured title; iterate the matches directly instead of indexing.
    for title in pattern.findall(data):
        print(title)
发布了157 篇原创文章 · 获赞 167 · 访问量 2万+

猜你喜欢

转载自blog.csdn.net/qq_40666620/article/details/102846788