[Python 3.x] Web Crawler (1): Fetching Web Page Content from a Specified URL with urllib

Category: Other · Posted 2018-06-04 17:52:21

1. Fetching a site's home page
Method 1: pass the URL string directly to urlopen

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import urllib.request

# Open the URL and read the raw response body (a bytes object)
response = urllib.request.urlopen('http://www.lovejing.com/')
html = response.read()
print(html)
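
Because `response.read()` returns bytes, printing it shows a `b'...'` literal rather than readable text. The following is a minimal sketch of one way to decode the body, assuming the server declares its charset in the Content-Type header and falling back to UTF-8 otherwise. The helper names `decode_body` and `fetch_text` are hypothetical and not part of the original post.

```python
import urllib.request
from typing import Optional


def decode_body(raw: bytes, charset: Optional[str] = None, fallback: str = "utf-8") -> str:
    """Decode raw response bytes; use `fallback` when no charset is declared."""
    # errors="replace" keeps going on undecodable bytes instead of raising.
    return raw.decode(charset or fallback, errors="replace")


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Fetch `url` and return the body as str (performs a real network request)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Charset from the Content-Type header, or None if the header omits it.
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)


# Usage (not executed here, since it needs network access):
#     print(fetch_text('http://www.lovejing.com/')[:200])
```

The `with` statement also closes the connection once the body has been read, which the shorter version above leaves to the garbage collector.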

Method 2: build a Request object first, then pass it to urlopen

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import urllib.request

# Wrap the URL in a Request object; this is where data and headers can be attached later
req = urllib.request.Request('http://www.lovejing.com/')
response = urllib.request.urlopen(req)
html = response.read()
print(html)
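
Neither method above handles a failed request. As a hedged sketch, the Request-based version can be wrapped so that errors are reported instead of crashing the script. The function name `fetch` and the 10-second timeout are assumptions made for illustration.

```python
import urllib.error
import urllib.request
from typing import Optional


def fetch(url: str, timeout: float = 10.0) -> Optional[bytes]:
    """Return the response body as bytes, or None if the request fails."""
    req = urllib.request.Request(url)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        # HTTPError is a subclass of URLError, so it must be caught first.
        print(f"HTTP error {err.code}: {err.reason}")
    except urllib.error.URLError as err:
        # No usable response, e.g. DNS failure or refused connection.
        print(f"Failed to reach the server: {err.reason}")
    return None


# Usage (not executed here, since it needs network access):
#     html = fetch('http://www.lovejing.com/')
```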

2. Sending form data (POST request)

import urllib.parse
import urllib.request

url = 'http://www.lovejing.com/cgi-bin/register.cgi'

values = {'name': 'WHY',
          'location': 'SDU',
          'language': 'Python'}

# URL-encode the form fields, then convert the resulting string to bytes
data = urllib.parse.urlencode(values).encode('utf-8')
req = urllib.request.Request(url, data)  # attaching data makes this a POST request
response = urllib.request.urlopen(req)   # send the request and receive the response
the_page = response.read()               # read the response body (bytes)
print(the_page)
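
To see what the encoding step produces, and why the request becomes a POST, the prepared request can be inspected without sending anything. This is a small sketch that reuses the URL and form values from the example above; nothing here touches the network.

```python
import urllib.parse
import urllib.request

url = 'http://www.lovejing.com/cgi-bin/register.cgi'
values = {'name': 'WHY', 'location': 'SDU', 'language': 'Python'}

# urlencode() returns a str; Request expects bytes for its data argument.
data = urllib.parse.urlencode(values).encode('utf-8')
req = urllib.request.Request(url, data)

print(data)              # b'name=WHY&location=SDU&language=Python'
print(req.get_method())  # POST, because data is not None
print(urllib.request.Request(url).get_method())  # GET, because no data was given
```

In other words, urllib picks the HTTP method from the presence of `data`; there is no separate "post" function to call.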

3. GET request

import urllib.parse
import urllib.request

data = {}

data['name'] = 'WHY'
data['location'] = 'SDU'
data['language'] = 'Python'

# Encode the parameters into a query string
url_values = urllib.parse.urlencode(data)
print(url_values)  # name=WHY&location=SDU&language=Python

url = 'http://www.lovejing.com/example.cgi'
full_url = url + '?' + url_values

# No data argument is passed, so urlopen sends a GET request
response = urllib.request.urlopen(full_url)
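
The reason to go through `urlencode` rather than concatenating strings by hand is that it escapes characters that would otherwise break the query string. The sketch below illustrates this with made-up parameter values (`'Ji Nan'` and `'a&b=c'` are hypothetical, chosen only because they contain a space, `&` and `=`), then parses the URL back to show that the values survive the round trip.

```python
import urllib.parse

base_url = 'http://www.lovejing.com/example.cgi'
# Hypothetical values containing characters that need escaping.
params = {'name': 'WHY', 'city': 'Ji Nan', 'q': 'a&b=c'}

query = urllib.parse.urlencode(params)
full_url = base_url + '?' + query
print(full_url)
# http://www.lovejing.com/example.cgi?name=WHY&city=Ji+Nan&q=a%26b%3Dc

# Round trip: split the URL and decode the query string again.
parsed = urllib.parse.urlsplit(full_url)
decoded = urllib.parse.parse_qs(parsed.query)
print(decoded)
# {'name': ['WHY'], 'city': ['Ji Nan'], 'q': ['a&b=c']}
```

Note that `parse_qs` returns a list for each key, since a query string may repeat the same parameter.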

4. Setting headers on an HTTP request

import urllib.parse
import urllib.request

url = 'http://www.lovejing.com/cgi-bin/register.cgi'

# Identify as a browser instead of the default Python-urllib user agent
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {'name': 'WHY',
          'location': 'SDU',
          'language': 'Python'}
headers = {'User-Agent': user_agent}

# URL-encode the form fields, then convert the resulting string to bytes
data = urllib.parse.urlencode(values).encode('utf-8')
req = urllib.request.Request(url, data, headers)  # attach both the form data and the headers
response = urllib.request.urlopen(req)            # send the request and receive the response
the_page = response.read()                        # read the response body (bytes)
print(the_page)
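
Headers can also be added after the Request object has been created, and inspected before the request is sent. The sketch below assumes an extra `Referer` header purely for illustration; it is not part of the original example. One detail worth knowing is that `Request` stores header names using `str.capitalize()`, so `'User-Agent'` is kept as `'User-agent'`.

```python
import urllib.request

url = 'http://www.lovejing.com/cgi-bin/register.cgi'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'

req = urllib.request.Request(url, headers={'User-Agent': user_agent})
# Hypothetical extra header, added after construction.
req.add_header('Referer', 'http://www.lovejing.com/')

# Header names are stored capitalized, so look them up in that form.
print(req.get_header('User-agent'))  # Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)
print(req.get_header('User-Agent'))  # None: the stored key is 'User-agent'
print(req.has_header('Referer'))     # True
print(req.header_items())            # list of (name, value) tuples
```

This only affects lookups on the Request object; HTTP header names are case-insensitive, so the server treats `User-agent` and `User-Agent` the same.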

Reposted from blog.csdn.net/weixin_38423249/article/details/80561521
