[Python 3.x] Web Crawler (1): Fetching Page Content from a Given URL with urllib

1. Fetching a site's home page (the example uses http://www.lovejing.com/)
Method 1:

#!/usr/bin/python3
# -*- coding: UTF-8 -*-
import urllib.request

# Open the URL directly and read the whole response body (returned as bytes)
response = urllib.request.urlopen('http://www.lovejing.com/')
html = response.read()
print(html)

Method 2:

#!/usr/bin/python3
# -*- coding: UTF-8 -*-
import urllib.request

# Build a Request object first, then pass it to urlopen
req = urllib.request.Request('http://www.lovejing.com/')
response = urllib.request.urlopen(req)
html = response.read()
print(html)
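
Both methods print a bytes object, because response.read() returns raw bytes rather than text. The following is a minimal sketch of decoding the body into a str. The helper names decode_body and fetch_text are hypothetical (they are not part of the original post), and falling back to UTF-8 when the server declares no charset is an assumption that will not hold for every site.

```python
import urllib.request


def decode_body(raw, charset=None):
    """Decode response bytes; fall back to UTF-8 (an assumption) if no charset is given."""
    return raw.decode(charset or "utf-8", errors="replace")


def fetch_text(url, timeout=10):
    """Hypothetical helper: fetch a URL and return its body as str."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # get_content_charset() reads the charset from the Content-Type header, if any
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)


# Offline demonstration with sample bytes standing in for a response body
sample = "<title>你好</title>".encode("utf-8")
text = decode_body(sample)
print(type(sample).__name__, len(sample))
print(type(text).__name__, len(text))
```

Calling fetch_text('http://www.lovejing.com/') would perform the same request as the two methods above and return text instead of bytes.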

2. Sending form data (POST request)

import urllib.parse
import urllib.request

url = 'http://www.lovejing.com/cgi-bin/register.cgi'    

values = {'name' : 'WHY',    
          'location' : 'SDU',    
          'language' : 'Python' }    

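data = urllib.parse.urlencode(values).encode(encoding='UTF8')  # URL-encode the form, then convert it to bytes
req = urllib.request.Request(url, data)  # build the request with the form data attached
response = urllib.request.urlopen(req)  # send the request and receive the response
the_page = response.read()  # read the response body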
print(the_page)
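
To see what is actually sent, the encoded body and the request method can be inspected without touching the network. This is a small sketch that reuses the values and URL from the example above. It relies on the fact that a Request built with data reports POST from get_method(), while one built without data reports GET.

```python
import urllib.parse
import urllib.request

url = 'http://www.lovejing.com/cgi-bin/register.cgi'
values = {'name': 'WHY', 'location': 'SDU', 'language': 'Python'}

# urlencode produces a str; the request body must be bytes, hence .encode()
data = urllib.parse.urlencode(values).encode('utf-8')
print(data)  # b'name=WHY&location=SDU&language=Python'

# Passing data switches the method to POST
post_req = urllib.request.Request(url, data)
print(post_req.get_method())  # POST

# Without data the same URL would be requested with GET
get_req = urllib.request.Request(url)
print(get_req.get_method())  # GET
```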

3. GET request

import urllib.parse
import urllib.request

data = {}  

data['name'] = 'WHY'    
data['location'] = 'SDU'    
data['language'] = 'Python'  

url_values = urllib.parse.urlencode(data)
print(url_values)

url =  'http://www.lovejing.com/example.cgi'    
full_url = url + '?' + url_values

response = urllib.request.urlopen(full_url)  # the parameters travel in the URL, so no data argument is needed
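
urlencode also escapes characters that would otherwise break the query string. The sketch below shows this offline and parses the result back to check the round trip. The extra 'q' parameter and its value are hypothetical, added only to show the escaping.

```python
import urllib.parse

# 'q' is a made-up parameter containing a space and an ampersand
params = {'name': 'WHY', 'location': 'SDU', 'q': 'a b&c'}

query = urllib.parse.urlencode(params)
print(query)  # name=WHY&location=SDU&q=a+b%26c

full_url = 'http://www.lovejing.com/example.cgi' + '?' + query
print(full_url)

# Split the URL and decode the query string back into a dict of lists
parts = urllib.parse.urlsplit(full_url)
decoded = urllib.parse.parse_qs(parts.query)
print(decoded)  # {'name': ['WHY'], 'location': ['SDU'], 'q': ['a b&c']}
```

Building the query with urlencode rather than by string concatenation keeps values containing spaces, '&', or '=' from being misread by the server.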

4. Setting headers on an HTTP request

import urllib.parse
import urllib.request

url = 'http://www.lovejing.com/cgi-bin/register.cgi'    

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)' 
values = {'name' : 'WHY',    
          'location' : 'SDU',    
          'language' : 'Python' }    
headers = { 'User-Agent' : user_agent }  
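data = urllib.parse.urlencode(values).encode(encoding='UTF8')  # URL-encode the form, then convert it to bytes
req = urllib.request.Request(url, data, headers)  # build the request with the form data and custom headers
response = urllib.request.urlopen(req)  # send the request and receive the response
the_page = response.read()  # read the response body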
print(the_page)
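
The headers attached to a Request can be checked before anything is sent. The following is a minimal sketch using the same User-Agent string as above. The Referer value is a hypothetical addition for illustration. Note that Request stores header names in capitalized form, so the lookup key is 'User-agent' rather than 'User-Agent'.

```python
import urllib.request

url = 'http://www.lovejing.com/cgi-bin/register.cgi'
headers = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}

# headers can also be passed by keyword; no data here, so this is a GET request
req = urllib.request.Request(url, headers=headers)

# Headers can be added after construction as well (hypothetical Referer value)
req.add_header('Referer', 'http://www.lovejing.com/')

# Header names are stored as name.capitalize(), e.g. 'User-agent'
print(req.header_items())
print(req.get_header('User-agent'))
print(req.has_header('Referer'))
```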


Reposted from blog.csdn.net/weixin_38423249/article/details/80561521