Small Python programs

1.
#-*- coding:utf-8 -*- 
import urllib.request  # import the module
page =urllib.request.urlopen('http://tieba.baidu.com/p/1753935195')
htmlcode=page.read()

My first version of the program looked like this:

 #coding:utf-8
 import urllib
 page = urllib.urlopen('http://tieba.baidu.com/p/1753935195')  # open the web page
 htmlcode = page.read()  # read the page source

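Running it raised: AttributeError: module 'urllib' has no attribute 'urlopen'

A search on Baidu showed that in Python 3.x the Python 2 urllib module was split into submodules, and urlopen now lives in urllib.request. After switching to urllib.request.urlopen, the error no longer appeared.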

pageFile=open('pageCode.txt','wb')
pageFile.write(htmlcode)
pageFile.close()

Here I ran into another problem. My first version was:

 pageFile = open('pageCode.txt','w')  # open pageCode.txt for writing (text mode)
 pageFile.write(htmlcode)  # write the page source
 pageFile.close()

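Error: TypeError: write() argument must be str, not bytes

More searching showed that the problem was the file open mode. page.read() returns bytes, but a file opened in text mode ('w') only accepts str. Changing the open call to binary mode, that is 'wb', fixes it. ('wb+' also works, but it additionally opens the file for reading, which is not needed here.)

For the open() function, passing the mode 'w' writes a text file and 'wb' writes a binary file.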
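The difference between the two modes can be tried without touching the network. The following is a minimal sketch, assuming a throwaway file in a temporary directory (the file name demo.txt and the sample bytes are made up for illustration):

```python
import os
import tempfile

# Sample bytes standing in for what page.read() returns (an assumption).
data = "<html>héllo</html>".encode("utf-8")

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Text mode ('w') rejects bytes and raises TypeError.
try:
    with open(path, "w") as f:
        f.write(data)
    text_mode_error = None
except TypeError as exc:
    text_mode_error = str(exc)

# Binary mode ('wb') accepts bytes as-is.
with open(path, "wb") as f:
    f.write(data)

# Alternative: decode to str first, then write in text mode with an explicit encoding.
with open(path, "w", encoding="utf-8") as f:
    f.write(data.decode("utf-8"))
```

Either fix works; binary mode is the simpler choice when the bytes only need to be stored unchanged.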

Putting it all together:

#-*- coding:utf-8 -*-
import urllib.request
page =urllib.request.urlopen('http://tieba.baidu.com/p/1753935195')
htmlcode=page.read()

pageFile=open('pageCode.txt','wb')
pageFile.write(htmlcode)
pageFile.close()

At this point, the page content has been saved to the file pageCode.txt.
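The same download-and-save steps can also be written with `with` blocks, so that the response and the file are closed even if an error occurs partway through. This is only a sketch; the function name save_page is hypothetical and not part of the original program:

```python
import urllib.request


def save_page(url, path):
    """Download url and write the raw bytes to path; return the byte count."""
    # The response object is closed automatically when the block ends.
    with urllib.request.urlopen(url) as page:
        htmlcode = page.read()
    # 'wb' because htmlcode is bytes, not str.
    with open(path, "wb") as page_file:
        page_file.write(htmlcode)
    return len(htmlcode)

# Example call (performs a network request, so it is left commented out):
# save_page('http://tieba.baidu.com/p/1753935195', 'pageCode.txt')
```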

2.

import re
import urllib.request

# Fetch the page
def get_html(url):
    page = urllib.request.urlopen(url)
    html = page.read()
    html = html.decode('utf-8')  # required in Python 3, otherwise error (1) below is raised
    return html

reg = r'src="(.+?\.jpg)" width'  # regular expression for image URLs
reg_img = re.compile(reg)  # compile once so the pattern can be reused
imglist = reg_img.findall(get_html('http://tieba.baidu.com/p/1753935195'))
x = 0
for img in imglist:
    # print(img)
    # urllib.urlretrieve(img, '%s.jpg' % x)  # raises error (2) below in Python 3
    urllib.request.urlretrieve(img, '%s.jpg' % x)
    x += 1

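(1) TypeError: cannot use a string pattern on a bytes-like object

Fix:

In Python 3, page.read() returns bytes, while the regular expression is a str pattern. Add the line html = html.decode('utf-8') so the pattern is matched against a str.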
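The mismatch can be reproduced on a small in-memory example. The sketch below uses a made-up HTML snippet (an assumption, not the real page) and shows both the decode fix and, as an alternative, a bytes pattern:

```python
import re

# Hypothetical snippet standing in for the bytes returned by page.read().
html_bytes = b'<img src="http://example.com/a.jpg" width="100">'

pattern = re.compile(r'src="(.+?\.jpg)" width')

# A str pattern applied to bytes raises TypeError.
try:
    pattern.findall(html_bytes)
    error = None
except TypeError as exc:
    error = str(exc)

# Fix used in this post: decode the bytes to str first.
matches = pattern.findall(html_bytes.decode("utf-8"))

# Alternative: keep the data as bytes and use a bytes pattern instead.
bytes_matches = re.findall(rb'src="(.+?\.jpg)" width', html_bytes)
```

Decoding is usually more convenient here, because the matched URLs come back as str and can be passed on directly.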

(2) AttributeError: module 'urllib' has no attribute 'urlretrieve'

Fix:

In Python 3, urlretrieve is not in the top-level urllib module; it lives in urllib.request. So first import it:

import urllib.request

and then call:

urllib.request.urlretrieve(img,'%s.jpg'%x)
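The extract-and-download loop can be split into two small functions, which makes the regular expression testable without a network connection. This is a sketch only: the names find_image_urls and download_images, and the out_dir parameter, are hypothetical. Note also that the Python documentation lists urlretrieve under the legacy interface of urllib.request.

```python
import os
import re
import urllib.request

# Same pattern as in the post: a .jpg URL in src="..." followed by a width attribute.
IMG_PATTERN = re.compile(r'src="(.+?\.jpg)" width')


def find_image_urls(html):
    """Return all .jpg URLs found in an already decoded (str) HTML page."""
    return IMG_PATTERN.findall(html)


def download_images(urls, out_dir="."):
    """Save each URL as 0.jpg, 1.jpg, ... in out_dir; return the saved paths."""
    saved = []
    for index, url in enumerate(urls):
        target = os.path.join(out_dir, "%s.jpg" % index)
        urllib.request.urlretrieve(url, target)
        saved.append(target)
    return saved
```

Using enumerate replaces the manual counter x, and returning the saved paths makes it easy to check what was written.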
