Some pitfalls of Python web scraping

1. UnicodeEncodeError: 'gbk' codec can't encode character '\xXX' in position XX

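    # Pass encoding='utf-8' explicitly; otherwise open() uses the platform default (gbk here)
    with open(file, 'w', encoding='utf-8') as f:
        f.write(ss.text)  # no longer raises the encode error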
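As a minimal sketch of why the explicit encoding helps: the sample string and the temporary file path below are made up for illustration. The non-breaking space '\xa0' is a typical character that GBK cannot encode, while UTF-8 writes it without complaint.

```python
import os
import tempfile

# Hypothetical scraped text; '\xa0' (non-breaking space) has no GBK mapping
text = "price:\xa0100"

# Encoding to GBK fails, which is what open() does implicitly
# when the platform default encoding is gbk
try:
    text.encode("gbk")
    gbk_ok = True
except UnicodeEncodeError:
    gbk_ok = False

# Writing with an explicit UTF-8 encoding succeeds
path = os.path.join(tempfile.mkdtemp(), "page.html")  # illustrative path
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read it back with the same encoding to confirm nothing was lost
with open(path, encoding="utf-8") as f:
    round_trip = f.read()
```

Remember to pass the same encoding='utf-8' when reading the file back, or the default gbk codec may fail on the way in instead.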

2. The response body from requests is not displayed

     The simplest workaround: instead of PyCharm, try running the script in IDLE, the IDE that ships with Python.

3. Garbled text in scraped pages

The page's encoding may have been guessed wrong. Check which encoding requests inferred for the page:

import requests

url='http://ldzl.people.com.cn/dfzlk/front/personProvince3106.htm'

print(requests.get(url).encoding)
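If the printed encoding is ISO-8859-1, requests most likely fell back to its default because the server sent no charset. One possible fix, sketched below, is to switch to the encoding requests detects from the body (apparent_encoding). The helper name fix_encoding is hypothetical, and the GBK sample bytes are an assumption for illustration, not the actual encoding of the page above.

```python
# What garbled text looks like: GBK bytes decoded with the wrong codec
raw = "人民网".encode("gbk")        # bytes as a GBK page might send them (assumed)
wrong = raw.decode("iso-8859-1")   # mojibake: every byte becomes a Latin-1 character
right = raw.decode("gbk")          # correct text


def fix_encoding(resp):
    """Hypothetical helper: use the detected encoding when the header gave none.

    Works on a requests.Response, or any object with .encoding and
    .apparent_encoding attributes.
    """
    # requests falls back to ISO-8859-1 for text/* responses without a charset
    if resp.encoding is None or resp.encoding.lower() == "iso-8859-1":
        resp.encoding = resp.apparent_encoding
    return resp
```

With requests installed, usage would look like resp = fix_encoding(requests.get(url)), after which resp.text is decoded with the detected encoding. Detection is a guess based on the bytes, so setting resp.encoding by hand is still the safer choice when the page's real encoding is known.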

  


Reposted from www.cnblogs.com/polipolu/p/12972578.html