How to fix an error when reading a web page with BeautifulSoup

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/lingyunxianhe/article/details/82845988

While first learning BeautifulSoup, I ran into a problem when trying to read a web page and parse its content. Here is the code I ran:

#!/usr/bin/python
# -*- coding: UTF-8 -*-
from bs4 import BeautifulSoup
from urllib2 import urlopen
WebSite='http://www.weather.com.cn/weather/101010100.shtml'
soup = BeautifulSoup(WebSite,"html.parser")  # passes the URL string itself, not the page content
print soup.prettify()

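I wanted to display the content of the given web page, but running the program printed the following warning, and the only "content" it displayed was the URL itself: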

/usr/lib/python2.7/dist-packages/bs4/__init__.py:282: UserWarning: "http://www.weather.com.cn/weather/101010100.shtml" looks like a URL. Beautiful Soup is not an HTTP client. You should probably use an HTTP client like requests to get the document behind the URL, and feed that document to Beautiful Soup.
  ' that document to Beautiful Soup.' % decoded_markup
http://www.weather.com.cn/weather/101010100.shtml

I eventually found the answer on Stack Overflow: https://stackoverflow.com/questions/24768858/beautifulsoup-responses-with-error

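The problem is this statement: soup = BeautifulSoup(WebSite,"html.parser"). BeautifulSoup only parses the markup it is given and never downloads anything (as the warning says, it is not an HTTP client), so it parsed the URL string itself as text, which is why prettify() printed just the URL. The page has to be fetched first and the result passed in: soup = BeautifulSoup(urlopen(WebSite),"html.parser")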
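
To see the difference concretely, here is a small Python 3 sketch. It assumes the bs4 package is installed; the HTML snippet is made up and the variable names are only for illustration. It parses a real markup string and then the bare URL string:

```python
import warnings

from bs4 import BeautifulSoup

# Made-up markup standing in for a downloaded page.
HTML = ("<html><head><title>Beijing weather</title></head>"
        "<body><p>Sunny</p></body></html>")
URL = "http://www.weather.com.cn/weather/101010100.shtml"

# Given real markup, BeautifulSoup builds a tree of tags.
soup = BeautifulSoup(HTML, "html.parser")
title_text = soup.title.string  # text inside <title>
para_text = soup.p.get_text()   # text inside the first <p>

# Given a URL string, nothing is fetched: the string is parsed as plain
# text, so the "document" is just the URL and contains no tags.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the "looks like a URL" warning
    url_soup = BeautifulSoup(URL, "html.parser")
url_text = url_soup.get_text()

print(title_text)  # Beijing weather
print(para_text)   # Sunny
print(url_text)    # the URL itself, as in the output shown above
```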

The corrected complete code is:

#!/usr/bin/python
# -*- coding: UTF-8 -*-
from bs4 import BeautifulSoup
from urllib2 import urlopen
WebSite='http://www.weather.com.cn/weather/101010100.shtml'
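# urlopen() downloads the page; BeautifulSoup then parses the returned file-like object
soup = BeautifulSoup(urlopen(WebSite),"html.parser")  # add from_encoding="utf-8" if the encoding is misdetected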
print soup.prettify()
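
The code above is Python 2: urllib2 and the print statement do not exist in Python 3, where urlopen lives in urllib.request. A rough Python 3 equivalent might look like the following, assuming bs4 is installed. The helper fetch_soup and its opener and timeout parameters are hypothetical names introduced here for illustration, not part of the original code:

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup

WEBSITE = "http://www.weather.com.cn/weather/101010100.shtml"


def fetch_soup(url, opener=urlopen, timeout=10):
    """Download `url` and parse the response body with BeautifulSoup.

    `opener` is a hypothetical hook (defaulting to urllib's urlopen) so the
    function can be tried without network access; `timeout` is in seconds.
    """
    with opener(url, timeout=timeout) as response:
        raw = response.read()  # the page as bytes
    # BeautifulSoup accepts bytes and guesses the encoding itself; add
    # from_encoding="utf-8" if the guess turns out to be wrong.
    return BeautifulSoup(raw, "html.parser")
```

Calling print(fetch_soup(WEBSITE).prettify()) should then print the formatted page, as the original script does. If you prefer requests, as the warning suggests, passing requests.get(WEBSITE).content to BeautifulSoup works the same way, since it is also the raw page bytes.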

Reposted from blog.csdn.net/lingyunxianhe/article/details/82845988