Requests usage:
import requests
txt=requests.get('http://blog.sina.com.cn/s/blog_4701280b0102wrup.html')
# txt.content is bytes (bytes has no .encode()); decode it with the page's charset (utf-8) to get a str
txt1=txt.content.decode('utf-8')
print(txt1)
Output (type str):
'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n<html xmlns="http://www.w3...........................'
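In requests, Response.content is the raw body as bytes, while Response.text is a str decoded using Response.encoding (taken from the headers or guessed). The sketch below shows that bytes-to-str step on its own, using only the standard library so it runs offline; the helper names charset_from_content_type and decode_body are hypothetical, and falling back to utf-8 when no charset is given is an assumption.

```python
from email.message import Message


def charset_from_content_type(content_type: str, default: str = "utf-8") -> str:
    """Return the (lowercased) charset parameter of a Content-Type value, or `default`."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


def decode_body(raw: bytes, content_type: str) -> str:
    """Decode a raw response body (bytes) to str using the header's charset."""
    return raw.decode(charset_from_content_type(content_type))


# Stand-in for response.content: HTML encoded as UTF-8 bytes
raw = "<title>博文_韩寒_新浪博客</title>".encode("utf-8")
print(type(raw).__name__)      # bytes
print(hasattr(raw, "encode"))  # False: calling .encode() on bytes raises AttributeError
text = decode_body(raw, "text/html; charset=UTF-8")
print(type(text).__name__, text)  # str <title>博文_韩寒_新浪博客</title>
```

Reading the charset from the header rather than hard-coding 'utf-8' also covers pages served as GBK, which is still common on older Chinese sites.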
urllib usage:
from bs4 import BeautifulSoup
import urllib.request
con=urllib.request.urlopen('http://blog.sina.com.cn/s/blog_4701280b0102wrup.html').read()
soup=BeautifulSoup(con,'lxml')
# prettify is a method: print(soup.prettify()) gives indented output; print(soup) prints the parsed document as-is
print(soup)
Output (con is of type bytes):
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<title>博文_韩寒_新浪博客</title>
<meta content="IE=EmulateIE8,chrome=1"............
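With urllib, urlopen(...).read() returns bytes, and the response headers carry the charset (resp.headers.get_content_charset()). The sketch below decodes with that charset and then extracts the <title>. It is an illustration under assumptions: fetch_title and TitleParser are hypothetical names, the standard-library html.parser stands in for BeautifulSoup/lxml, and to keep it runnable offline it opens a data: URL (RFC 2397, handled by urllib.request's DataHandler) holding a tiny stand-in page instead of the Sina blog URL.

```python
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Accumulates the text found inside <title>...</title>."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch_title(url: str) -> str:
    with urllib.request.urlopen(url) as resp:
        raw = resp.read()  # bytes, like `con` above
        charset = resp.headers.get_content_charset() or "utf-8"  # assumed fallback
    parser = TitleParser()
    parser.feed(raw.decode(charset))  # parse a str, not bytes
    parser.close()
    return parser.title


# Offline stand-in page, percent-encoded into a data: URL
page = "<html><head><title>博文_韩寒_新浪博客</title></head><body></body></html>"
data_url = "data:text/html;charset=utf-8," + urllib.parse.quote(page)
print(fetch_title(data_url))  # 博文_韩寒_新浪博客
```

Swapping data_url for the real blog URL should exercise the same code path over HTTP, provided the server sends a charset in its Content-Type header.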