Getting started with crawling web content

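import re

import requests

if __name__ == '__main__':
    # verify=False skips TLS certificate verification for this request
    r = requests.get('https://www.168seo.cn/python-2/3410.html', verify=False)
    resp = r.headers
    body = r.text
    # Print the status code and the charset taken from a header such as
    # "text/html; charset=UTF-8"
    print(r.status_code, '------', resp['Content-Type'].split(';')[1].split('=')[1])

    # Capture the text of every <h4> that contains only letters, digits,
    # whitespace, or Chinese characters
    result = re.findall(r'<h4>([a-zA-Z0-9\s\u4e00-\u9fa5]*)</h4>', body, re.S)
    for item in result:
        print(item)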

 

The results are:

200 ------ UTF-8
Python most concise picture of the code to download
Python Django return value is a picture
vs code ssh remote debugging remote code
to your website to add django cache speed gevent django
how to MySQL database stored in Django array
Centos install RabbitMQ detailed process
to increase more action ADMIN many as Django
Python will be resolved to their absolute path relative url path
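
The Content-Type parsing in the script assumes the header always carries a charset parameter in second position, so a plain "text/html" header would raise an IndexError. Below is a minimal sketch of a more forgiving version that can be tried without any network access. The helper names charset_from_content_type and extract_h4_titles are hypothetical and not part of the original script; the regular expression is the same one used above.

```python
import re

# Same pattern as the script: letters, digits, whitespace, Chinese characters.
H4_PATTERN = re.compile(r'<h4>([a-zA-Z0-9\s\u4e00-\u9fa5]*)</h4>', re.S)


def charset_from_content_type(content_type, default=None):
    """Return the charset parameter of a Content-Type value, or default.

    Hypothetical helper: scans every parameter after the media type
    instead of assuming the charset is the second ';'-separated field.
    """
    for part in content_type.split(';')[1:]:
        key, _, value = part.strip().partition('=')
        if key.lower() == 'charset' and value:
            return value.strip('"\'')
    return default


def extract_h4_titles(html):
    """Return the stripped, non-empty text of each matching <h4> element."""
    return [m.strip() for m in H4_PATTERN.findall(html) if m.strip()]


if __name__ == '__main__':
    # Sample values for illustration only, not taken from the crawled page.
    print(charset_from_content_type('text/html; charset=UTF-8'))
    print(extract_h4_titles('<h4>Python Django</h4><h4> 中文 </h4>'))
```

In the real script these would be called as charset_from_content_type(r.headers.get('Content-Type', '')) and extract_h4_titles(r.text). Note that requests also exposes the detected encoding directly as r.encoding, which may be enough if only the charset is needed.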

Origin www.cnblogs.com/xing-sir/p/11345371.html