Learning Web Scraping (Part 1)

Getting started with web scraping:

1. Install Python (Anaconda)

2. Install the libraries

pip install requests beautifulsoup4 lxml

3. Fetch a web page

import requests  # import the requests library
r = requests.get('http://www.lining0806.com')  # send a GET request to the target URL; returns a Response object
print(r.text)  # r.text is the HTML body of the HTTP response
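
The call above assumes the request succeeds. As a minimal sketch of a more defensive version, the helper below adds a timeout and checks the HTTP status. The name fetch_html and the 10-second default are assumptions made for illustration, not part of the original script.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page HTML as text, or None if the request fails.

    fetch_html and the 10-second default timeout are illustrative choices.
    """
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()  # raise HTTPError for 4xx/5xx responses
    except requests.RequestException as exc:
        # Covers connection errors, timeouts, invalid URLs and bad status codes
        print(f"request failed: {exc}")
        return None
    return r.text
```

Calling fetch_html('http://www.lining0806.com') should then give either the page HTML or None, so the caller can decide what to do on failure.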

4. Get the article titles

# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup

url = 'http://www.lining0806.com'
# Send a browser-like User-Agent with the request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36'}
r = requests.get(url, headers=headers)

# Find every <a target="_blank"> inside the first <div class="content">
soup = BeautifulSoup(r.text, 'lxml')
all_title = soup.find('div', class_='content').find_all('a', attrs={'target': '_blank'})

# Keep only the links that actually carry a title attribute
Alltitle = []
for title in all_title:
    title_temp = title.get('title')  # None when the attribute is missing
    if title_temp is None:
        continue
    print(title_temp)
    Alltitle.append(title_temp)
print(Alltitle)

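So far, looping over the a tags is the only way I have found to collect their title attributes. I will update this post when I find a better approach.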
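
One candidate, as a minimal sketch: BeautifulSoup's select() accepts CSS selectors, and a [title] attribute selector skips links without a title, so the None check is no longer needed. The sample HTML and the name extract_titles are made up for illustration, since the real page's markup is not shown here. Note that this selector matches links in every div with class content, not only the first one.

```python
from bs4 import BeautifulSoup

# Stand-in HTML for illustration only; the real page's markup may differ.
SAMPLE_HTML = """
<div class="content">
  <a target="_blank" title="First post" href="/1">First post</a>
  <a target="_blank" href="/2">No title attribute</a>
  <a target="_self" title="Skipped" href="/3">Opens in the same tab</a>
  <a target="_blank" title="Second post" href="/4">Second post</a>
</div>
"""


def extract_titles(html):
    """Collect the title attributes of <a target="_blank"> links in div.content."""
    soup = BeautifulSoup(html, "html.parser")
    # [title] matches only tags that have a title attribute
    links = soup.select('div.content a[target="_blank"][title]')
    return [a["title"] for a in links]


titles = extract_titles(SAMPLE_HTML)
print(titles)  # ['First post', 'Second post']
```

With a live page, the same function could be given r.text from the request in step 4 in place of SAMPLE_HTML.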

Reposted from www.cnblogs.com/Crazy-sun/p/9189048.html