Learning web scraping:
1. Install Python (Anaconda)
2. Install the libraries
pip install requests beautifulsoup4 lxml
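3. Fetch a web page
import requests  # import the requests library

# Send a GET request to the target URL; this returns a Response object
r = requests.get('http://www.lining0806.com')
# r.text is the HTML of the page, taken from the HTTP response body
print(r.text)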
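The fetch above assumes the request always succeeds. As a minimal sketch (the helper name `fetch_html` and the 10-second timeout are assumptions for illustration, not part of the original notes), the call could be wrapped so that a failed request raises an error instead of silently returning an error page:

```python
import requests


def fetch_html(url, timeout=10.0):
    """Return the HTML of `url`, raising an exception if the request fails."""
    # timeout keeps the call from hanging forever on an unresponsive server
    r = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.exceptions.HTTPError on 4xx/5xx codes
    r.raise_for_status()
    return r.text
```

It would be called as `html = fetch_html('http://www.lining0806.com')`, with any `requests.exceptions.RequestException` caught by the caller if the script should keep running after a failure.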
4. Get the article titles
# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup

url = 'http://www.lining0806.com'
# Send a browser-like User-Agent header with the request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36'}
r = requests.get(url, headers=headers)
# Collect every <a target="_blank"> inside the first <div class="content">
all_title = BeautifulSoup(r.text, 'lxml').find('div', class_='content').find_all('a', attrs={"target": "_blank"})
Alltitle = []
for title in all_title:
    # .get() returns None when the tag has no title attribute
    title_temp = title.get('title')
    print(title_temp)
    if title_temp is None:
        continue
    else:
        Alltitle.append(title_temp)
print(Alltitle)
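So far, the only approach I have found is to loop over the a tags and read each one's title attribute. I will update this when I find a better way.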
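One possible alternative to the explicit loop, offered as a sketch rather than a definitive answer: BeautifulSoup's `select()` accepts CSS selectors, so the `[title]` attribute selector can filter out links without a title up front, and a list comprehension collects the values. The sample markup and the helper name `extract_titles` below are hypothetical stand-ins for the real page, and `html.parser` is used so the example runs without lxml. Unlike the `find(...)` call above, this selector matches links inside every `div.content`, not only the first one.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page
SAMPLE_HTML = """
<div class="content">
  <a target="_blank" title="First post" href="/a">First</a>
  <a target="_blank" href="/b">No title attribute</a>
  <a title="Same window" href="/c">No target attribute</a>
  <a target="_blank" title="Second post" href="/d">Second</a>
</div>
"""


def extract_titles(html):
    """Return the title attribute of each matching link in the HTML."""
    soup = BeautifulSoup(html, 'html.parser')
    # Match <a target="_blank"> tags inside div.content that have a title attribute
    links = soup.select('div.content a[target="_blank"][title]')
    return [a['title'] for a in links]


titles = extract_titles(SAMPLE_HTML)
print(titles)  # ['First post', 'Second post']
```

With the real page, `extract_titles(r.text)` would take the place of the loop, assuming the page keeps the same `div.content` structure.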