Python BeautifulSoup scraping in practice -- fetching AtCoder contest data for ACM team members

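First, install the required packages. The code below uses bs4 for parsing, requests for HTTP and lxml as the parser backend: pip install beautifulsoup4 requests lxml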
AtCoder provides a contest-history page for every user, for example: https://atcoder.jp/users/a2018040538/history
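To check which cells the scraper relies on without sending requests to atcoder.jp, the parsing step can be tried offline. The following is a minimal sketch that runs against a hypothetical, hand-written snippet assumed to mirror the history table: a table with id="history", a header row of <th> cells, and one row per contest whose six <td> cells are date, contest, rank, performance, new rating and diff. Labeling the fourth cell as performance is an assumption, since the scraper never reads it. The helper name parse_history_table and all sample values are illustrative only. The sketch uses Python's built-in html.parser, so lxml is not needed here.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to mirror the layout of AtCoder's history table.
# Assumed column order: date, contest, rank, performance, new rating, diff.
SAMPLE_HTML = """
<table id="history">
  <thead>
    <tr><th>Date</th><th>Contest</th><th>Rank</th>
        <th>Performance</th><th>New Rating</th><th>Diff</th></tr>
  </thead>
  <tbody>
    <tr>
      <td><time>2020-01-01 21:00:00+0900</time></td>
      <td><a href="/contests/sample1">Sample Contest 1</a></td>
      <td><a href="/contests/sample1/standings">100</a></td>
      <td>-</td>
      <td><span>400</span></td>
      <td>+50</td>
    </tr>
    <tr>
      <td><time>2020-01-08 21:00:00+0900</time></td>
      <td><a href="/contests/sample2">Sample Contest 2</a></td>
      <td><a href="/contests/sample2/standings">250</a></td>
      <td>-</td>
      <td>-</td>
      <td>-</td>
    </tr>
  </tbody>
</table>
"""


def parse_history_table(html):
    """Parse the table with id="history" into a list of dicts (illustrative helper)."""
    soup = BeautifulSoup(html, "html.parser")  # stdlib parser, no lxml needed
    table = soup.select_one("#history")
    if table is None:
        return []  # the page has no history table at all
    rows = []
    for tr in table.select("tr"):
        tds = tr.select("td")
        if len(tds) < 6:  # the header row uses <th>, so it has no <td> cells
            continue
        rows.append({
            "date": tds[0].get_text(strip=True),
            "contest": tds[1].get_text(strip=True),
            "rank": tds[2].get_text(strip=True),
            "newRating": tds[4].get_text(strip=True),  # tds[3] (performance) is skipped
            "diff": tds[5].get_text(strip=True),
        })
    return rows


if __name__ == "__main__":
    for row in parse_history_table(SAMPLE_HTML):
        print(row)
```

The second sample row has no <span> in its rating cell. Reading cells with get_text(strip=True) instead of indexing into a child tag keeps such rows from raising an IndexError.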
After analyzing the structure of that page, I wrote the following code to scrape a user's contest history:
 
from bs4 import BeautifulSoup
import requests


def getACUserData(acID):
    """Scrape the contest history of one AtCoder user (acID) from atcoder.jp."""
    url = "https://atcoder.jp/users/" + acID + "/history"
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()  # stop on HTTP errors such as 404
    soup = BeautifulSoup(resp.text, features="lxml")
    t = soup.select('#history')[0]  # the contest-history table

    # Structure: [dict1, dict2, ...]
    # Each dict: {'date': date, 'contest': contest, 'rank': rank, 'newRating': newRating, 'diff': diff}
    data_list = []

    for tr in t.select('tr'):
        tds = tr.select('td')
        if len(tds) < 6:  # the header row uses <th> cells, so it has no <td>
            continue
        date = tds[0].select('time')[0].text
        contest = tds[1].select('a')[0].text
        rank = tds[2].select('a')[0].text
        # tds[3] (performance) is not collected.
        # get_text() also works when a cell has no <span>, e.g. when it only shows "-".
        newRating = tds[4].get_text(strip=True)
        diff = tds[5].get_text(strip=True)
        data_list.append({
            'date': date,
            'contest': contest,
            'rank': rank,
            'newRating': newRating,
            'diff': diff,
        })

    return data_list


if __name__ == "__main__":
    acID = "a2018040538"
    dataList = getACUserData(acID)
    print(dataList)
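The title is about collecting data for a whole ACM team, while getACUserData handles a single ID. Below is a minimal sketch of looping over several IDs. The function name collect_team_history, the fetch parameter and the one-second default delay are assumptions for illustration, not anything AtCoder specifies. Passing the fetcher in as a parameter keeps the loop testable without network access.

```python
import time


def collect_team_history(user_ids, fetch, delay_seconds=1.0):
    """Call fetch(user_id) for each team member (illustrative helper).

    Returns (results, errors): results maps user ID -> whatever fetch returned,
    errors maps user ID -> the error message for members whose fetch failed.
    """
    results = {}
    errors = {}
    for i, user_id in enumerate(user_ids):
        if i > 0:
            time.sleep(delay_seconds)  # pause between requests to go easy on the server
        try:
            results[user_id] = fetch(user_id)
        except Exception as exc:  # keep going so one bad ID does not abort the batch
            errors[user_id] = str(exc)
    return results, errors
```

With the scraper above, a call would look like results, errors = collect_team_history(team_ids, getACUserData), where team_ids is your own list of AtCoder IDs. Collecting failures in a separate dict means a typo in one ID shows up in errors instead of stopping the whole run.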

Reposted from www.cnblogs.com/liuyong0076/p/12236692.html