B crawling station barrage
Some time ago crawling barrage b station now record it experiences
Preparatory
HTML parsing
Find json response file barrage is located, found that there are more than 1,000 real-time barrage
analysis parameters:
only the primary key to identify the video of oid
Reform found oid crawl down (I was thinking about all video grab up a master, so we should pay attention take all oid)
see each video url is fixed format: https: //www.bilibili.com/video/av+vid
grab all vid from the main interface, then all of the crawling xpath oid
this point, all oid crawl is completed
The following entry code fetch barrage, analysis data
First with a small test, grab individual video barrage test:
url="https://api.bilibili.com/x/v1/dm/list.so?oid=144896116"
headers={
"User-Agent" : "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"
}
# query_list={"oid":"144896116"}
response=requests.get(url,headers=headers)
html_str=response.content
html=etree.HTML(html_str)
d_list=html.xpath('//d')
content_list=[]
a=d_list[0].xpath('//text()')
items={}
items["danmu"]=a
content_list.append(items)
with open("blbl.txt","w",encoding="utf-8") as f:
for content in content_list:
f.write(json.dumps(content,ensure_ascii=False))
f.write("\n")
Sure enough, grab success
Then grab all of the barrage, resolution, stored in the database
import pymysql
import json
import io
import sys
con = pymysql.connect(
host='127.0.0.1',
port=3306,
user='root',
password='root',
db='test',
charset='utf8mb4'
)
cur = con.cursor()
cur.execute("insert into danmu(dm) value('测试')")
con.commit()
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='gb18030')
with open("danmu.json", "r", encoding="utf-8") as f:
danmu_str = json.load(f)
# print(danmu_str["dm"][0]["danmu"])
j = 0
for j in range(0, 1002):
for item in danmu_str["dm"][j]["danmu"]:
# sql_str="insert into danmu(dm) value( \""+item+"\")"
for stuff in dirty_stuff:
item = item.replace(stuff, "")
print(item)
length = item.__len__()
for l in range(1, length - 1):
if (item_list[0]==item_list[l]):
item=item_list[0]
sql_str = "insert into danmu(dm) value( \"" + item + "\")"
cur.execute(sql_str)
con.commit()
Parsing data, barrage Rank:
select count(*) as count,dm as danmu
from danmu
group by dm
order by count desc;
Top result:
I was arrested and 1 million barrage of King Hanqing, ranking the advantages of unexpected results, what ah 0, f, ah, ah ... the