Crawling Bilibili danmaku with Python 3 + requests + re

Watch closely, mina-san (everyone)!

After a month of hunting for vulnerabilities on Bilibili (station B) to no avail, I still wanted to do something with the site, so I set my sights on its danmaku (bullet comment) system! Yes, the genuine Bilibili danmaku!

Yes, today we are going to crawl the danmaku of Bilibili videos!

Crawling Bilibili danmaku is easy to describe but harder to implement: in principle you just have to find the API endpoint that serves a video's danmaku.

But you also need to know some regular expressions and a bit of programming syntax.

But

I'm not worried; after all, I'm someone who has worked 996 and landed in the ICU (just kidding).

Let's start coding now!

 

Language: Python 3.7

Before we start writing, let's first get an understanding of Bilibili's danmaku system.

First we use some technical means to isolate Bilibili's danmaku system on its own.

Then we open the browser's Inspect Element panel (developer tools) to look for the API endpoint.

 

Here I'll send a danmaku.

In the Network panel you can see that it is captured as a POST request.

After opening the captured requests and checking them one by one,

we successfully find the endpoint!

In fact, Bilibili's API endpoints have already been published on GitHub.

But the author believes it is more reliable to look for them by hand.

We can see the endpoint that returns the danmaku we want to crawl.

We open Python and import the requests library

and the re library!

Note that crawling danmaku requires regular expressions, which I will explain later.

First we import the modules:

    import requests as req
    import re

Then we start writing a loop; after all, we don't want to crawl just once.

    import requests as req
    import re

    while True:
        video_number = input("av:")

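The value of video_number is very important here, so it is necessary to add a conditional check.

    import requests as req
    import re

    while True:
        video_number = input("av:")
        # input() always returns a str, so check its contents instead of comparing with int
        if not video_number.isdigit():
            print("please enter digits only")
        else:
            pass  # the request code goes here in the next step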
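Since input() always returns a string, one way to check the value is to test its characters rather than its type. A minimal sketch, assuming the ID should consist of ASCII digits only; the helper name is_valid_video_number is made up for illustration and is not part of the original script:

```python
def is_valid_video_number(text):
    """Return True if text looks like a numeric ID (hypothetical helper)."""
    text = text.strip()
    # isascii() rules out non-ASCII digits such as full-width characters,
    # which str.isdigit() alone would accept
    return text.isascii() and text.isdigit()


if __name__ == "__main__":
    for candidate in ["12345", " 678 ", "av12345", "", "12.5"]:
        print(repr(candidate), is_valid_video_number(candidate))
```

Inside the loop, a check like this could replace the type comparison, with invalid input simply prompting again.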

The next part is just string concatenation, which I won't explain here...
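The concatenation step can also be written with the standard library's urllib.parse.urlencode, which escapes the value for you. A small sketch: the endpoint is the one used in this article, while the names DANMAKU_ENDPOINT and build_danmaku_url are only illustrative. The parameter is called oid in the URL, so it is worth checking against the request you captured exactly which ID the endpoint expects:

```python
from urllib.parse import urlencode

# Endpoint taken from the article; the constant name is illustrative
DANMAKU_ENDPOINT = "https://api.bilibili.com/x/v1/dm/list.so"


def build_danmaku_url(oid):
    """Build the request URL for the given ID (hypothetical helper)."""
    # urlencode converts the value to str and percent-escapes it if needed
    return DANMAKU_ENDPOINT + "?" + urlencode({"oid": oid})


if __name__ == "__main__":
    print(build_danmaku_url("12345"))
```

For a purely numeric ID this produces the same string as plain concatenation with +.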

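    import requests as req
    import re

    while True:
        video_number = input("av:")
        # input() always returns a str, so check its contents instead of comparing with int
        if not video_number.isdigit():
            print("please enter digits only")
        else:
            api_key = "https://api.bilibili.com/x/v1/dm/list.so?oid=" + video_number
            repon = req.get(api_key)  # request the API
            repon.encoding = "utf-8"  # decode the response as UTF-8
            xml = repon.text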
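A network request can fail or hang, so a slightly more defensive version of the request step might look like the sketch below. It assumes the requests library is installed; the timeout argument and raise_for_status() are standard requests features, while fetch_text and its http_get parameter are names invented here so the function can be exercised without touching the network:

```python
def fetch_text(url, http_get=None, timeout=10):
    """Fetch url and return the body decoded as UTF-8 (hypothetical helper)."""
    if http_get is None:
        # Imported lazily so the function can be tried with a stand-in getter
        import requests
        http_get = requests.get
    response = http_get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    response.encoding = "utf-8"
    return response.text
```

In the loop above, xml = fetch_text(api_key) would stand in for the three lines that call req.get, set the encoding, and read the text.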

The request basically succeeds.

Then we start writing the regular expression.

Take a look at the XML data:

Each danmaku starts with an opening <d ...> tag, and what we want is the value in between,

up to the closing </d>. The specific code, applied to the variable xml:

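    re.findall("<d.*?>(.*?)</d>", xml)

Next, we assign the result to a variable and print it to take a look:

    import requests as req
    import re

    while True:
        avnumber = input("av:")
        # input() always returns a str, so check its contents instead of comparing with int
        if not avnumber.isdigit():
            print("please enter digits only")
        else:
            urlapi = "https://api.bilibili.com/x/v1/dm/list.so?oid=" + avnumber
            res = req.get(urlapi)
            res.encoding = "utf-8"
            xmlString = res.text
            # capture the text between each <d ...> opening tag and its </d> closing tag
            danmukus = re.findall("<d.*?>(.*?)</d>", xmlString)
            print(danmukus)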
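To see what the regular expression does without making a request, it can be run on a small hand-written sample. The sample below is an assumption about the response's general shape (a root element containing <d p="...">text</d> entries, with made-up attribute values), not real captured data. For comparison, the same sample is also read with the standard library's xml.etree.ElementTree, which decodes XML entities such as &amp; that the regex leaves untouched:

```python
import re
import xml.etree.ElementTree as ET

# Hand-written sample; the attribute values are invented for illustration
SAMPLE_XML = (
    "<i>"
    '<d p="1.5,1,25,16777215">first comment</d>'
    '<d p="3.0,1,25,16777215">cats &amp; dogs</d>'
    "</i>"
)


def extract_with_regex(xml_text):
    # Same pattern as in the article: lazily skip the opening tag,
    # then capture everything up to the nearest </d>
    return re.findall("<d.*?>(.*?)</d>", xml_text)


def extract_with_elementtree(xml_text):
    # Parse the document and collect the text of every <d> element
    root = ET.fromstring(xml_text)
    return [d.text or "" for d in root.iter("d")]


if __name__ == "__main__":
    print(extract_with_regex(SAMPLE_XML))        # ['first comment', 'cats &amp; dogs']
    print(extract_with_elementtree(SAMPLE_XML))  # ['first comment', 'cats & dogs']
```

The regex is enough for a quick look at the comments; a parser is the safer choice if the text needs to be stored or processed further.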

 

Ah, that feels much better.

Bye


Origin blog.csdn.net/weixin_42608762/article/details/100850808