What can we learn from a barrage analysis of "[Student He] I took a group photo of 6 million people..."?

I have meant to write this post for a long time but kept putting it off out of laziness. First, some background on the topic: on August 2, the station B (Bilibili) digital-tech uploader "Hello Teacher, My Name Is Student He" posted a new video, "[Student He] I took a group photo of 6 million people...", more than a month after his previous update, to commemorate the third anniversary of the channel's establishment.

Although I had guessed from the teaser released one day earlier that the quality of this video would not be bad, after watching it there were only two words in my heart! This video really amazed me. I admire Student He's video production skills and the care he shows his fans. One barrage comment I saw in the video sums it up well: waiting more than a month to see this video was worth it!

I highly recommend this video. As of August 8, it has been played 8.8 million times on station B. I will put the link below; anyone who is interested, or who wants to watch it a second time, can follow it.

This article collects the barrage (the comments that scroll across the video) for this video and does a simple analysis. The rest of the article is divided into three parts:

  • 1. Data collection: a method for collecting the barrage of a station B video;

  • 2. Data visualization and analysis: charts based on the time distribution of the comments, with a simple analysis from several angles, such as when the barrage was sent, how it is distributed across each stage of the video, and how its volume changes over time;

  • 3. Barrage word cloud: the collected text turned into a word cloud image.

Barrage collection at station B

1. This section describes how to collect the barrage of a station B video. This article uses a Python script as the scraping tool. First, open the page of the video you want to scrape (here, Student He's video is the example) and find the barrage list on the right.

1.png

2. Press F12 to open the developer tools, click the barrage list, choose to view the historical barrage, and select a date. Then find the history?.. request in the developer tools (shown by the arrow on the right in the figure; if it does not appear, refresh the page and repeat the steps above);

2.png

3. The link from step 2 is the final link we need. On closer inspection, the link is built from two key parameters, oid and date. oid is the video ID, which is easy to find, and date is the date, which can be constructed with datetime;

3.png

4.png

4. After constructing the link from step 3, scrape it in the usual way (requests + BeautifulSoup).

5.png

The main parts of the code are as follows:

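import datetime
import time

import requests
from bs4 import BeautifulSoup

# Assumed setup (not shown in the original excerpt): request headers and output file.
# The history endpoint may also need a logged-in Cookie; add your own if so.
headers = {"User-Agent": "Mozilla/5.0"}
file = open("danmu.csv", "a", encoding="utf-8")  # hypothetical file name


def get_duration_time(start_date1, end_date, video_id):
    # Convert the ISO date strings (e.g. "2020-08-02") to datetime objects
    start_date = datetime.datetime.fromisoformat(start_date1)
    end_date = datetime.datetime.fromisoformat(end_date)
    one_day = datetime.timedelta(days=1)
    # Download the barrage for every day in the range, end date included
    while start_date <= end_date:
        startdate_format = start_date.strftime("%Y-%m-%d")
        download_date(startdate_format, video_id)
        start_date = start_date + one_day


def download_date(timedate, video_id):
    # Scrape the barrage for one date and one video id (oid)
    shipin_url = 'https://api.bilibili.com/x/v2/dm/history?type=1&oid={0}&date={1}'.format(video_id, timedate)
    print("Fetching barrage page", shipin_url)
    response = requests.get(url=shipin_url, headers=headers)
    response.encoding = 'utf-8'
    soup = BeautifulSoup(response.text, 'lxml')
    # Each <d> element is one comment: the p attribute holds its metadata,
    # the element body holds the comment text
    for i in soup.find_all('d'):
        file.write(i.get('p', ''))
        file.write(',')
        file.write(i.get_text())
        file.write("\n")
    time.sleep(2)  # pause between requests to avoid crawling too frequently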
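Each saved line is the p attribute followed by a comma and the comment text. As a minimal sketch of how such a line could be read back for the analysis below, the following assumes that p holds eight comma-separated fields in this order: offset within the video (seconds), mode, font size, colour, Unix send time, pool, user hash, row id. That layout is an assumption and is not confirmed by this article; the `Danmaku` class, `parse_line` and the sample line are illustrative names and data only.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Danmaku:
    video_offset: float  # seconds from the start of the video
    sent_at: datetime    # when the comment was posted (UTC)
    text: str


def parse_line(line: str) -> Danmaku:
    # Assumed layout: 8 metadata fields from p, then the comment text.
    # Splitting at most 8 times keeps commas inside the text intact.
    fields = line.rstrip("\n").split(",", 8)
    offset = float(fields[0])
    sent_at = datetime.fromtimestamp(int(fields[4]), tz=timezone.utc)
    return Danmaku(offset, sent_at, fields[8])


# Made-up sample line, for illustration only.
sample = "245.31,1,25,16777215,1596340800,0,abcd1234,36543210,worth the wait, really"
d = parse_line(sample)
print(d.video_offset, d.sent_at.isoformat(), d.text)
```

If the real p attribute has a different number of fields, the split count and the indexes need to be adjusted accordingly.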

Visual analysis

The visualization part mainly uses Pyecharts. A total of 7,000 barrage comments were collected, covering August 2 to August 8.

1. The chart below shows how the total number of barrage comments changed from August 2 to August 8. It does not offer much useful information; the one thing it shows is that the count rose rapidly for a short period and then flattened out;

Snipaste_2020-08-08_23-22-26.png

2. Next, split each day from August 2 to August 8 into 30-minute periods, count the barrage comments sent in each period of each day, and draw an animated trend chart along the timeline;

Recording_2020_08_08_23_32_16_808.gif

From the animated chart we can see that shortly after the video was released the barrage was most concentrated and most numerous; as time went on, the comments became more scattered and their number steadily declined. This pattern probably applies to most videos from top uploaders: play counts go from high to low, and so does popularity.
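The 30-minute grouping from step 2 can be sketched as follows. This is only an illustration under stated assumptions: the send time is taken to be a Unix timestamp, times are shown in UTC+8, and `half_hour_slot`, `count_by_slot` and the demo values are hypothetical names and data, not the article's own code.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))  # assumption: display times in UTC+8


def half_hour_slot(unix_ts):
    """Return (date, slot start) for the 30-minute slot containing unix_ts."""
    t = datetime.fromtimestamp(unix_ts, tz=BEIJING)
    slot_minute = 0 if t.minute < 30 else 30
    return t.strftime("%Y-%m-%d"), f"{t.hour:02d}:{slot_minute:02d}"


def count_by_slot(timestamps):
    """Count how many comments were sent in each (date, slot) pair."""
    return Counter(half_hour_slot(ts) for ts in timestamps)


# Made-up timestamps, for illustration only.
demo = [1596340800, 1596341000, 1596342700, 1596427200]
counts = count_by_slot(demo)
for (day, slot), n in sorted(counts.items()):
    print(day, slot, n)
```

The resulting counts per day and slot are the kind of series that can then be passed to a charting library to draw one frame per day.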

3. Finally, since the video "[Student He] I took a group photo of 6 million people..." is 7:56 long in total, I wanted to see, in the collected data, which part of the video attracted the most barrage comments and which the fewest, and then relate that to the content of the video at those points. The result is the following chart:

Snipaste_2020-08-08_23-42-32.png

The most barrage comments fall between 4:00 and 4:30. I captured one frame from that part. At this point in the video, the IDs of nearly 6 million fans have been successfully engraved on A4 cardboard and pasted on the wall, laying the foundation for the single continuous shot that follows. My own guess is that because the workload was so large, viewers were moved by the dedication behind it;

Snipaste_2020-08-08_23-40-14.png

The fewest barrage comments fall between 2:00 and 2:30. This part is about working out how to find a wall that can hold 300 sheets of A4 paper. The content is mixed with some humor, and most fans were probably busy thinking and forgot to send a comment (just my own guess).

Snipaste_2020-08-08_23-43-15.png
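The per-segment count from step 3 can be sketched in the same way. The 30-second segment width matches the ranges quoted above and the 7:56 length comes from the article; the function names and the demo offsets are hypothetical, and the offsets are assumed to be seconds from the start of the video.

```python
from collections import Counter

VIDEO_LENGTH = 7 * 60 + 56  # 7:56, as stated in the article
SEGMENT = 30                # seconds per segment


def segment_label(offset_seconds):
    """Label the 30-second segment that contains the given offset."""
    start = int(offset_seconds // SEGMENT) * SEGMENT
    end = min(start + SEGMENT, VIDEO_LENGTH)
    return f"{start // 60}:{start % 60:02d}-{end // 60}:{end % 60:02d}"


def busiest_and_quietest(offsets):
    """Return the (label, count) pairs with the most and fewest comments.

    Segments that received no comments at all do not appear in the counts.
    """
    counts = Counter(segment_label(o) for o in offsets)
    ranked = counts.most_common()
    return ranked[0], ranked[-1]


# Made-up offsets, for illustration only.
demo = [245.3, 250.0, 262.9, 130.0, 15.2, 20.0]
busiest, quietest = busiest_and_quietest(demo)
print(busiest, quietest)
```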

Word Cloud Diagram Display

Finally, turn the 7,000 barrage comments into a word cloud image for a visual overview. No Python package is used here; the image was made with WordArt instead.

Word Art.png
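A word cloud tool generally needs the text or a weighted word list as input. As a rough sketch of how such a list might be prepared from the collected comments, the following counts identical comments with the standard library. It is not the method used for the image above; `top_comments` and the demo data are hypothetical, and whether WordArt accepts a list in exactly this form is an assumption.

```python
from collections import Counter


def top_comments(texts, n=20):
    """Count identical comments after trimming surrounding whitespace."""
    cleaned = (t.strip() for t in texts)
    return Counter(t for t in cleaned if t).most_common(n)


# Made-up comments, for illustration only.
demo = ["tears", "found the ID", "tears", " hard work ", "tears", "found the ID"]
for word, weight in top_comments(demo, n=3):
    print(f"{word},{weight}")
```

Barrage comments are short and often repeated word for word, so counting whole comments can work without a tokenizer; splitting longer comments into words would need a separate segmentation step.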

Tears, being moved, and hard work: these are the fans' feelings after watching the video, and they are enough to reflect Student He's care and sincerity. "Finding the ID" means that fans really did find their own ID in the photo of 6 million fans, which also gives a sense of how difficult this photo was to take.

Finally, I will end the article with a word cloud that uses the ID "Hello Teacher, My Name Is Student He" as its background. The complete source code for this article can be obtained by replying "Student He" in the backend of the official account [Mr. Z point in mind].

Word Art (1).png

That is all for this article. Thank you for reading!


Origin blog.csdn.net/weixin_42512684/article/details/108181852