I crawled the 100,000 comments on Eason Chan's new song "We" and found that some people are only meant to be met

Recently a "nostalgic" film became a hit before it was even released: Rene Liu's directorial debut, "Later Us". Youth, love, and dreams have always been the core elements of nostalgia. The film has not opened yet, but its theme song "We", released ahead of it, has already made many people cry. In the MV, the vocal is soft and understated, singing of the regrets of love from years past.

"My biggest regret is that your regret has to do with me." Let's listen to it together.

https://y.qq.com/n/yqq/song/002Ce1kE4crzRK.html

The song is the theme of "Later Us". On NetEase Cloud Music it drew millions of plays on release day, and its comment count on NetEase Cloud alone has now passed 100,000.

The NetEase Cloud Music comment section has always been a kind of sanctuary for me. When I listen to a song and read the heart-wrenching comments under it, the emotions come flooding in. So today I'm practicing Python by scraping a song's hot comments. I'll turn them into a chart and a word cloud to see which comments on this song struck the deepest chord.

1. Crawling the data

To make a word cloud, you first need data, so a bit of crawling is required.

The basic approach: analyze the captured requests, deal with the encrypted parameters, and fetch the hot comments.

1. Packet capture analysis

First, open the web version of NetEase Cloud Music in a browser and go to the page for Eason Chan's "We"; the comments are listed below the song. Then press F12 to open the developer tools (Inspect Element).

Next, find the URL that serves the song's comments and check that the data it returns matches what the page shows. The steps are as follows:

[Figure: the comments request located in the developer tools' Network panel]

Searching by the song id quickly turns up the request that carries the comments.
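The song id is what ties the song page to its comments request. It appears in the page URL, and it appears again with an R_SO_4_ prefix in the comments endpoint used in the code later in this article. Below is a small stdlib sketch that pulls the id out of a song page URL and builds that R_SO_4_ name. It also handles the web player's address bar, which may put the id after a #/ fragment. Both helper names are hypothetical, and the R_SO_4_ prefix is only inferred from the one URL captured in this article.

```python
from urllib.parse import urlparse, parse_qs


def song_id_from_url(page_url: str) -> str:
    """Hypothetical helper: return the id query value from a song page URL."""
    parsed = urlparse(page_url)
    query = parse_qs(parsed.query)
    if "id" not in query and parsed.fragment:
        # e.g. https://music.163.com/#/song?id=... keeps the query inside the fragment
        query = parse_qs(urlparse(parsed.fragment).query)
    if "id" not in query:
        raise ValueError(f"no song id found in {page_url!r}")
    return query["id"][0]


def comment_thread_id(song_id: str) -> str:
    """Hypothetical helper: the R_SO_4_<id> name seen in the comments URL."""
    return f"R_SO_4_{song_id}"


if __name__ == "__main__":
    sid = song_id_from_url("https://music.163.com/#/song?id=551816010")
    print(sid, comment_thread_id(sid))  # 551816010 R_SO_4_551816010
```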

[Figure: request headers of the comments request]

Checking the request headers shows that the browser sends the request with the POST method.

[Figure: the request's form data, with the params and encSecKey fields]

The specific fields are shown in the figure above. The form carries two values, params and encSecKey, each a long string of characters. Switching between a few songs shows that params and encSecKey differ from song to song, so both are likely produced by some encryption algorithm.
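A form like this is sent as an ordinary application/x-www-form-urlencoded POST body. The stdlib sketch below builds, but does not send, an equivalent request with urllib.request so the shape of the body is visible. The two field values are placeholders, not real ciphertext.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder values; the real ones are long strings copied from the browser
form = {"params": "PARAMS_FROM_BROWSER", "encSecKey": "ENCSECKEY_FROM_BROWSER"}

# Form-encode the fields into the request body
body = urlencode(form).encode("utf-8")

req = Request(
    "http://music.163.com/weapi/v1/resource/comments/R_SO_4_551816010",
    data=body,  # supplying a body makes urllib default to POST
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

print(req.get_method())      # POST
print(body.decode("utf-8"))  # params=PARAMS_FROM_BROWSER&encSecKey=ENCSECKEY_FROM_BROWSER
```

urlencode percent-escapes characters such as +, / and = that occur in the real params value. requests applies the same form encoding when you pass a dict as data=.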

[Figure: the JSON response returned by the server]

The comment data returned by the server is JSON and quite rich: commenter details, comment date, like count, comment text, and so on. The hotComments field holds the hot comments we are after, 15 in total.

So the plan is clear: reproduce this API request, then parse the JSON it returns.
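The parsing half of that plan can be tried offline. Below is a minimal sketch that takes the response text and pulls three fields from each hotComments entry: user.nickname, content and likedCount. These are the fields this article uses, and the results come back most-liked first. The sample payload is invented and only mimics those field names. It is not real API output, and the real entries carry many more fields.

```python
import json
from typing import Any, Dict, List


def parse_hot_comments(raw: str) -> List[Dict[str, Any]]:
    """Extract nickname, content and likedCount from each hotComments entry, most-liked first."""
    data = json.loads(raw)
    comments = [
        {
            "nickname": c["user"]["nickname"],
            "content": c["content"],
            "likedCount": c["likedCount"],
        }
        for c in data.get("hotComments", [])  # a missing key yields an empty list
    ]
    return sorted(comments, key=lambda c: c["likedCount"], reverse=True)


if __name__ == "__main__":
    # Invented payload with the same field names as the real response
    sample = json.dumps({
        "hotComments": [
            {"user": {"nickname": "a"}, "content": "first", "likedCount": 10},
            {"user": {"nickname": "b"}, "content": "second", "likedCount": 25},
        ]
    })
    for c in parse_hot_comments(sample):
        print(c["likedCount"], c["nickname"], c["content"])
```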

2. Handling the encrypted parameters

In my testing, the two values can simply be copied from the browser and reused. Actually reproducing the encryption is a bigger job. How to generate these two parameters has already been worked out on Zhihu, for anyone who wants to dig in:

How do you crawl the comment counts on NetEase Cloud Music? (https://www.zhihu.com/question/36081767)

Here we take the lazy route and reuse the values copied from the browser. It is a stopgap, but it can be reused for different songs; you can verify that yourself later.
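If the copied params and encSecKey really can be reused across songs, only the song id in the URL and the Referer has to change. Below is a sketch of a hypothetical helper that assembles both for a given id. It follows the URL and header patterns captured for song 551816010. The csrf_token is passed in as an optional argument rather than hard-coded. The shortened User-Agent is a placeholder for your browser's full string.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode

BASE = "http://music.163.com"


def build_comment_request(song_id: str, csrf_token: str = "") -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper: return (url, headers) for one song's comments endpoint."""
    url = f"{BASE}/weapi/v1/resource/comments/R_SO_4_{song_id}"
    if csrf_token:
        url += "?" + urlencode({"csrf_token": csrf_token})
    headers = {
        "User-Agent": "Mozilla/5.0",  # placeholder: paste your browser's full UA string
        "Referer": f"{BASE}/song?id={song_id}",  # the song page the request comes from
        "Origin": BASE,
    }
    return url, headers


if __name__ == "__main__":
    url, headers = build_comment_request("551816010", "568cec564ccadb5f1b29311ece2288f1")
    print(url)
    # http://music.163.com/weapi/v1/resource/comments/R_SO_4_551816010?csrf_token=568cec564ccadb5f1b29311ece2288f1
```

The returned url and headers would then go to requests.post together with the copied form fields.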

3. Fetching the hot comments

The code block is as follows:

import requests

# Comments API for song id 551816010 ("We"), found in the browser's Network panel
url = 'http://music.163.com/weapi/v1/resource/comments/R_SO_4_551816010?csrf_token=568cec564ccadb5f1b29311ece2288f1'

# Mimic the headers the browser sends
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36',
    'Referer':'http://music.163.com/song?id=551816010',
    'Origin':'http://music.163.com',
    'Host':'music.163.com'
}
# Encrypted form data, copied directly from the browser
user_data = {
    'params': 'vRlMDmFsdQgApSPW3Fuh93jGTi/ZN2hZ2MhdqMB503TZaIWYWujKWM4hAJnKoPdV7vMXi5GZX6iOa1aljfQwxnKsNT+5/uJKuxosmdhdBQxvX/uwXSOVdT+0RFcnSPtv',
    'encSecKey': '46fddcef9ca665289ff5a8888aa2d3b0490e94ccffe48332eca2d2a775ee932624afea7e95f321d8565fd9101a8fbc5a9cadbe07daa61a27d18e4eb214ff83ad301255722b154f3c1dd1364570c60e3f003e15515de7c6ede0ca6ca255e8e39788c2f72877f64bc68d29fac51d33103c181cad6b0a297fe13cd55aa67333e3e5'
}

response = requests.post(url, headers=headers, data=user_data, timeout=10)
response.raise_for_status()  # fail loudly on HTTP errors

data = response.json()
hotcomments = []
for hotcomment in data['hotComments']:
    item = {
        'nickname': hotcomment['user']['nickname'],
        'content': hotcomment['content'],
        'likedCount': hotcomment['likedCount']
    }
    hotcomments.append(item)

# Collect each commenter's nickname, the comment text, and its like count
content_list = [comment['content'] for comment in hotcomments]
nickname = [comment['nickname'] for comment in hotcomments]
liked_count = [comment['likedCount'] for comment in hotcomments]

2. Data visualization

With the comment data in hand, we turn it into a chart and a word cloud so it is easier to take in at a glance.


Next, install the required packages: pyecharts (charts), matplotlib (plotting) and wordcloud (word clouds).

pyecharts is a Python library for generating ECharts charts; ECharts is a JavaScript data-visualization library open-sourced by Baidu. The chart code in this article uses the pyecharts 0.5.x API, which supported both Python 2 and Python 3. From version 1.0 onward the API changed and from pyecharts import Bar no longer works, so install the 0.5 series along with the other two packages:

pip install pyecharts==0.5.11 matplotlib wordcloud

The next step is the implementation of the code, using the previously obtained comment user name and the corresponding number of likes to make it into a chart:

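from pyecharts import Bar  # pyecharts 0.5.x API

# Bar chart of each hot comment's like count, labeled by commenter nickname,
# with min/max mark lines and an average mark point
bar = Bar("Likes on hot comments")
bar.add("Likes", nickname, liked_count, is_stack=True,
        mark_line=["min", "max"], mark_point=["average"])
bar.render()  # writes render.html to the current directory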

[Figure: bar chart of the like count for each hot comment]

The chart shows that the most-liked comment (95,056 likes) is:

@鱼Uncle: Later, I left him, left him for good. Ten years together came down to just a few words. Later, I married a very ordinary man; he doesn't have his romance, but he has a different kind of warmth.

Most like counts fall between 20,000 and 30,000, and the lowest is just over 7,000 (broadly consistent with the figures shown on the web page).
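These figures can be computed straight from the liked_count list built earlier instead of read off the chart. That covers the maximum, the minimum, and where most counts fall. Below is a stdlib sketch. The 10,000-like bucket width is an arbitrary choice, and the numbers in the demo call are made up.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List


def like_summary(liked_count: List[int], bucket: int = 10_000) -> Dict[str, object]:
    """Summarize like counts: max, min, mean, and how many fall in each bucket-wide range."""
    if not liked_count:
        raise ValueError("liked_count is empty")
    # Map each count to the lower bound of its range, e.g. 24000 -> 20000
    buckets = Counter((n // bucket) * bucket for n in liked_count)
    return {
        "max": max(liked_count),
        "min": min(liked_count),
        "mean": mean(liked_count),  # what the chart's "average" mark point shows
        "buckets": dict(sorted(buckets.items())),  # lower bound -> number of comments
    }


if __name__ == "__main__":
    # Made-up numbers for illustration only
    print(like_summary([7_500, 21_000, 24_000, 28_000, 95_000]))
```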

Finally, we render all the hot comments as a word cloud. The code is as follows:

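from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Join all hot comments into one string
content_text = " ".join(content_list)
# A font with Chinese glyphs (SimHei here) is needed, otherwise characters render as boxes;
# adjust the path to where the font lives on your machine
wordcloud = WordCloud(font_path=r"C:\simhei.ttf", max_words=200).generate(content_text)
plt.figure()
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()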
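One caveat applies to Chinese text. WordCloud splits text with a regular expression that treats runs of word characters as words. Chinese comments joined with spaces therefore tend to come out as whole clauses rather than words. The usual fix is a Chinese word segmenter; jieba is a common choice. As a dependency-free stand-in, the sketch below counts two-character sequences (character bigrams) inside runs of CJK characters. Bigrams are a crude approximation of words, not real segmentation, and the function name is hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Runs of characters from the CJK Unified Ideographs block (basic range only)
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def bigram_frequencies(texts: Iterable[str], top: int = 200) -> Dict[str, int]:
    """Count overlapping two-character sequences inside runs of Chinese characters."""
    counts: Counter = Counter()
    for text in texts:
        for run in CJK_RUN.findall(text):
            counts.update(run[i:i + 2] for i in range(len(run) - 1))
    # Keep only the most frequent entries, like max_words does
    return dict(counts.most_common(top))


if __name__ == "__main__":
    print(bigram_frequencies(["后来的我们", "我们"]))
```

The resulting dict can be passed to WordCloud(font_path=r"C:\simhei.ttf", max_words=200).generate_from_frequencies(...) in place of the .generate(content_text) call.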

Result graph:

[Figure: word cloud of the hot comments]

As the picture shows, many commenters lament that there was only "you" and "me", and never an "us".

Note: all figures reflect the data as it was when crawled.

3. Postscript

I remember Guo Jingming once wrote: "We were too young to know that the future would be so long, long enough for me to forget you, long enough for me to fall for someone again, just as I once fell for you."

Life is full of "laters": we go from not understanding love to understanding it, from possessing to cherishing.

Fortunately, in the end, however many years have passed, later on we all learned how to love.


As Eason Chan sings, "If you were once obsessed, let it go." Some people are worth it just for having met them.

There really was no "later" for us.

Let us walk on slowly and not look back.

No talk of who owes whom; thank you for meeting me.

It's just that the next time we meet love, we have to learn to cherish it more.

This is what love is, and why we love.
