[Python] Control the browser to automatically download song comments and make a beautiful word cloud

Use selenium to automatically download song comments and make a good-looking word cloud

1. A few words first

When a song becomes popular, lots of people join in the comments. Sometimes we want to read them, but the page mostly surfaces the hot comments, so we never find out what most people are saying~

So this time we will automatically download the comments, save them to the computer, and build a word cloud from them for analysis...


2. Preparations

1. Required modules

Modules and packages used this time:

re  # regular expressions, built-in module
selenium  # browser automation
jieba  # Chinese word segmentation library
wordcloud  # word cloud library
imageio  # image I/O module
time  # built-in module

How to install the third-party modules:
Take selenium as an example: run pip install selenium. If the download is slow, use a mirror source.
If the specific steps are unclear, see the article pinned to the top of my homepage, where I covered them in detail.
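
Before going further, it can help to check which of the packages above are actually installed. The following is only a minimal sketch: the helper name missing_modules and the REQUIRED list are made up for illustration, and it relies on nothing beyond the standard library.

```python
import importlib.util

# Third-party packages this article relies on (re and time ship with Python).
REQUIRED = ["selenium", "jieba", "wordcloud", "imageio"]


def missing_modules(names):
    """Return the names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for name in missing_modules(REQUIRED):
        print(f"missing: {name}  ->  pip install {name}")
```

Anything it prints still has to be installed with pip as described above.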

2. Driver installation

To automate the browser, we have to install a browser driver.
I won't post the URL here. You can find it by searching online for the Google Chrome driver (ChromeDriver).

Google Chrome is recommended. Taking Chrome as an example, first check the version of the browser.
Click the three dots in the upper right corner of the browser, and then click Settings.
Then click About Chrome. The string of numbers on the right is the version number.
Then find the driver whose version number matches yours and download it. If there is no exact match, download the closest version.
Then put the driver in the same folder as your code. The disadvantage of this is that every project needs its own copy, so you have to download it again if you did not keep one.

Another way is to put it directly in your Python installation directory. The advantage is that once it is in place, every script can use it. The disadvantage is that whenever the browser version updates, you still have to download a new driver.

I download a new one every time anyway, since I don't use it very often.
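
"Download the closest version" can be made concrete with a small comparison. This is a sketch under assumptions: the function names are hypothetical, the version strings are illustrative only, and "closest" is taken to mean the smallest difference compared field by field from left to right.

```python
def parse_version(version):
    """Turn '96.0.4664.110' into (96, 0, 4664, 110)."""
    return tuple(int(part) for part in version.split("."))


def closest_driver(browser_version, available):
    """Pick the available driver version nearest to the browser version.

    Distance is compared field by field, so the major version matters
    most and the last field matters least.
    """
    target = parse_version(browser_version)

    def distance(candidate):
        fields = parse_version(candidate)
        return tuple(abs(a - b) for a, b in zip(fields, target))

    return min(available, key=distance)


# Illustrative version strings only.
drivers = ["95.0.4638.69", "96.0.4664.45", "97.0.4692.36"]
print(closest_driver("96.0.4664.110", drivers))  # prints 96.0.4664.45
```

Here the driver with the same first three fields wins even though its last field differs.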


3. Download the comments

First import the modules you want to use

from selenium import webdriver
import re 
import time  

Do not name your Python file or package selenium, or the import will fail.
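
The reason is that Python searches the script's own folder first, so a local selenium.py (or a selenium folder) is found before the installed package. A quick way to check a folder for such a name clash might look like the sketch below; the helper name shadowing_paths is hypothetical.

```python
from pathlib import Path


def shadowing_paths(directory, module_name="selenium"):
    """List files or folders in `directory` that would shadow `module_name`."""
    directory = Path(directory)
    candidates = [directory / f"{module_name}.py", directory / module_name]
    return [path for path in candidates if path.exists()]


if __name__ == "__main__":
    for path in shadowing_paths("."):
        print(f"rename this, it hides the real package: {path}")
```

If the list is empty, nothing in that folder gets in the way of the import.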

webdriver can be thought of as the browser's driver. To drive a browser you have to go through webdriver, which supports a variety of browsers.

# 1. Create a browser object
driver = webdriver.Chrome()
# 2. Request the page
driver.get('https://music.163.com/#/song?id=569213220')

driver.implicitly_wait(10)  # implicit wait: element lookups retry for up to 10 s while the page renders
driver.maximize_window()  # maximize the browser window

The comments live in a nested page (an iframe), so switch into it first.

driver.switch_to.frame(0)

Scroll to the bottom of the page. JavaScript is a language that runs directly in the browser, so we can execute it through the driver.

# document.documentElement.scrollTop  sets the vertical scroll position of the page
# document.documentElement.scrollHeight  gets the total height of the page
js = 'document.documentElement.scrollTop = document.documentElement.scrollHeight'
driver.execute_script(js)
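
A single jump like this is enough when the page is already fully rendered. If the page loads more content as you scroll (an assumption, the article does not say it does), one option is to repeat the jump until the height stops growing. The helper below is a hypothetical sketch. It only needs an object with an execute_script method, which a selenium driver provides.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=10):
    """Scroll down until the page height stops growing; return the rounds used."""
    get_height = "return document.documentElement.scrollHeight"
    jump = "document.documentElement.scrollTop = document.documentElement.scrollHeight"
    last_height = driver.execute_script(get_height)
    for round_number in range(1, max_rounds + 1):
        driver.execute_script(jump)
        time.sleep(pause)  # give late-loading content time to appear
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            return round_number
        last_height = new_height
    return max_rounds
```

Calling scroll_to_bottom(driver) after driver.switch_to.frame(0) would take the place of the one-off script.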

Get the comment data, save it, and click through to the next page.

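from selenium.webdriver.common.by import By  # Selenium 4 locators; the find_element_by_* helpers were removed

for click in range(10):
    divs = driver.find_elements(By.CSS_SELECTOR, '.itm')
    for div in divs:
        cnt = div.find_element(By.CSS_SELECTOR, '.cnt.f-brk').text
        cnt = cnt.replace('\n', ' ')  # replace line breaks
        # Drop the "username:" prefix. Assumption: the page may use the full-width '：', so accept both.
        match = re.search('[:：](.*)', cnt)
        if match is None:
            continue  # no separator found, skip this entry
        cnt = match.group(1)

        with open('contend.txt', mode='a', encoding='utf-8') as f:
            f.write(cnt + '\n')

    # find the next-page button and click it
    driver.find_element(By.CSS_SELECTOR, '.znxt').click()
    time.sleep(1)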


input('Program paused. Press Enter to quit.')
# 3. Quit the browser
driver.quit()
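
The text-cleaning step inside the loop can be tried on its own, without a browser. This is a sketch under assumptions: the helper name extract_comment and the sample strings are made up, and it assumes each entry has the shape "username: comment", with either an ASCII or a full-width colon.

```python
import re

# Matches the first ASCII or full-width colon and captures everything after it.
PREFIX = re.compile(r"[:：]\s*(.*)")


def extract_comment(raw):
    """Return the comment body from 'username: text', or None if no colon is found."""
    flat = raw.replace("\n", " ")
    match = PREFIX.search(flat)
    return match.group(1).strip() if match else None


print(extract_comment("alice：great song\nloved it"))
print(extract_comment("no separator here"))
```

Returning None instead of indexing into an empty list lets the caller skip entries that do not match, rather than crash partway through a page.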

Open contend.txt to see the result.

4. Word cloud

Draw the word cloud and set its size

import jieba  # Chinese word segmentation library
import wordcloud  # word cloud library
import imageio  # image I/O module


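with open('contend.txt', mode='r', encoding='utf-8') as file:
    txt = file.read()
# print(txt)
txt_list = jieba.lcut(txt)
print('Segmentation result:', txt_list)

string = ' '.join(txt_list)
print('Joined tokens:', string)

"""Build the word cloud"""
# read the mask image (only used if mask=img is enabled below)
img = imageio.imread('音乐.png')

# load the stop words, one per line
with open('cn_stopwords.txt', mode='r', encoding='utf-8') as f:
    stopwords = {line.strip() for line in f}

# configure the word cloud
wc = wordcloud.WordCloud(
    width=1000,  # width of the word cloud
    height=700,  # height of the word cloud
    background_color='black',  # background color
    font_path='msyh.ttc',  # font: Microsoft YaHei, bundled with Windows
    scale=10,  # scale factor between the layout canvas and the rendered image (not the font size)
    # mask=img,  # uncomment to draw the cloud in the shape of the image
    stopwords=stopwords,
)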

print('Drawing the word cloud...')
wc.generate(string)
wc.to_file('output2.png')
print('Word cloud created successfully.')
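
A word cloud draws frequent words larger, so counting the tokens yourself is a quick sanity check on what the picture will emphasize. The sketch below uses only the standard library. The function name top_words and the sample tokens are made up, standing in for the output of jieba.lcut(txt) and the contents of cn_stopwords.txt.

```python
from collections import Counter


def top_words(tokens, stopwords, n=10):
    """Count tokens, skipping stop words and whitespace-only tokens."""
    kept = [t for t in tokens if t.strip() and t not in stopwords]
    return Counter(kept).most_common(n)


# Made-up tokens standing in for the output of jieba.lcut(txt).
tokens = ["好听", "的", "歌", "好听", " ", "的", "旋律", "好听"]
print(top_words(tokens, stopwords={"的"}, n=2))
```

If the top of this list is full of filler words, add them to the stop-word file before drawing the cloud.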

Result: the word cloud is saved as output2.png.
Go give it a try, everyone, and remember to leave a like~


Origin blog.csdn.net/fei347795790/article/details/122394243