Summary of word cloud chart technology: big data visualization

# -*- coding: utf-8 -*-

This line declares that the Python source file itself is encoded in UTF-8, so that Chinese characters appearing in the script (in strings and comments) are interpreted correctly. The encoding of data files read at runtime is set separately through the encoding argument of open().

import jieba
import matplotlib.pyplot as plt
import matplotlib.colors as colors
from wordcloud import WordCloud, STOPWORDS
from PIL import Image
import numpy as np

This code imports the required Python libraries. jieba is a Chinese word segmentation library; matplotlib.pyplot and matplotlib.colors are used for plotting and for building color maps; WordCloud generates the word cloud image; STOPWORDS is a built-in set of English stop words that can be used to filter meaningless words; PIL (the Python Imaging Library, installed as Pillow) opens and processes images; and numpy converts the image into array data.

text = open("text.txt", encoding='utf-8').read()
text = text.replace('\n', "").replace("\u3000", "")
text_cut = jieba.lcut(text)
text_cut = ' '.join(text_cut)

This code reads the file "text.txt" containing the article text and stores it in the variable text. The replace() calls then delete newline characters and full-width space characters (\u3000) from the text. Next, jieba.lcut() segments the article into words and returns them as the list text_cut. Finally, the list is joined into a single space-separated string, which is the input format WordCloud expects.
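To make the segmentation step concrete, here is a minimal, self-contained sketch; the sample sentence is hypothetical and not taken from the article:

import jieba

# A short hypothetical sentence; jieba splits Chinese text into a list of words.
sample = "我爱自然语言处理"
words = jieba.lcut(sample)       # typically something like ['我', '爱', '自然语言', '处理']
print(' '.join(words))           # space-separated tokens, the form WordCloud expects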

stopwords = set()
content = [line.strip() for line in open('hit_stopwords.txt', 'r').readlines()]
stopwords.update(content)

This code reads the file "hit_stopwords.txt", which contains stop words (meaningless filler words), and stores them in the set stopwords. First, the program creates an empty set. A list comprehension then reads every line of the file and uses strip() to remove the trailing newline and surrounding whitespace. Finally, update() adds all the stop words to the stopwords set, which will later be passed to WordCloud so these words are excluded from the chart.
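Note that the snippet above imports STOPWORDS from wordcloud but never uses it; that set contains common English stop words and can optionally be merged into the custom set. A small sketch, assuming the stop word file is UTF-8 encoded (the original code relies on the platform default encoding):

from wordcloud import STOPWORDS

# Read the stop word file with an explicit encoding and close it automatically.
with open('hit_stopwords.txt', 'r', encoding='utf-8') as f:
    stopwords = {line.strip() for line in f}

# Optional extension (not in the original script): also filter WordCloud's
# built-in English stop words, useful when the text mixes in English terms.
stopwords.update(STOPWORDS)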

background = Image.open("dnn.jpg").convert('RGB')
graph = np.array(background)
colormaps = colors.ListedColormap(['#FF0000', '#FF7F50', '#FFE4C4'])
wordcloud = WordCloud(scale=4,
                      font_path="C:/Windows/Fonts/simhei.ttf",
                      background_color="white",
                      mask=graph,
                      colormap=colormaps,
                      relative_scaling=0.1,
                      stopwords=stopwords).generate(text_cut)

This code generates the word cloud image. First, the program uses the PIL library to open the image named "dnn.jpg" and convert it to RGB format. Then it uses numpy to convert the picture into an array and saves it in the variable graph; this image serves as the mask that gives the word cloud its shape. A ListedColormap built from three hex colors controls the word colors. Next, the program creates a WordCloud object named wordcloud and sets several parameters:

scale: the scaling factor of the rendered image (larger values give higher resolution).
font_path: the path to a font file that can display Chinese characters (here SimHei on Windows).
background_color: the background color of the word cloud image.
mask: the numpy array of the background picture that defines the shape of the cloud.
colormap: the color map used to color the words.
relative_scaling: how strongly word frequency influences font size.
stopwords: the set of stop words to exclude.

Finally, generate(text_cut) builds the word cloud from the segmented text; a sketch of how to display and save the result follows below.
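The code above only builds the WordCloud object; matplotlib.pyplot is imported but not yet used. A minimal sketch, assuming the output file name wordcloud.png (not given in the original article), of how the result is typically displayed and saved:

# Render the word cloud with matplotlib and hide the axes around the image.
plt.figure(figsize=(10, 8))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()

# Write the image to disk; the file name here is a hypothetical example.
wordcloud.to_file('wordcloud.png')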
