Making a word cloud of the new information technology curriculum standard with Python

1. Converting the curriculum standard PDF to txt

There are many web applications on the Internet that can do the conversion, but I used ABBYY FineReader. The result had a few problems, which I corrected by hand. Adobe Acrobat XI Pro can also do the conversion, but the result is not as good as the former.

2. Building a user dictionary, mydict.txt

One word per line:

Data
Algorithm
Network
Information Processing
Information Security
Information Privacy
Information Awareness
Computational Thinking
Digital Learning and Innovation
Information Social Responsibility
Digital Literacy
Information Literacy
Core Literacy
Original Innovation
Original Spirit
Independently Controllable Technology
Artificial Intelligence
Innovation
National Security
Network Security
Data Security
Interdisciplinary Themes
Internet of Things
Internet
Sharing
Online Learning
Online Society
Online Platform
Information Technology
Authentic Learning
Scientific Principles
Digital Age
Digital Devices
Problem Solving
Technology Ethics
Experiential Process
Scenarios of Control
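Because the dictionary is maintained by hand, blank lines and duplicate entries easily creep in, and jieba will silently accept them. A minimal sketch (the helper name `clean_userdict` is my own) that normalizes the list before handing it to `jieba.load_userdict`:

```python
def clean_userdict(lines):
    """Strip whitespace, drop blank lines, and de-duplicate while keeping order."""
    seen = set()
    cleaned = []
    for line in lines:
        word = line.strip()
        if word and word not in seen:
            seen.add(word)
            cleaned.append(word)
    return cleaned

# Demo on an in-memory list; in practice, read and rewrite mydict.txt instead.
entries = clean_userdict(["Data ", "", "Online Society", "Online Society", "Network"])
print(entries)  # ['Data', 'Online Society', 'Network']
```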

3. Word segmentation with jieba

'''
Segment the text with jieba and count word frequencies,
using the custom dictionary of common domain terms.
'''
import jieba

file = 'newkebiao.txt'
file_userdict = 'mydict.txt'
jieba.load_userdict(file_userdict)  # load the user dictionary built in step 2
with open(file, 'r', encoding='UTF-8') as f:
    content = f.read()
words = jieba.cut(content, cut_all=False, HMM=True)
counts = {}
for word in words:
    if len(word) == 1:  # skip single-character tokens (mostly function words)
        continue
    counts[word] = counts.get(word, 0) + 1

# sort by frequency, descending, and print a word/count table
items = sorted(counts.items(), key=lambda x: x[1], reverse=True)
for word, count in items:
    print("{0:<12}{1:>3}".format(word, count))


Program output:

D:\Python10\python.exe D:/PythonDemo/demo3.py
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\Bobo\AppData\Local\Temp\jieba.cache
Loading model cost 0.428 seconds.
Prefix dict has been built successfully.
Student 276
Study 248
Data 139
Information 132
Pass 126
Question 120
Information Technology 119
Life 115

... (some lines omitted)

Personal opinion 1
Orderly 1
Self-reliant 1
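For reference, the dict-based counting loop in the script above is the classic pattern; Python's `collections.Counter` does the same thing in fewer lines. The token list here is a hand-made stand-in, since real segmentation needs jieba and the source text:

```python
from collections import Counter

# Hand-made stand-in for the jieba.cut() token stream.
tokens = ["学生", "学习", "学生", "数据", "了", "学生", "学习"]

# Keep only tokens of length >= 2, exactly as the loop above does.
counts = Counter(t for t in tokens if len(t) > 1)

# most_common() returns the same frequency-sorted (word, count) pairs.
for word, count in counts.most_common():
    print("{0:<12}{1:>3}".format(word, count))
```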

4. Generating the word cloud

Since some words should not be displayed, use the words and frequencies obtained in step 3 to build a stop-word list, stopwords.txt.
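One way to seed the stop-word list is to combine a few hand-picked frequent-but-uninteresting words with everything that occurred only once. The `counts` dict below is a tiny stand-in for the step-3 result, and the hand-picked set is my own illustration:

```python
# Stand-in for the word/frequency result printed in step 3.
counts = {"学生": 276, "学习": 248, "通过": 126, "有序": 1, "自立": 1}

# Hand-picked frequent-but-uninteresting words, plus every word seen only once.
handpicked = {"通过"}
stopwords = handpicked | {w for w, c in counts.items() if c == 1}
print(sorted(stopwords))

# In practice, write the set to stopwords.txt, one word per line:
# with open("stopwords.txt", "w", encoding="UTF-8") as f:
#     f.write("\n".join(sorted(stopwords)))
```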

from wordcloud import WordCloud, ImageColorGenerator, STOPWORDS
import matplotlib.pyplot as plt
from matplotlib.pyplot import imread
import jieba

file = 'newkebiao.txt'
file_userdict = 'mydict.txt'
jieba.load_userdict(file_userdict)
text = ""
with open(file, 'r', encoding='UTF-8') as f:
    for line in f.readlines(): #按行读文件
        # print(line)
        words = jieba.cut(line, cut_all=False, HMM=True)
        # print("/".join(words))
        counts = {}
        for word in words:
            if len(word) == 1:
                continue
            else:
                counts[word] = counts.get(word, 0) + 1
        keys = [k for k, v in counts.items()]
        text += " ".join([k for k in keys])

list_text = list(text.split())  # 将空格分隔的字符串转为列表

text = " ".join(i for i in list_text)

backg_pic = imread('mask.jpg')  # read in the background/mask image

# load the stop-word list
stopwords_file = 'stopwords.txt'
with open(stopwords_file, "r", encoding='UTF-8') as words:
    stopwords = [i.strip() for i in words]

# configure the word cloud
wc = WordCloud(
    background_color='white',
    stopwords=stopwords,
    mask=backg_pic,
    font_path='simhei.ttf',
    max_words=30,
    max_font_size=200,
    random_state=30, scale=1.8)

wc.generate_from_text(text)
# recolor the words with the colors of the background image
image_colors = ImageColorGenerator(backg_pic)
wc.recolor(color_func=image_colors)
plt.imshow(wc)
plt.axis('off')
plt.show()
wc.to_file("wordcloud.png")
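A side note on the split-then-join idiom used in the script: `str.split()` with no argument splits on any run of whitespace (spaces, tabs, newlines), so joining the pieces back with single spaces normalizes the text before it reaches WordCloud. In isolation:

```python
raw = "信息技术  数据\n算法\t网络 "
# split() with no argument splits on any whitespace and drops empty strings
normalized = " ".join(raw.split())
print(normalized)  # 信息技术 数据 算法 网络
```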

5. The final word cloud

 

After segmenting the curriculum standard text, I also tried the online "WeiCiYun" (微词云) service, but the free offerings on the Internet all come with a catch, so I gave up on it.

The resources include the new curriculum standard text, the custom dictionary, and the stop-word list.

Note: the following resources are for study and research only; if anything infringes, please contact me and it will be deleted!

Resource link: https://pan.baidu.com/s/1lwNw4z6SP4bKCV8UIbs84A?pwd=148r
Extraction code: 148r
 

Origin blog.csdn.net/chinagaobo/article/details/124756588