1. Convert the curriculum standard PDF to TXT
There are many web applications that can do the conversion, but I used ABBYY FineReader. There were some recognition problems, which I corrected manually. Adobe Acrobat XI Pro can also convert PDFs, but its results were not as good.
2. Build a custom user dictionary, mydict.txt
In the file, each word goes on its own line; the entries used (listed here on a single line) were:
Data Algorithm Network Information Processing Information Security Information Privacy Information Awareness Computational Thinking Digital Learning and Innovation Information Social Responsibility Digital Literacy Information Literacy Core Literacy Original Innovation Original Spirit Independent Controllable Technology Artificial Intelligence Innovation National Security Network Security Data Security Interdisciplinary Themes Internet of Things Internet Sharing Online Learning Online Society Online Platform Information Technology Authentic Learning Scientific Principles Digital Age Online Society Digital Devices Problem Solving Technology Ethics Experiential Process Scenarios of Control
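jieba's load_userdict expects a plain-text file with one entry per line; each line may optionally carry a frequency and part-of-speech tag after the word. A minimal sketch of producing such a file from Python, using three entries from the list above (数据 = data, 计算思维 = computational thinking, 数字素养 = digital literacy):

```python
# Write a jieba-style user dictionary: one entry per line.
# jieba also accepts an optional frequency and POS tag after the word
# (e.g. "计算思维 5 n"), but the bare word is sufficient.
entries = ["数据", "计算思维", "数字素养"]  # subset of the full list above

with open("mydict.txt", "w", encoding="UTF-8") as fp:
    for word in entries:
        fp.write(word + "\n")
```

After `jieba.load_userdict('mydict.txt')`, these terms are kept intact during segmentation instead of being split into shorter words.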
3. Segment the text with jieba
'''
Segment the text with jieba and count word frequencies;
a user dictionary of common terms is loaded first.
'''
import jieba

file = 'newkebiao.txt'
file_userdict = 'mydict.txt'
jieba.load_userdict(file_userdict)  # load the custom dictionary

with open(file, 'r', encoding='UTF-8') as fp:
    text = fp.read()
words = jieba.cut(text, cut_all=False, HMM=True)

counts = {}
for word in words:
    if len(word) == 1:  # skip single-character tokens
        continue
    counts[word] = counts.get(word, 0) + 1

items = sorted(counts.items(), key=lambda x: x[1], reverse=True)
for word, count in items:
    print("{0:<12}{1:>3}".format(word, count))
Program output:
D:\Python10\python.exe D:/PythonDemo/demo3.py
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\Bobo\AppData\Local\Temp\jieba.cache
Loading model cost 0.428 seconds.
Prefix dict has been built successfully.
Student 276
Study 248
Data 139
Information 132
Pass 126
Question 120
Information Technology 119
Life 115
... (some lines omitted)
Personal opinion 1
Orderly 1
Self-reliant 1
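As an aside, the manual dict-and-sort bookkeeping in the script above can be expressed with the standard library's collections.Counter, whose most_common() already returns entries sorted by frequency. The token list here is hard-coded (standing in for what jieba.cut would yield) so the sketch runs without jieba or the course text:

```python
from collections import Counter

# Hard-coded tokens standing in for the output of jieba.cut().
tokens = ["学生", "学习", "数据", "学生", "学", "学生", "学习"]

# Keep only tokens longer than one character, as the script above does.
counts = Counter(t for t in tokens if len(t) > 1)

# most_common() yields (word, count) pairs, highest count first.
for word, count in counts.most_common():
    print("{0:<12}{1:>3}".format(word, count))
```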
4. Generate the word cloud
Some words should not appear in the cloud, so use the word-frequency results from step 3 to build a stop-word list, stopwords.txt.
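The stop-word file has the same shape as the user dictionary: one word per line. A minimal sketch of writing it and reading it back, using three illustrative high-frequency function words (通过 = pass/through, 进行 = carry out, 能够 = be able to) rather than the full list actually used:

```python
# Illustrative stop words; the real stopwords.txt was built by
# reviewing the full step-3 frequency output.
stop_words = ["通过", "进行", "能够"]

with open("stopwords.txt", "w", encoding="UTF-8") as fp:
    fp.write("\n".join(stop_words) + "\n")

# Read it back the same way the word-cloud script does:
with open("stopwords.txt", "r", encoding="UTF-8") as words:
    stopwords = [i.strip() for i in words]
print(stopwords)
```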
from wordcloud import WordCloud, ImageColorGenerator
import matplotlib.pyplot as plt
from matplotlib.pyplot import imread
import jieba

file = 'newkebiao.txt'
file_userdict = 'mydict.txt'
jieba.load_userdict(file_userdict)

text = ""
with open(file, 'r', encoding='UTF-8') as f:
    for line in f:  # read the file line by line
        words = jieba.cut(line, cut_all=False, HMM=True)
        counts = {}
        for word in words:
            if len(word) == 1:  # skip single-character tokens
                continue
            counts[word] = counts.get(word, 0) + 1
        text += " ".join(counts) + " "  # trailing space keeps lines separated

backg_pic = imread('mask.jpg')  # read the background (mask) image

# Load the stop-word list
stopwords_file = 'stopwords.txt'
with open(stopwords_file, "r", encoding='UTF-8') as words:
    stopwords = [i.strip() for i in words]

# Configure the word-cloud style
wc = WordCloud(
    background_color='white',
    stopwords=stopwords,
    mask=backg_pic,
    font_path='simhei.ttf',  # a Chinese font is required for Chinese text
    max_words=30,
    max_font_size=200,
    random_state=30,
    scale=1.8)
wc.generate_from_text(text)
image_colors = ImageColorGenerator(backg_pic)  # take word colors from the mask image
plt.imshow(wc.recolor(color_func=image_colors))
plt.axis('off')
plt.show()
wc.to_file("wordcloud.png")
5. The final generated word cloud
I also tried the online "WeiCiYun" (微词云) service on the segmented course text, but the free tier comes with too many catches, so I gave up on it.
Resources include the new curriculum standard text, the custom dictionary, and the stop-word list.
Note: the following resources are for study and research only; if anything infringes your rights, please contact me and it will be deleted.
Resource link: https://pan.baidu.com/s/1lwNw4z6SP4bKCV8UIbs84A?pwd=148r
Extraction code: 148r