Python text processing: building and analyzing a word cloud of Romance of the Three Kingdoms

  • I am a Python beginner; so far I have learned the basic syntax and how to use a few common libraries.
  • I made this "Romance of the Three Kingdoms" word cloud on a whim while summarizing what I had learned recently. My skill level is very limited, so this is only a personal record.
  • For self-learners I strongly recommend the MOOC "Python Language Programming" by Professor Song Tian of Beijing Institute of Technology. The pace of the courseware is comfortable, and it suits complete beginners as well as people who need data analysis but do not intend to dig deeper into the technology.
  • That is the background.

1. Corpus and external libraries:

Full-text corpus of Romance of the Three Kingdoms (txt format): Python123.io

jieba (Chinese word segmentation): GitHub

WordCloud repository: GitHub

2. Code:

import jieba
import wordcloud as wc
# Corpus: 三国演义.txt (Romance of the Three Kingdoms)

# Read the text; the with-block closes the file automatically
with open('datalib/threekingdoms.txt', 'r', encoding='utf-8') as f1:
    t1 = f1.read()

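# Text preprocessing (merge aliases of the same character).
# Order matters: strip the trailing 曰 ("said") first, then map each
# alias to the canonical name (玄德 -> 刘备, 关公 -> 云长 -> 关羽).
t1 = t1.replace('孔明曰', '孔明')
t1 = t1.replace('玄德曰', '玄德')
t1 = t1.replace('玄德', '刘备')
t1 = t1.replace('关公', '云长')
t1 = t1.replace('云长', '关羽')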


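# jieba segmentation + drop single-character tokens + drop filler words
ls = jieba.lcut(t1)
ls = [word for word in ls if len(word) > 1]
excludes = {'不可', '却说', '二人', '不能', '次日', '左右', '主公', '于是',
            '今日', '天下', '大喜', '将军', '引兵', '商议', '陛下', '都督',
            '不敢', '如何', '如此', '众将', '只见', '后主', '此人', '不知',
            '人马', '先主', '一人', '丞相'}
# Build a new list rather than calling ls.remove() inside a for loop:
# removing while iterating skips the element after each removed one.
ls = [word for word in ls if word not in excludes]
txt1 = " ".join(ls)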

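# Generate the word cloud.
# font_path must point to a font with Chinese glyphs (msyh.ttc is Microsoft YaHei).
w1 = wc.WordCloud(width=1000, height=700, background_color='white',
                  max_words=20, font_path='msyh.ttc')
w1.generate(txt1)
w1.to_file("datalib/3KingWordCloud.png")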
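
One thing to watch in the stop-word step: calling remove() on a list while a for loop is iterating over that same list makes the loop skip the element right after each removal, so consecutive stop words can slip through. The sketch below is a self-contained illustration with a made-up five-token sample (the function names and the sample are hypothetical, not part of the script above); it compares the in-loop removal with a list comprehension.

```python
def remove_in_loop(tokens, excludes):
    """Risky pattern: mutate the list while iterating over it."""
    tokens = list(tokens)  # work on a copy so the caller's list is untouched
    for word in tokens:
        if word in excludes:
            tokens.remove(word)  # shifts later items left; the next one is skipped
    return tokens


def filter_with_comprehension(tokens, excludes):
    """Safer pattern: build a new list that keeps only the wanted tokens."""
    return [word for word in tokens if word not in excludes]


# Toy sample, for illustration only (not taken from the real corpus)
sample = ["却说", "却说", "刘备", "于是", "关羽"]
stop = {"却说", "于是"}

buggy = remove_in_loop(sample, stop)
clean = filter_with_comprehension(sample, stop)
print(buggy)  # one stop word survives
print(clean)  # only the two names remain
```

With a large corpus the effect is easy to miss, because most stop words are still removed and only some repeats survive.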

4. Word cloud generation

5. Analysis

5.1 Character ranking in "The Romance of the Three Kingdoms" as seen from the word cloud

  • A-level protagonists: Liu Bei, Guan Yu, Kong Ming, Cao Cao

The huge "Liu Bei" in the word cloud shows the preeminent status of the Imperial Uncle in the novel, and its main theme of "honoring Liu and disparaging Cao" is plain to see.

"Zhuge Liang" and "Guan Yu" are slightly smaller than "Liu Bei" in font size; in readers' minds they are undoubtedly the civil and military right-hand men of Shu Han (Zhang Fei: What about me.jpg)

Cao Mengde, a "charming villain" who combines power, strategy, and arrogance, is deliberately played down in the novel, yet he still squeezes into the first echelon, which shows his appeal. (Sun Quan: Well, I'm leaving)

  • B-level protagonists: Zhang Fei, Lu Bu, Zhao Yun

The font sizes make it clear that the B-level protagonists appear noticeably less often than the A-level ones.

As a member of the "Liu Guan Zhang" trio, Zhang Fei was squeezed out of the first echelon, which is really miserable.

Although Lu Bu is a star only in the early part of the novel, he still makes the B-level ranks, which shows how meticulously he is portrayed.

As the poster character of Three Kingdoms spin-off works, Zhao Yun sits at the tail end of the B-level echelon by frequency of appearance; thinking about it, Zilong really is rarely seen in the second half of the novel...

  • C-level protagonists: Sun Quan, Zhou Yu, Yuan Shao, Sima Yi, Wei Yan, Ma Chao

I just complained that Zhang Fei was squeezed out of the "Liu Guan Zhang" trio, but Sun Quan, a "first-generation head" of the Three Kingdoms, can only be the "chicken head" of the C-level echelon, which is truly miserable. Yet it fits his mixed historical image, in which "Sun Zhongmou" and "Sun Shiwan" coexist.

Zhou Yu and Sima Yi, as Zhuge Liang's main opponents in the early and late stages, have also made it into the ranks of the protagonists (Kong Ming, YYDS)

At first I was surprised that Yuan Shao, Ma Chao, and Wei Yan appeared in the word cloud, but on reflection, perhaps these three are the triggers that push the plot forward in the early, middle and late stages respectively? (not sure)

5.2 Place names in "The Romance of the Three Kingdoms" as seen from the word cloud

  • The place names appearing in the word cloud are Jingzhou, Eastern Wu, and Hanzhong.
  • As the focus of contention in the early and middle parts of the novel, Jingzhou has produced countless familiar allusions. It is no surprise that it takes center position in the word cloud.
  • Eastern Wu is the only state name to appear in the word cloud, beating Wei and Shu. On the one hand, this can be read as Eastern Wu playing team tactics and collectivism, whereas Wei and Shu are portrayed with a stronger flavor of individual heroism. On the other hand, the Battle of Chibi, fought on Eastern Wu's home ground, is the turning point of the plot: after it, the situation changes from general chaos to a contest among three kingdoms. Given this, it is reasonable for Eastern Wu to appear in the word cloud.
  • Hanzhong's appearance in the word cloud is unexpected, presumably because Zhuge Liang's Northern Expeditions in the later part had to pass through it. Comparing Hanzhong with Jingzhou and Eastern Wu, I happened to notice that the latter two have corresponding film works (e.g. Jingzhou: "Shadow"; Eastern Wu: "Red Cliff"), while Hanzhong rarely appears in movies. That is something to look forward to.

5.3 Warfare in "The Romance of the Three Kingdoms" as seen from the word cloud

In the word cloud parameters, max_words is set to 20, so only the 20 most frequent words are shown. Sixteen of them have been discussed above; the remaining four are Shu Bing, Wei Bing, Sergeant, and Army Horse, which naturally form two pairs.
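
To read the ranking as numbers instead of judging it from font sizes, the tokens can be counted directly. The following is a minimal sketch using collections.Counter; the helper name top_words and the toy token list are illustrative assumptions, not part of the original script, and the counts may not match exactly what WordCloud computes internally, since it does its own tokenizing of the joined text.

```python
from collections import Counter


def top_words(tokens, n=20, excludes=frozenset()):
    """Return the n most frequent multi-character tokens as (word, count) pairs."""
    counts = Counter(
        word for word in tokens
        if len(word) > 1 and word not in excludes
    )
    return counts.most_common(n)


# Toy tokens for illustration only; in the script above this would be the
# list produced by jieba.lcut(t1).
tokens = ["刘备", "关羽", "刘备", "曰", "却说", "孔明", "刘备", "关羽"]
ranking = top_words(tokens, n=2, excludes={"却说"})
print(ranking)
```

Printing the top 20 pairs this way also makes it easy to spot filler words that still need to be added to the exclude set.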

Shu Bing vs Wei Bing: There are countless well-known battles in "Romance of the Three Kingdoms", but the troops that show up most often are the Wei and Shu soldiers. Looking back at the chapters where these two words appear frequently, they should be the ones covering Zhuge Liang's Northern Expeditions. This also matches the main thread of the later part, the struggle for supremacy between Wei and Shu.

Sergeant vs. Army Horse: The font size of the former is much larger than that of the latter. So although the capable ministers and generals in the novel all charge about on horseback with blades drawn, fighting between rank-and-file foot soldiers is the foundation of the battles.

Origin blog.csdn.net/u010785550/article/details/108669652