Counting occurrences of 大圣 (Dasheng, the "Great Sage") in Journey to the West

Prerequisites:
1. Opening a file with the with open(...) statement.
2. The file object's readlines() method, which returns a list of strings.
3. The regular-expression function re.split(), which splits a string into a list of strings.
4. The in keyword, which tests whether one string is contained in another.
5. Flexible use of for loops and if statements.
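
As a quick warm-up for points 3 and 4, here is a minimal sketch of re.split() with a character class, followed by an in test. The sample string is made up for illustration and is not a quotation from the novel.

```python
import re

# Made-up sample line (not a quotation from the novel).
sample = '大圣笑道:“好,好!”行者说。'

# A character class splits on any single listed punctuation mark.
# Empty strings produced by adjacent delimiters are filtered out.
clauses = [c for c in re.split(r'[。!,:“”?;]', sample) if c]

# `in` tests substring membership.
matching = [c for c in clauses if '大圣' in c]

print(clauses)
print(matching)
```

Filtering out empty strings is optional here; it only keeps the printed list tidy.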

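```python
import re

# readlines() returns a list; each paragraph of the novel is one element.
# Note: paragraphs are delimited by the newline character '\n'.
with open('./xiyouji.txt', 'r', encoding='utf-8') as f:
    paragraphs = f.readlines()

target = '大圣'
counter = 0
for paragraph in paragraphs:
    # Split the paragraph into clauses on any of several punctuation marks.
    # A character class is used because a bare ? inside an alternation
    # raises "nothing to repeat". Both full-width and ASCII marks are listed,
    # since Chinese text normally uses the full-width forms.
    sentences = re.split(r'[。!,:“”? ;!,:?;]', paragraph)
    for sentence in sentences:
        sentence = sentence.strip()
        # Each clause that contains the target is counted once.
        if target in sentence:
            counter += 1
            print(sentence)
print(f'{target} appears {counter} times in total')
```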

Running result:

The output shows that 大圣 appears 1270 times in Journey to the West. Strictly speaking, this is the number of punctuation-delimited clauses that contain 大圣, since each clause is counted once.
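
Because the in test counts a clause only once, a clause that mentions 大圣 twice still adds 1 to the total. The following is a minimal sketch, not the original author's method, that reports both the clause count and the raw occurrence count using str.count(). The function name and the sample paragraphs are hypothetical and made up for illustration.

```python
import re

def count_clauses_and_occurrences(paragraphs, target):
    """Return (clauses containing target, total occurrences of target)."""
    clause_hits = 0
    occurrences = 0
    for paragraph in paragraphs:
        # Split on common full-width punctuation and any whitespace.
        for clause in re.split(r'[。!,:“”?;\s]', paragraph):
            if target in clause:
                clause_hits += 1
            # str.count() counts non-overlapping occurrences.
            occurrences += clause.count(target)
    return clause_hits, occurrences

# Made-up sample paragraphs (not quotations from the novel).
sample_paragraphs = ['大圣见了大圣,笑道:好!\n', '行者不语。\n']
result = count_clauses_and_occurrences(sample_paragraphs, '大圣')
print(result)
```

On the real text the two numbers would differ only where a single clause repeats the target.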

Of course, we can also use jieba word segmentation to do the counting in a single pass:

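```python
import jieba

with open('./xiyouji.txt', 'r', encoding='utf-8') as f:
    xyj_text = f.read()

# Segment the whole text into a list of words.
word_list = list(jieba.cut(xyj_text))
target = '大圣'
count = 0
for word in word_list:
    # Count every token that contains the target,
    # including longer words that merely include 大圣.
    if target in word:
        count += 1
print(f'{target} appears {count} times')
```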

The result of the operation is: [screenshot of the program output]
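
With segmented text, target in word and word == target can give different totals, because a longer token such as 齐天大圣 contains 大圣 without being equal to it. The sketch below uses collections.Counter to compute both. The function name is hypothetical, and the token list is a hand-written stand-in for the output of list(jieba.cut(text)); jieba's actual segmentation may differ.

```python
from collections import Counter

def token_counts(tokens, target):
    """Return (tokens equal to target, tokens containing target)."""
    freq = Counter(tokens)
    exact = freq[target]  # Counter returns 0 for missing keys
    containing = sum(n for word, n in freq.items() if target in word)
    return exact, containing

# Hypothetical token list standing in for list(jieba.cut(text)).
tokens = ['齐天大圣', '到', '此', '一游', ',', '大圣', '笑', '道', ',', '大圣']
counts = token_counts(tokens, '大圣')
print(counts)
```

Which of the two totals is wanted depends on whether compound titles should count as mentions of 大圣.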

Origin blog.csdn.net/weixin_41855010/article/details/105241320