5.1 Counting the words in an English file with Python

Task for this level: write a small program that counts the number of words in a file, using replace to turn the punctuation in the text into spaces.

The code is as follows:

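# Complete your code here
a = input()  # file name entered by the user
with open(f'/data/bigfiles/{a}', 'r', encoding='utf-8') as text:  # open the file and create a file object
    txt = text.read()  # read the whole file into a string
    for i in ",.!\'":  # replace each punctuation mark with a space
        txt = txt.replace(i, " ")
    x1 = txt.split()  # split on whitespace to get the list of words
    print('共有'+str(len(x1))+"个单词")  # output text means "N words in total"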




(The overall idea: first replace the punctuation marks in the text with spaces, then split the text on whitespace, and finally use len() to count the number of words in the text.)
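
The three steps can be traced on a short in-memory string instead of a file. This is only a minimal sketch: the function name count_words and the sample sentence are illustrative assumptions, not part of the exercise.

```python
def count_words(text, punctuation=",.!'"):
    """Count words after replacing each punctuation mark with a space."""
    for mark in punctuation:
        text = text.replace(mark, " ")
    # split() with no arguments splits on runs of whitespace
    return len(text.split())


sample = "Hello, world! It's a fine day."
print(count_words(sample))  # 7
```

Because the apostrophe is replaced as well, "It's" is counted as two words ("It" and "s"), which is why the result is 7 rather than 6.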

Related knowledge

To complete this task, you need to master:

1. Reading file content
2. Using string methods

1. Read the file

The open(filename) function can open text files in formats such as txt and csv.

For example:

with open('a.txt', 'r', encoding='utf-8') as text:  # open the file a.txt and create a file object
    txt = text.read()  # read the file into a string
    print(txt)  # print the string
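
To try the reading code without an existing a.txt, one option is to write a small file first. The sketch below is only an illustration: the temporary directory, the file name demo.txt, and its contents are all made up for the example.

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained
# (hypothetical file name and content)
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("first line\nsecond line\n")

# read() returns the whole file as one string
with open(path, "r", encoding="utf-8") as text:
    txt = text.read()
print(repr(txt))  # 'first line\nsecond line\n'

# Iterating over the file object yields one line at a time
with open(path, "r", encoding="utf-8") as text:
    lines = [line.rstrip("\n") for line in text]
print(lines)  # ['first line', 'second line']
```

The with statement closes the file automatically when the block ends, so no explicit close() call is needed.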

2. String methods

For programming convenience, Python provides many built-in methods. Here we cover two of the string methods.

2.1 Replacement

str.replace(oldvalue, newvalue, count)

This method returns a copy of the string str in which occurrences of the substring oldvalue are replaced with newvalue, at most count times.

Parameters:

No.  Parameter  Description
1    oldvalue   Required. The substring to search for.
2    newvalue   Required. The string that replaces the old value.
3    count      Optional. The number of occurrences of the old value to replace. By default, all occurrences are replaced.

Examples are as follows:

txt = "I like bananas. She likes bananas too. "

x1 = txt.replace("bananas", "apples")  # replace all occurrences
print(x1)  # I like apples. She likes apples too.

x2 = txt.replace("bananas", "apples", 1)  # replace only the first occurrence
print(x2)  # I like apples. She likes bananas too.
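
Two details matter when replace is used on punctuation: strings are immutable, so replace returns a new string and leaves the original untouched, and the call can be repeated to handle several characters. A small sketch, with a made-up sample sentence:

```python
original = "Wait, what? No!"

# replace() returns a new string; the original is not modified
cleaned = original.replace(",", " ")
print(original)  # Wait, what? No!
print(cleaned)   # Wait  what? No!

# Replace several punctuation marks one after another,
# reassigning the result each time
text = original
for mark in ",?!":
    text = text.replace(mark, " ")
print(text.split())  # ['Wait', 'what', 'No']
```

Forgetting to assign the result back (writing text.replace(mark, " ") on its own) is a common mistake: the call succeeds but the text stays unchanged.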

2.2 Splitting

str.split(sep=None, maxsplit=-1)

Splits a string into a list of substrings. A delimiter can be specified; by default the string is split on runs of whitespace (space, tab \t, newline \n, carriage return \r, form feed \f, and vertical tab \v).

Parameters:

No.  Parameter  Description
1    sep        Optional. The delimiter to use when splitting the string. The default is whitespace.
2    maxsplit   Optional. The maximum number of splits to perform. The default is -1, meaning no limit.
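
The effect of sep and maxsplit is easiest to see side by side. The sample strings below are made up for illustration.

```python
txt = "one  two\tthree\nfour"

# Default: split on runs of whitespace, so no empty strings appear
print(txt.split())  # ['one', 'two', 'three', 'four']

# Explicit separator: every single space is a boundary,
# so the double space produces an empty string
print(txt.split(" "))  # ['one', '', 'two\tthree\nfour']

# maxsplit limits the number of splits; the remainder stays in the last item
print("a,b,c,d".split(",", 1))           # ['a', 'b,c,d']
print("a,b,c,d".split(",", maxsplit=2))  # ['a', 'b', 'c,d']
```

This is why the word-count code calls split() without arguments: the consecutive spaces left behind by replace do not produce empty items that would inflate the count.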

Origin blog.csdn.net/m0_70456205/article/details/129778272