"Learning Python with You Hand in Hand" 32-File Reading

In the last article, "Learning Python with You Hand in Hand" 31-File Opening, we learned how to open a file. Today we continue with file operations and focus on several ways to read a file.

One note before we start. When we learn more advanced file handling later, we will more often use commands such as pandas.read_csv to read files. Even so, it is important to understand the basic principles of how Python processes files, which is what this article introduces.

There are many ways to read files in Python. Let's go through them one by one.

1. Read the entire file

In the previous article, we saw that the open() function only returns a file object (which can also be used as an iterator), not the contents of the file. To get the contents, we need the read() method. It is called like the other methods we have used before: add "." and the method name after the object, and the entire file is read.

In [1]: path = 'lesson/text/contents.txt'
        file_object = open(path, encoding = 'utf-8').read()
        print(file_object)
Out[1]: 《手把手陪您学Python》1——为什么要学Python?
        《手把手陪您学Python》2——Python的安装 
        《手把手陪您学Python》3——PyCharm的安装和配置 
        《手把手陪您学Python》4——Hello World!
        《手把手陪您学Python》5——Jupyter Notebook 
        《手把手陪您学Python》6——字符串的标识 
        《手把手陪您学Python》7——字符串的索引 
        《手把手陪您学Python》8——字符串的切片 
        《手把手陪您学Python》9——字符串的运算 
        《手把手陪您学Python》10——字符串的函数
         

After the opened file is read with the read() method, its contents can be printed.

Although this output looks no different from the original file, there is one small difference: an extra blank line at the end.

Why is there an extra blank line? In our target file there is a newline after the last line of text, so the string returned by read() ends with a newline. The print() function then adds a newline of its own, and the two together show up as a blank line. To remove it, one way is to avoid the final line break when creating the target file; another is the string.rstrip(chars) method we learned before, which strips the specified characters from the right end of a string (when no argument is given, it strips whitespace, including newlines).

If you modify the program again in this way, you can get a result that is completely consistent with the content of the target file.

In [2]: path = 'lesson/text/contents.txt'
        file_object = open(path, encoding = 'utf-8').read()
        print(file_object.rstrip())
Out[2]: 《手把手陪您学Python》1——为什么要学Python?
        《手把手陪您学Python》2——Python的安装
        《手把手陪您学Python》3——PyCharm的安装和配置
        《手把手陪您学Python》4——Hello World!
        《手把手陪您学Python》5——Jupyter Notebook
        《手把手陪您学Python》6——字符串的标识
        《手把手陪您学Python》7——字符串的索引
        《手把手陪您学Python》8——字符串的切片
        《手把手陪您学Python》9——字符串的运算
        《手把手陪您学Python》10——字符串的函数
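
The source of the extra blank line can be checked directly. The sketch below is only an illustration: `sample.txt` is a hypothetical two-line file created in a temporary folder as a stand-in for contents.txt. It shows that the text returned by read() ends with '\n', that rstrip() removes it, and that print() is what supplies the second line break.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / 'sample.txt'   # hypothetical sample file
    path.write_text('第一行\n第二行\n', encoding='utf-8')

    file_object = open(path, encoding='utf-8')
    text = file_object.read()
    file_object.close()   # close before the temporary folder is deleted

# repr() shows newline characters as \n instead of rendering them.
print(repr(text))           # the string ends with \n
print(repr(text.rstrip()))  # trailing newline removed

# print() normally appends its own newline; end='' switches that off,
# so the text is shown without the extra blank line.
print(text, end='')
```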

In addition, we can see what happens if we read Chinese characters without using the encoding parameter.

In [3]: path = 'lesson/text/contents.txt'
        file_object = open(path).read()
        print(file_object)
Out[3]: ---------------------------------------------------------------------------
        UnicodeDecodeError                        Traceback (most recent call last)
        <ipython-input-10-38447936f978> in <module>
              1 path = 'lesson/text/contents.txt'
        ----> 2 file_object = open(path).read()
              3 print(file_object)
              
        UnicodeDecodeError: 'gbk' codec can't decode byte 0xaa in position 14: illegal multibyte sequence

As you can see, when the target file contains Chinese characters but the encoding parameter is omitted, open() falls back to the system default encoding (gbk on the system used here), and the UTF-8 bytes in the file cannot be decoded.

Since in most cases our files will contain some Chinese characters, including Chinese punctuation, it is recommended to add the parameter encoding='utf-8' when calling open(), so that encoding problems do not cause errors.
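
The same error can be reproduced on any system by naming the wrong encoding explicitly instead of relying on the platform default. A minimal sketch, assuming a hypothetical `sample.txt` that holds the first few characters of contents.txt saved as UTF-8:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / 'sample.txt'   # hypothetical sample file
    path.write_text('《手把手陪您学Python》\n', encoding='utf-8')

    # Matching encoding: the text comes back unchanged.
    file_object = open(path, encoding='utf-8')
    good = file_object.read()
    file_object.close()

    # Mismatched encoding: these UTF-8 bytes are not valid GBK.
    file_object = open(path, encoding='gbk')
    try:
        bad = file_object.read()
    except UnicodeDecodeError as error:
        bad = None
        print('decode failed:', error.reason)
    finally:
        file_object.close()

print(good.rstrip())
```

Note that open() itself succeeds with the wrong encoding; the error only appears when read() tries to decode the bytes.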

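If you are not sure which encoding is involved, note that sys.getdefaultencoding() from the sys module only reports Python's own default string encoding and does not inspect the file. The encoding that open() uses when the encoding parameter is omitted comes from the system locale and can be checked with locale.getpreferredencoding(). This goes beyond what we have learned so far, so it is mentioned only in passing.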
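
The two defaults can be compared side by side. This is only a sketch: the first printed value depends on the operating system and Python configuration, and neither call examines the contents of any particular file.

```python
import locale
import sys

# Encoding that open() falls back to when no `encoding` argument is given.
# Platform dependent: for example 'cp936' (GBK) on a Chinese-language Windows.
fallback = locale.getpreferredencoding(False)
print(fallback)

# Default encoding for Python's own str/bytes conversions: 'utf-8' in Python 3.
internal = sys.getdefaultencoding()
print(internal)
```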

2. Use an iterator to read the file line by line

Since the opened file is an iterator, in addition to reading the entire file at once, you can also use a for loop to read the file line by line, which is as simple as traversing a list. Each iteration yields one line of the target file.

In [4]: path = 'lesson/text/contents.txt'
        file_object = open(path, encoding = 'utf-8')
        for line in file_object:
            print(line)
Out[4]: 《手把手陪您学Python》1——为什么要学Python?

        《手把手陪您学Python》2——Python的安装
        
        《手把手陪您学Python》3——PyCharm的安装和配置
        
        《手把手陪您学Python》4——Hello World!
        
        《手把手陪您学Python》5——Jupyter Notebook
        
        《手把手陪您学Python》6——字符串的标识
        
        《手把手陪您学Python》7——字符串的索引
        
        《手把手陪您学Python》8——字符串的切片

        《手把手陪您学Python》9——字符串的运算
        
        《手把手陪您学Python》10——字符串的函数
        

As you can see, when reading and printing line by line, a blank line appears after every line: each line ends with its own newline, and print() adds another. So rstrip() is needed here too.

In [5]: path = 'lesson/text/contents.txt'
        file_object = open(path, encoding = 'utf-8')
        for line in file_object:
            print(line.rstrip())
Out[5]: 《手把手陪您学Python》1——为什么要学Python?
        《手把手陪您学Python》2——Python的安装
        《手把手陪您学Python》3——PyCharm的安装和配置
        《手把手陪您学Python》4——Hello World!
        《手把手陪您学Python》5——Jupyter Notebook
        《手把手陪您学Python》6——字符串的标识
        《手把手陪您学Python》7——字符串的索引
        《手把手陪您学Python》8——字符串的切片
        《手把手陪您学Python》9——字符串的运算
        《手把手陪您学Python》10——字符串的函数
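
As an alternative to rstrip(), the extra line break can be suppressed on the print() side. The following is a sketch rather than part of the original lesson; `sample.txt` is a hypothetical file created just for the demonstration. Unlike rstrip() with no argument, `end=''` and `rstrip('\n')` both leave trailing spaces in a line untouched.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / 'sample.txt'   # hypothetical sample file
    path.write_text('line one  \nline two\n', encoding='utf-8')

    stripped = []
    file_object = open(path, encoding='utf-8')
    for line in file_object:
        # end='' stops print() from adding a second line break.
        print(line, end='')
        # rstrip('\n') removes only the newline and keeps trailing spaces.
        stripped.append(line.rstrip('\n'))
    file_object.close()

print(stripped)
```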

With the list comprehension we learned before, the above program can be even shorter.

In [6]: path = 'lesson/text/contents.txt'
        print([x for x in open(path, encoding = 'utf-8')])
Out[6]: ['《手把手陪您学Python》1——为什么要学Python?\n', '《手把手陪您学Python》2——Python的安装\n', '《手把手陪您学Python》3——PyCharm的安装和配置\n', '《手把手陪您学Python》4——Hello World!\n', '《手把手陪您学Python》5——Jupyter Notebook\n', '《手把手陪您学Python》6——字符串的标识\n', '《手把手陪您学Python》7——字符串的索引\n', '《手把手陪您学Python》8——字符串的切片\n', '《手把手陪您学Python》9——字符串的运算\n', '《手把手陪您学Python》10——字符串的函数\n']

The list comprehension builds a list whose elements are the lines of the target file, each including its newline character, so every element ends with the escape sequence \n. The earlier results did not show \n because print() was given a string and rendered the newline as an actual line break; when a list is printed, its elements are shown in quoted form with the escape sequence visible. If you display the earlier results without print(), you will also see \n. You can try it.

If you want to remove this escape character, you can also use the rstrip() method.

In [7]: path = 'lesson/text/contents.txt'
        print([x.rstrip() for x in open(path, encoding = 'utf-8')])
Out[7]: ['《手把手陪您学Python》1——为什么要学Python?', '《手把手陪您学Python》2——Python的安装', '《手把手陪您学Python》3——PyCharm的安装和配置', '《手把手陪您学Python》4——Hello World!', '《手把手陪您学Python》5——Jupyter Notebook', '《手把手陪您学Python》6——字符串的标识', '《手把手陪您学Python》7——字符串的索引', '《手把手陪您学Python》8——字符串的切片', '《手把手陪您学Python》9——字符串的运算', '《手把手陪您学Python》10——字符串的函数']

When reading a file line by line after open(), note that the resulting iterator can only be used once. During the line-by-line read, the pointer moves from the beginning of the file to the end and does not automatically return to the beginning. So a for loop over the same file object reads the lines only once.
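
This one-time behavior is easy to observe by looping over the same file object twice. A minimal sketch, assuming a hypothetical three-line `sample.txt` created in a temporary folder:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / 'sample.txt'   # hypothetical sample file
    path.write_text('a\nb\nc\n', encoding='utf-8')

    file_object = open(path, encoding='utf-8')
    first_pass = [line.rstrip() for line in file_object]
    # The pointer is now at the end of the file, so nothing is left to read.
    second_pass = [line.rstrip() for line in file_object]
    file_object.close()

print(first_pass)   # all three lines
print(second_pass)  # an empty list
```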

3. Use the readlines() method to read line by line

In addition to using a for loop to read line by line, you can also use the readlines() method. It returns a list containing every line, which is simpler than building the list with a for loop.

In [8]: path = 'lesson/text/contents.txt'
        file_object = open(path, encoding = 'utf-8').readlines()
        print(file_object)
Out[8]: ['《手把手陪您学Python》1——为什么要学Python?\n', '《手把手陪您学Python》2——Python的安装\n', '《手把手陪您学Python》3——PyCharm的安装和配置\n', '《手把手陪您学Python》4——Hello World!\n', '《手把手陪您学Python》5——Jupyter Notebook\n', '《手把手陪您学Python》6——字符串的标识\n', '《手把手陪您学Python》7——字符串的索引\n', '《手把手陪您学Python》8——字符串的切片\n', '《手把手陪您学Python》9——字符串的运算\n', '《手把手陪您学Python》10——字符串的函数\n']

The readlines() method also reads by moving the pointer, so, like the for loop above, it can only be used once on the same file object.
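
The sketch below shows a second readlines() call coming back empty. It also shows one possible way to get the lines without the trailing '\n': read().splitlines(), a string method that is not covered in this lesson and is offered here only as a suggestion. `sample.txt` is a hypothetical file.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / 'sample.txt'   # hypothetical sample file
    path.write_text('a\nb\nc\n', encoding='utf-8')

    file_object = open(path, encoding='utf-8')
    lines = file_object.readlines()   # every line, each ending with \n
    again = file_object.readlines()   # pointer already at the end
    file_object.close()

    file_object = open(path, encoding='utf-8')
    clean = file_object.read().splitlines()   # lines without the \n
    file_object.close()

print(lines)
print(again)
print(clean)
```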

4. Read the specified characters

The read() method can also take a parameter. In that case it reads the specified number of characters, starting from the beginning of the file; the parameter is the number of characters.

In [9]: path = 'lesson/text/contents.txt'
        file_object = open(path, encoding = 'utf-8').read(100)   # read 100 characters
        print(file_object)
Out[9]: 《手把手陪您学Python》1——为什么要学Python?
        《手把手陪您学Python》2——Python的安装
        《手把手陪您学Python》3——PyCharm的安装和配置
        《手把手陪您学Pytho
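
When the file object is kept in a variable, read(n) can be called repeatedly, and each call continues where the previous one stopped. A minimal sketch, assuming a hypothetical one-line `sample.txt`:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / 'sample.txt'   # hypothetical sample file
    path.write_text('《手把手陪您学Python》\n', encoding='utf-8')

    file_object = open(path, encoding='utf-8')
    first = file_object.read(4)    # the first 4 characters
    second = file_object.read(4)   # the next 4 characters
    rest = file_object.read()      # everything that is left
    empty = file_object.read()     # at the end of the file: an empty string
    file_object.close()

print(first)
print(second)
print(repr(rest))
print(repr(empty))
```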

5. Return the pointer position

We have mentioned the movement of the pointer several times. Reading a file is in fact carried out by advancing this pointer (also called the handle).

By default, the read() method starts from the first character of the file, which is where the pointer sits right after the file is opened. When the read finishes, the pointer is left just after the last character read, ready for the next read instruction.

Through the tell() method, the current position of the pointer can be returned.

In [10]: path = 'lesson/text/contents.txt'
         file_object = open(path, encoding = 'utf-8')
         print(file_object.read(8))
         file_object.tell()
Out[10]: 《手把手陪您学P
         22

In UTF-8, each Chinese character or Chinese punctuation mark in this file occupies 3 bytes. In the example above, read(8) reads 8 characters: 7 Chinese characters (including the Chinese punctuation mark 《), which occupy 21 bytes, and 1 English letter, which occupies 1 byte. The pointer therefore ends up at byte 22.
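
The byte count can be checked without opening a file by encoding the same 8 characters. A small sketch using str.encode(), which is not covered in this lesson:

```python
text = '《手把手陪您学P'

# Number of characters versus number of bytes in UTF-8.
char_count = len(text)
byte_count = len(text.encode('utf-8'))
print(char_count, byte_count)

# Bytes used by each individual character.
sizes = [(ch, len(ch.encode('utf-8'))) for ch in text]
print(sizes)
```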

Using the pointer, we can verify why the for loop can only be used once.

In [11]: path = 'lesson/text/contents.txt'
         file_object = open(path, encoding = 'utf-8')
         for line in file_object:
             print(line.rstrip())
         file_object.tell()
Out[11]: 《手把手陪您学Python》1——为什么要学Python?
         《手把手陪您学Python》2——Python的安装
         《手把手陪您学Python》3——PyCharm的安装和配置
         《手把手陪您学Python》4——Hello World!
         《手把手陪您学Python》5——Jupyter Notebook
         《手把手陪您学Python》6——字符串的标识
         《手把手陪您学Python》7——字符串的索引
         《手把手陪您学Python》8——字符串的切片
         《手把手陪您学Python》9——字符串的运算
         《手把手陪您学Python》10——字符串的函数
         575

After one traversal, the pointer has moved to the end of the file, as mentioned above. It is now at byte 575, counted in the same way as before. From this position there is nothing left to read, so another for loop or a call to readlines() returns nothing.

6. Move the pointer to the specified position

If you want to control the position of the pointer yourself, you can use the seek() method to move the pointer to a specified position, and combine it with the read() method to read the rest of the file or a specified number of characters.

In [12]: path = 'lesson/text/contents.txt'
         file_object = open(path, encoding = 'utf-8')
         file = file_object.seek(22)
         file_object.tell()
         print(file_object.read())
Out[12]: ython》1——为什么要学Python?
         《手把手陪您学Python》2——Python的安装
         《手把手陪您学Python》3——PyCharm的安装和配置
         《手把手陪您学Python》4——Hello World!
         《手把手陪您学Python》5——Jupyter Notebook
         《手把手陪您学Python》6——字符串的标识
         《手把手陪您学Python》7——字符串的索引
         《手把手陪您学Python》8——字符串的切片
         《手把手陪您学Python》9——字符串的运算
         《手把手陪您学Python》10——字符串的函数

In the above example, we first use the seek() method to move the pointer to byte 22 (the same position as in the previous example), then call tell() to check the position, and finally use read() to read the remaining content after the pointer. Note that the value returned by tell() does not appear in the output here, because only the last expression of a cell is displayed automatically; wrap it in print() to see it.

When using seek() to move the pointer, note that when Chinese characters are involved, the bytes of one character cannot be split. For example, the first character in the target file is the Chinese punctuation mark "《", which occupies 3 bytes. Calling seek(1) or seek(2) on its own raises no error, but if you then read and print from that position, in the middle of the character, an error is reported.

In [13]: path = 'lesson/text/contents.txt'
         file_object = open(path, encoding = 'utf-8')
         file = file_object.seek(2)
         print(file_object.read())
Out[13]: ---------------------------------------------------------------------------
         UnicodeDecodeError                        Traceback (most recent call last)
         <ipython-input-76-27e34aade1ea> in <module>
               2 file_object = open(path, encoding = 'utf-8')
               3 file = file_object.seek(2)
         ----> 4 print(file_object.read())

         ~\anaconda3\lib\codecs.py in decode(self, input, final)
             320         # decode input (taking the buffer into account)
             321         data = self.buffer + input
         --> 322         (result, consumed) = self._buffer_decode(data, self.errors, final)
             323         # keep undecoded input until the next call
             324         self.buffer = data[consumed:]

         UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8a in position 0: invalid start byte

The error says that decoding fails because the pointer is in the middle of the consecutive bytes that make up one Chinese character, so the first byte read is not a valid start byte.
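
One way to avoid landing in the middle of a character is to seek only to positions that tell() returned earlier (or to 0). A minimal sketch under that assumption, using a hypothetical `sample.txt`. It uses readline(), which reads a single line, instead of a for loop, because tell() cannot be called while a for loop over the file is still running.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / 'sample.txt'   # hypothetical sample file
    path.write_text('《手把手》\nPython\n', encoding='utf-8')

    file_object = open(path, encoding='utf-8')
    first_line = file_object.readline()   # read exactly one line
    bookmark = file_object.tell()         # position right after that line
    remainder = file_object.read()        # pointer moves to the end

    file_object.seek(bookmark)            # jump back to the saved position
    remainder_again = file_object.read()  # same text as before
    file_object.close()

print(first_line.rstrip())
print(remainder_again.rstrip())
```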

Similarly, after a for loop has traversed the file, seek(0) moves the pointer back to the initial position so that the file can be traversed again.

In [14]: path = 'lesson/text/contents.txt'
         file_object = open(path, encoding = 'utf-8')
         for line in file_object:
             print(line.rstrip())
         file_object.seek(0)
         for line in file_object:
             print(line.rstrip())
Out[14]: 《手把手陪您学Python》1——为什么要学Python?
         《手把手陪您学Python》2——Python的安装
         《手把手陪您学Python》3——PyCharm的安装和配置
         《手把手陪您学Python》4——Hello World!
         《手把手陪您学Python》5——Jupyter Notebook
         《手把手陪您学Python》6——字符串的标识
         《手把手陪您学Python》7——字符串的索引
         《手把手陪您学Python》8——字符串的切片
         《手把手陪您学Python》9——字符串的运算
         《手把手陪您学Python》10——字符串的函数
         《手把手陪您学Python》1——为什么要学Python?
         《手把手陪您学Python》2——Python的安装
         《手把手陪您学Python》3——PyCharm的安装和配置
         《手把手陪您学Python》4——Hello World!
         《手把手陪您学Python》5——Jupyter Notebook
         《手把手陪您学Python》6——字符串的标识
         《手把手陪您学Python》7——字符串的索引
         《手把手陪您学Python》8——字符串的切片
         《手把手陪您学Python》9——字符串的运算
         《手把手陪您学Python》10——字符串的函数

The above are several basic methods of reading files in Python. I hope everyone can understand them well, which will help us learn more advanced file reading operations later.

In the next article, we will introduce how to close and write files, so stay tuned.

 



Thanks for reading this article! If you have any questions, please leave a message and discuss together ^_^

To read other articles in the "Learning Python with You Hand in Hand" series, please follow the official account and click on the menu selection, or click the link below to go directly.

"Learning Python with You Hand in Hand" 1-Why learn Python?

"Learning Python with You Hand in Hand" 2-Python Installation

"Learning Python with You Hand in Hand" 3-PyCharm installation and configuration

"Learning Python with You Hand in Hand" 4-Hello World!

"Learning Python with You Hand in Hand" 5-Jupyter Notebook

"Learning Python with You Hand in Hand" 6-String Identification

"Learning Python with You Hand in Hand" 7-Index of Strings

"Learning Python with You Hand in Hand" 8-String Slicing

"Learning Python with You Hand in Hand" 9-String Operations

"Learning Python with You Hand in Hand" 10-String Functions

"Learning Python with You Hand in Hand" 11-Formatted Output of Strings

"Learning Python with You Hand in Hand" 12-Numbers

"Learning Python with You Hand in Hand" 13-Operation

"Learning Python with You Hand in Hand" 14-Interactive Input

"Learning Python with You Hand in Hand" 15-judgment statement if

"Learning Python with You Hand in Hand" 16-loop statement while

"Learning Python with You Hand in Hand" 17-the end of the loop

"Learning Python with You Hand in Hand" 18-loop statement for

"Learning Python with You Hand in Hand" 19-Summary of the first stage

"Learning Python with You Hand in Hand" 20-List

"Learning Python with You Hand in Hand" 21-Tuples

"Learning Python with You Hand in Hand" 22-Dictionary

"Learning Python with You Hand in Hand" 23-Built-in Sequence Function

"Learning Python with You Hand in Hand" 24-Collection

"Learning Python with You Hand in Hand" 25-List Comprehension

"Learning Python with You Hand in Hand" 26-Custom Functions

"Learning Python with You Hand in Hand" 27-Parameters of Custom Functions

"Learning Python with You Hand in Hand" 28-the return value of a custom function

"Learning Python with You Hand in Hand" 29-Anonymous Functions

"Learning Python with You Hand in Hand" 30-Module

"Learning Python with You Hand in Hand" 31-File Opening


Origin blog.csdn.net/mnpy2019/article/details/111304047