Learning notes, Day 08: Python file operations

Getting to know file operations

  • Python code for basic file operations.

    The basic components:

    • File path: path
    • Mode: read, write, append, read-write, write-read ......
    • Encoding: utf-8 / gbk / gb2312 ......
    f = open('absolute or relative file path', encoding='encoding name', mode='mode')  # general form
    content = f.read()
    print(content)
    f.close()
  • Code explanation:

    • open:

      Built-in function. Under the hood, open calls the operating system's interface.

    • f:

      A variable. By convention it is named f1, fh, file_handler, or f_h, and it is called the file handle. Every operation on the file is performed through the file handle followed by a dot ('.').

    • encoding:

      Optional. If omitted, the operating system's default encoding is used:

      Windows: gbk (on Chinese-locale systems)

      Linux: utf-8

      macOS: utf-8

    • mode:

      Defines the operating mode, for example r for read mode.

    • f.read():

      Any operation on the file, such as reading its content or writing to it, must go through the file handle.

    • f.close():

      Closes the file. (Always close it; otherwise the handle keeps occupying memory and system resources.)

  • The three steps of a file operation:

    • Open the file.

    • Operate on it through the file handle.

    • Close the file.

      # Open the file, get a file handle, and assign it to a variable
      f = open('file.txt', 'r', encoding='utf-8')   # r is the default mode
      
      # Operate on the file through the handle
      data = f.read()
      
      # Close the file
      f.close()
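  • Common errors:

    • UnicodeDecodeError: the encoding used to open the file does not match the encoding used when the file was saved.

    • Path separator problems (a backslash in a Windows path is read as an escape character):

      Fix: put an r in front of the path to make it a raw string
      r'C:\Users\Desktop\file.txt'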
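A minimal sketch of the two errors described above, assuming a scratch file in a temporary directory. The file name `demo_gbk.txt` and the helper `read_text` are made up for illustration. It saves text as gbk, shows that reading it back as utf-8 raises UnicodeDecodeError, and shows that a raw string keeps a backslash literal.

```python
import os
import tempfile


def read_text(path, encoding):
    """Read a whole text file with an explicit encoding (hypothetical helper)."""
    with open(path, mode='r', encoding=encoding) as f:
        return f.read()


# Hypothetical scratch file, saved with the gbk encoding.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'demo_gbk.txt')
with open(path, mode='w', encoding='gbk') as f:
    f.write('这是一行测试')

# Opening with a different encoding than the one used to save the file fails.
try:
    read_text(path, 'utf-8')
    mismatch_error = None
except UnicodeDecodeError as exc:
    mismatch_error = exc

# Opening with the matching encoding works.
text = read_text(path, 'gbk')
print(type(mismatch_error).__name__, text)

# In a normal string '\t' is one tab character;
# in a raw string it stays a backslash followed by t.
normal = 'C:\temp'
raw = r'C:\temp'
print(len(normal), len(raw))
```

Passing encoding explicitly on every open call avoids depending on the system default.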

File operations: Read

There are four read modes (r, rb, r+, r+b). r+ and r+b are rarely used; rb is for non-text files such as images, video, and audio. Each mode offers five ways to read: read(), read(n), readline(), readlines(), and a for loop.

  • r mode

    Opens the file read-only, with the file pointer placed at the beginning of the file. This is the most frequently used mode and also the default: if no mode is given, the file is opened in r mode.

    For example:

    f = open('file.txt', mode='r', encoding='utf-8')
    msg = f.read()
    f.close()
    print(msg)
    • read() reads everything at once

      read() reads the entire content of the file. Drawback: a large file takes up a lot of memory and can easily exhaust it.

      f = open('test', mode='r', encoding='utf-8')
      msg = f.read()
      f.close()
      print(msg)
      
      # Output:
      这是一行测试
      A:这是第二行
      B:这是第三行
      C:这是第几行
      D:这是我也不知道第几行
      就这么地吧.
    • read(n) reads a specified amount

      In r mode, n is counted in characters.

      f = open('test', mode='r', encoding='utf-8')
      msg = f.read(4)
      f.close()
      print(msg)
      
      # Output:
      这是一行
    • readline() reads line by line

      readline() reads one line at a time. Note: the data it returns ends with a \n; to remove it, call strip() on the returned line.

      f = open('test', mode='r', encoding='utf-8')
      msg1 = f.readline()
      msg2 = f.readline().strip()
      msg3 = f.readline()
      msg4 = f.readline()
      f.close()
      print(msg1)
      print(msg2)
      print(msg3)
      print(msg4)
      
      # Output:
      这是一行测试
      
      A:这是第二行
      B:这是第三行
      
      C:这是第几行
      
    • readlines() returns a list

      readlines() returns a list in which each element is one line of the original file. If the file is large, this uses a lot of memory and can easily crash the program.

      f = open('test', mode='r', encoding='utf-8')
      print(f.readlines())    # the call can also be written inline like this
      f.close()
      
      # Output:
      ['这是一行测试\n', 'A:这是第二行\n', 'B:这是第三行\n', 'C:这是第几行\n', 'D:这是我也不知道第几行\n', '就这么地吧.']

    The four methods above are not ideal: with a large file, those that load all of its content can easily exhaust memory, so there is a fifth method.

    • for loop

      A file can be read with a for loop. The file handle is an iterator, so each pass of the loop keeps only one line of data in memory, which saves memory.

      f = open('test', mode='r', encoding='utf-8')
      for line in f:
          print(line)     # to drop the \n, write: print(line.strip())
      f.close()
      # The loop reads the file one line at a time, which is equivalent to:
      
      '''
      print(f.readline())
      print(f.readline())
      print(f.readline())
      .......
      '''
      # Output:
      这是一行测试
      
      A:这是第二行
      
      B:这是第三行
      
      C:这是第几行
      
      D:这是我也不知道第几行
      
      就这么地吧.
      

    Special note: a file that has been opened for reading must always be closed.
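The memory-friendly reading methods above can be compared side by side. This is a sketch under stated assumptions: the scratch file `lines.txt` and the helper names `read_in_chunks`, `read_with_readline`, and `read_with_for` are invented for illustration. Each helper closes the handle in a finally clause, so the file is closed even if reading fails.

```python
import os
import tempfile

text = '这是一行测试\nA:这是第二行\nB:这是第三行'
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'lines.txt')   # hypothetical scratch file
with open(path, mode='w', encoding='utf-8') as f:
    f.write(text)


def read_in_chunks(path, n):
    """Collect the text n characters at a time with read(n)."""
    f = open(path, mode='r', encoding='utf-8')
    try:
        pieces = []
        while True:
            piece = f.read(n)
            if piece == '':          # read() returns '' at end of file
                break
            pieces.append(piece)
        return pieces
    finally:
        f.close()                    # runs even if read() raises


def read_with_readline(path):
    """Collect stripped lines by calling readline() until end of file."""
    f = open(path, mode='r', encoding='utf-8')
    try:
        lines = []
        while True:
            line = f.readline()
            if line == '':           # a blank line would be '\n', not ''
                break
            lines.append(line.strip())
        return lines
    finally:
        f.close()


def read_with_for(path):
    """Collect stripped lines by iterating over the file handle."""
    f = open(path, mode='r', encoding='utf-8')
    try:
        return [line.strip() for line in f]
    finally:
        f.close()


chunks = read_in_chunks(path, 4)
by_readline = read_with_readline(path)
by_for = read_with_for(path)
print(chunks[0], by_readline, by_for)
```

All three hold only a small part of the file in memory at a time while reading; the lists built here are just for display.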

  • rb mode

    rb mode opens a file read-only in binary format, with the file pointer placed at the beginning of the file. As with every mode discussed below that contains b, the file is handled as binary data. These modes are mainly for non-text files such as images, audio, and video. Do not pass encoding when opening a file in a b mode.

    f1 = open('picture.jpeg', mode='rb')
    tu = f1.read()
    f1.close()
    print(tu)
    
    # Output:
    b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00H\x00H\x00\x00\xff\xe1\x00\xb0Exif\x............ (much more follows; omitted here)

    rb mode also supports read(), read(n), readline(), readlines(), and the for loop; they are not demonstrated one by one here.
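One difference worth seeing in code: read(n) counts characters in r mode but bytes in rb mode. A small sketch, assuming a hypothetical scratch file `units.txt` that holds UTF-8 Chinese text (3 bytes per character):

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'units.txt')   # hypothetical scratch file
with open(path, mode='w', encoding='utf-8') as f:
    f.write('这是一行测试')

# Text mode: n is a number of characters.
with open(path, mode='r', encoding='utf-8') as f:
    two_chars = f.read(2)

# Binary mode: n is a number of bytes, and the result is a bytes object.
with open(path, mode='rb') as f:
    two_bytes = f.read(2)        # only part of the first character
    f.seek(0)
    six_bytes = f.read(6)        # exactly two 3-byte characters

print(two_chars)
print(two_bytes)
print(six_bytes.decode('utf-8'))
```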

File operations: Write

There are four write modes (w, wb, w+, w+b). w+ and w+b are rarely used; wb is for non-text files such as pictures, video, and audio. The method used is write('content to write').

  • w mode

    If the file does not exist, opening it in w mode creates the file and then writes the content.

    f = open('new_file', encoding='utf-8', mode='w')
    f.write('This really is a newly created file')
    f.close()

    If the file exists, opening it in w mode empties the original content and then writes the new content.

    f = open('new_file', encoding='utf-8', mode='w')
    f.write('This content was written after the file was emptied')
    f.close()
  • wb mode

    wb mode opens a file write-only in binary format. If the file already exists, it is opened and written from the beginning, that is, the original content is deleted. If the file does not exist, a new file is created. It is generally used for non-text files such as pictures, audio, and video.

    For example:

    First read all of a picture's content as bytes in rb mode, then write that data to a new file in wb mode. This amounts to copying the picture. The code is as follows:

    # Step 1: read the original picture in rb mode.
    f = open('picture.jpeg', mode='rb')
    content = f.read()
    f.close()
    # Step 2: write the data to a new file in wb mode.
    f1 = open('picture1.jpeg', mode='wb')
    f1.write(content)
    f1.close()
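The copy above reads the whole picture into memory at once. For large files, a variation is to copy in fixed-size chunks. The sketch below is one possible way to do that, not the method used in these notes: the helper `copy_file`, its `chunk_size` parameter, and the random bytes standing in for picture data are all assumptions made for illustration.

```python
import os
import tempfile


def copy_file(src, dst, chunk_size=64 * 1024):
    """Copy src to dst in binary mode, chunk_size bytes at a time."""
    with open(src, mode='rb') as fin, open(dst, mode='wb') as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:            # b'' means end of file
                break
            fout.write(chunk)


# Stand-in for a picture: 200,000 random bytes in a scratch file.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, 'picture.bin')
dst = os.path.join(tmp_dir, 'picture_copy.bin')
original = os.urandom(200_000)
with open(src, mode='wb') as f:
    f.write(original)

copy_file(src, dst)
with open(dst, mode='rb') as f:
    copied = f.read()
print(len(copied), copied == original)
```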

File operations: Append

Append adds content to the end of a file. There are again four modes: a, ab, a+, a+b. Only a is covered here.

  • a mode

    If the file does not exist, opening it in a mode creates the file and then writes the content.

    f = open('append_text', encoding='utf-8', mode='a')
    f.write('This file did not exist; it was newly created')
    f.close()

    If the file exists, opening it in a mode appends the content to the end of the file.

    f = open('append_text', encoding='utf-8', mode='a')
    f.write('This file already exists; this content is newly appended')
    f.close()
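To make the contrast between w mode and a mode concrete, here is a short sketch on a scratch file. The helper `write_then_read` and the file name `modes_demo.txt` are hypothetical names chosen for this example.

```python
import os
import tempfile


def write_then_read(path, mode, text):
    """Write text with the given mode, then return the file's full content."""
    with open(path, mode=mode, encoding='utf-8') as f:
        f.write(text)
    with open(path, mode='r', encoding='utf-8') as f:
        return f.read()


tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'modes_demo.txt')   # does not exist yet

after_first_w = write_then_read(path, 'w', 'first')          # file is created
after_second_w = write_then_read(path, 'w', 'second')        # old content emptied
after_a = write_then_read(path, 'a', ' + appended')          # added at the end

print(after_first_w)
print(after_second_w)
print(after_a)
```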

Other modes of file operations

One kind of mode has not been covered yet: modes with a + sign. The + adds a capability. The r mode discussed above is read-only, so its file handle supports read-like operations but not writes. To get a handle that can both read and write, use one of the + modes: r+ (read-write), w+ (write-read), a+ (write-read), r+b (read-write in bytes) .........
Only r+ is covered here; the others are similar and can be tried out as practice.

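#1. File open modes (text mode by default):
r, read-only mode [the default; the file must exist, otherwise an exception is raised]
w, write-only mode [not readable; created if missing; content emptied if the file exists]
a, append-only mode [not readable; created if missing; content is only appended if the file exists]

#2. For non-text files, only b modes can be used. "b" means operating in bytes (all files are stored as bytes anyway, so with these modes there is no need to consider the character encoding of a text file, the jpg format of a picture, or the avi format of a video)
rb
wb
ab
Note: in a b mode, the content read is of type bytes, bytes must also be supplied when writing, and an encoding cannot be specified

#3. '+' modes (one capability added)
r+, read-write [readable, writable]
w+, write-read [writable, readable]
a+, write-read [writable, readable]

#4. Read-write, write-read, and write-read modes operating on bytes
r+b, read-write [readable, writable]
w+b, write-read [writable, readable]
a+b, write-read [writable, readable]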
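A few of the rules in this list can be checked directly. The sketch below is illustrative only; the file names `missing.txt` and `modes.txt` are made up. It shows r mode failing on a missing file, a w-mode handle refusing to read, and a b-mode handle rejecting str.

```python
import io
import os
import tempfile

tmp_dir = tempfile.mkdtemp()

# r mode: the file must exist.
missing = os.path.join(tmp_dir, 'missing.txt')
try:
    open(missing, mode='r', encoding='utf-8')
    r_error = None
except FileNotFoundError as exc:
    r_error = exc

# w mode: the file is created, but the handle is not readable.
path = os.path.join(tmp_dir, 'modes.txt')
with open(path, mode='w', encoding='utf-8') as f:
    f.write('abc')
    try:
        f.read()
        w_read_error = None
    except io.UnsupportedOperation as exc:
        w_read_error = exc

# b modes: only bytes can be written, and no encoding is passed to open().
with open(path, mode='ab') as f:
    try:
        f.write('def')
        b_write_error = None
    except TypeError as exc:
        b_write_error = exc
    f.write('def'.encode('utf-8'))   # encode the str to bytes first

with open(path, mode='rb') as f:
    raw = f.read()
print(raw)
```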
  • r+ mode: read first, then write; the order must not be reversed

    r+ opens a file for reading and writing. By default the file pointer is placed at the beginning of the file.

    f = open('read_write.txt', encoding='utf-8', mode='r+')
    content = f.read()
    print(content)
    f.write('This is the newly written content')
    f.close()

      Note: in r+ mode, writing before reading causes a problem. The cursor starts at the beginning of the file, so writing first overwrites the original content from the start, for as much as you write; a subsequent read then begins after the written content.
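The effect of the order can be seen on a tiny file. This is a sketch with invented names (`reset_file`, `rplus.txt`); it compares the file content after "read, then write" with the content after "write first".

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'rplus.txt')   # hypothetical scratch file


def reset_file():
    """Put the scratch file back to its starting content."""
    with open(path, mode='w', encoding='utf-8') as f:
        f.write('abcdef')


def read_back():
    with open(path, mode='r', encoding='utf-8') as f:
        return f.read()


# Read first, then write: the cursor is at the end, so the text is appended.
reset_file()
with open(path, mode='r+', encoding='utf-8') as f:
    f.read()
    f.write('XY')
read_first = read_back()

# Write first: the cursor is at the start, so the old text is overwritten.
reset_file()
with open(path, mode='r+', encoding='utf-8') as f:
    f.write('XY')
write_first = read_back()

print(read_first)
print(write_first)
```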

Summary:

Three general directions:
read, four modes: r, rb, r+, r+b
write, four modes: w, wb, w+, w+b
append, four modes: a, ab, a+, a+b

Corresponding methods:
operations on the file handle: read(), read(n), readline(), readlines(), write()

Other features of file operations

  • f.tell() returns the cursor position, in bytes

    f = open('test', encoding='utf-8', mode='r')
    print(f.tell())
    content = f.read()
    print(f.tell())
    f.close()
    
    # Original file content
    这是一行测试
    A:这是第二行
    B:这是第三行
    C:这是第几行
    D:这是我也不知道第几行
    就这么地吧.
    
    # Output:
    0   # start position
    122 # end position
  • f.seek() moves the cursor (note: the offset is in bytes and must fall on a character boundary; a Chinese character takes 3 bytes in UTF-8, so for purely Chinese text the offset is a multiple of 3)

    f = open('test', encoding='utf-8', mode='r')
    f.seek(9)
    content = f.read()
    print(content)
    f.close()
    
    # Original file content
    这是一行测试
    A:这是第二行
    B:这是第三行
    C:这是第几行
    D:这是我也不知道第几行
    就这么地吧.
    
    # Output:
    行测试
    A:这是第二行
    B:这是第三行
    C:这是第几行
    D:这是我也不知道第几行
    就这么地吧.
  • f.flush() forces buffered data to be written out

    f = open('test', encoding='utf-8', mode='w')
    f.write('fafdsfsfsadfsaf')
    f.flush()
    f.close()
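The three methods above can be tried together on a small scratch file. This is a sketch under assumptions: the file names are invented, and the buffering behaviour shown for flush() is what CPython's default buffering usually does for a small write, not a guarantee.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'cursor.txt')
with open(path, mode='w', encoding='utf-8') as f:
    f.write('这是一行测试')           # 6 characters, 18 bytes in UTF-8

with open(path, mode='r', encoding='utf-8') as f:
    start = f.tell()                 # cursor at the beginning
    f.read()
    end = f.tell()                   # cursor at the end, counted in bytes
    f.seek(9)                        # skip 9 bytes = 3 Chinese characters
    tail = f.read()
    f.seek(1)                        # lands in the middle of a character
    try:
        f.read()
        mid_char_error = None
    except UnicodeDecodeError as exc:
        mid_char_error = exc

print(start, end, tail)

# flush() hands buffered data to the operating system before close().
flush_path = os.path.join(tmp_dir, 'flush.txt')
f = open(flush_path, mode='w', encoding='utf-8')
f.write('abc')
size_before = os.path.getsize(flush_path)   # usually 0: still in the buffer
f.flush()
size_after = os.path.getsize(flush_path)    # the 3 bytes are now in the file
f.close()
print(size_before, size_after)
```

Seeking to an offset that splits a character is why the offset has to respect the 3-byte width of Chinese characters in UTF-8.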

Another way to open a file (the common one)

  • with open() as ....

    # Advantage 1: no need to close the file handle manually
    # The with statement is a context manager; it closes the file handle automatically.
    with open('test', encoding='utf-8', mode='r') as f:
        print(f.read())
    
    # Advantage 2: several open calls can be combined
    # One with statement can operate on several files and produce several file handles.
    # The second file is a separate output file: opening 'test' itself in w mode would empty it before it is read.
    with open('test', encoding='utf-8', mode='r') as f,\
            open('test_out', encoding='utf-8', mode='w') as f1:
        print(f.read())
        f1.write('kckckckckckckkck')

      Note: the with statement closes the file handle automatically, but only when its block ends. If you open file t1 in r mode with a with statement and then open t1 again in another mode before that block has ended, the first handle is still open and errors are likely. The fix is to make sure the first handle is closed, either by leaving its with block or by closing it manually, before opening the file a second time.
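When exactly a with statement closes its handle can be checked through the handle's closed attribute. A minimal sketch, assuming a scratch file named `t1.txt`:

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 't1.txt')   # hypothetical scratch file
with open(path, mode='w', encoding='utf-8') as f:
    f.write('hello')

with open(path, mode='r', encoding='utf-8') as f:
    closed_inside = f.closed     # still open inside the block
    text = f.read()
closed_after = f.closed          # closed once the block has ended

# The handle is also closed when the block exits through an exception.
try:
    with open(path, mode='r', encoding='utf-8') as g:
        raise RuntimeError('simulated failure')
except RuntimeError:
    closed_after_error = g.closed

# The first handle is closed, so reopening the same file in w mode is safe.
with open(path, mode='w', encoding='utf-8') as f:
    f.write(text.upper())
with open(path, mode='r', encoding='utf-8') as f:
    final_text = f.read()

print(closed_inside, closed_after, closed_after_error, final_text)
```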

Modify the file

File data is stored on the hard disk, where content can only be overwritten, not modified in place. Every file modification we see is therefore simulated. There are two ways to implement it:

  • The process of modifying a file:
    1. Open the original file in read mode.
    2. Create a new file in write mode.
    3. Read the original file's content, modify it, and write the new content to the new file.
    4. Delete the original file.
    5. Rename the new file to the original file's name.

  • First way: load the entire content of the file from the hard disk into memory, modify it in memory, and then write it from memory back to the hard disk (as Word, vim, Notepad++, and other editors do)

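    import os   # operating system module
    with open('test', encoding='utf-8') as f1,\
        open('test.bak', encoding='utf-8',mode='w') as f2:
        old_content = f1.read() # reads everything into memory; a very large file can hang the program
        new_content = old_content.replace('文', 'wen')   # modify in memory
        f2.write(new_content)   # write to the new file in one go
    os.remove('test') # delete the original file
    os.rename('test.bak', 'test')   # rename the new file to the original name
    
    # Original file content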
    **文件操作改的流程:**
    1,以读的模式打开原文件。
    2,以写的模式创建一个新文件。
    3,将原文件的内容读出来修改成新内容,写入新文件。
    4,将原文件删除。
    5,将新文件重命名成原文件。
    # Content after modification
    **wen件操作改的流程:**
    1,以读的模式打开原wen件。
    2,以写的模式创建一个新wen件。
    3,将原wen件的内容读出来修改成新内容,写入新wen件。
    4,将原wen件删除。
    5,将新wen件重命名成原wen件。
  • Second way: read the file from the hard disk into memory line by line, write each modified line to a new file, and finally replace the source file with the new file (the commonly used way)

    import os
    with open('test', encoding='utf-8') as f1,\
        open('test.bak', encoding='utf-8',mode='w') as f2:
        for line in f1: # modify line by line; uses little memory
            new_line = line.replace('wen', '文')
            f2.write(new_line)
    os.remove('test')
    os.rename('test.bak', 'test')
    
    # Original file content
    **wen件操作改的流程:**
    1,以读的模式打开原wen件。
    2,以写的模式创建一个新wen件。
    3,将原wen件的内容读出来修改成新内容,写入新wen件。
    4,将原wen件删除。
    5,将新wen件重命名成原wen件。
    
    # Content after modification
    **文件操作改的流程:**
    1,以读的模式打开原文件。
    2,以写的模式创建一个新文件。
    3,将原文件的内容读出来修改成新内容,写入新文件。
    4,将原文件删除。
    5,将新文件重命名成原文件。
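The line-by-line approach can be wrapped in a reusable function. The sketch below is a variation, not the code from these notes: the helper name `replace_in_file` is hypothetical, and it swaps the os.remove plus os.rename pair for a single os.replace call, which renames the new file over the original in one step.

```python
import os
import tempfile


def replace_in_file(path, old, new, encoding='utf-8'):
    """Rewrite the file at path line by line, replacing old with new."""
    tmp_path = path + '.bak'
    with open(path, encoding=encoding) as src, \
            open(tmp_path, mode='w', encoding=encoding) as dst:
        for line in src:                     # one line in memory at a time
            dst.write(line.replace(old, new))
    os.replace(tmp_path, path)               # new file takes the original name


# Scratch file with two lines of the sample content.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'steps.txt')
with open(path, mode='w', encoding='utf-8') as f:
    f.write('1,以读的模式打开原wen件。\n2,以写的模式创建一个新wen件。\n')

replace_in_file(path, 'wen', '文')
with open(path, encoding='utf-8') as f:
    result = f.read()
print(result)
```

Both with-managed handles are closed before os.replace runs, so the rename never touches a file that is still open.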

Origin www.cnblogs.com/guanshou/p/12075610.html