Basic file operations in Python

1. Text files and binary files:

  • Text file: stores ordinary character text. In Python 3, text is handled as Unicode strings (str), and the bytes on disk depend on the encoding chosen when the file is opened (for example UTF-8)
  • Binary file: the content is stored and read as raw bytes and does not display meaningfully in a text editor such as Notepad. Common examples: video, audio, images, DOC files
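
The difference can be seen by reading the same file in both ways. This is a minimal sketch; the temporary file name demo.txt is made up for the illustration.

```python
import os
import tempfile

# Hypothetical file in a temporary directory, used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write("hi 小可爱")

with open(path, "r", encoding="utf-8") as f:
    text = f.read()   # str: decoded characters

with open(path, "rb") as f:
    raw = f.read()    # bytes: the raw content on disk

print(type(text), len(text))  # number of characters
print(type(raw), len(raw))    # number of bytes
```

The character count and the byte count differ because each Chinese character takes several bytes in UTF-8.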

2. File operation related modules

  • io: file stream input and output
  • os: basic operating system functions
  • glob: finds file path names that match a specific pattern
  • fnmatch: matches file names against a pattern
  • fileinput: processes multiple input files
  • filecmp: compares files
  • csv: processes CSV files
  • pickle (and cPickle in Python 2): serialization and deserialization
  • xml: processes XML data
  • bz2, gzip, zipfile, zlib, tarfile: compress and decompress files in the corresponding formats
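
As a small illustration of two of these modules, the sketch below uses glob and fnmatch to pick files by pattern. The directory and file names are assumptions created only for the demo.

```python
import fnmatch
import glob
import os
import tempfile

# Hypothetical demo directory with three small files.
demo_dir = tempfile.mkdtemp()
for name in ("a.txt", "b.txt", "c.csv"):
    with open(os.path.join(demo_dir, name), "w", encoding="utf-8") as f:
        f.write(name)

# glob works on paths: find every .txt file in the directory.
txt_paths = sorted(glob.glob(os.path.join(demo_dir, "*.txt")))
txt_names = [os.path.basename(p) for p in txt_paths]

# fnmatch works on plain names: filter a list with a pattern.
csv_names = fnmatch.filter(sorted(os.listdir(demo_dir)), "*.csv")

print(txt_names)
print(csv_names)
```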

3. Create a file object

open(): open(filename[, openmode]), for example: f = open(r"d:\b.txt", "a")

If only a file name is given, it refers to a file in the current directory; a full path can also be used. The r prefix makes a raw string, so the backslashes in the path do not need to be escaped.

  • Open modes:
    • r Read mode (the default); raises an error if the file does not exist
    • w Write mode: creates the file if it does not exist; if it exists, its content is cleared and rewritten
    • a Append mode: creates the file if it does not exist; otherwise new content is added at the end
    • b Binary mode; without "b", the file is opened as a text file
    • + Read and write mode (combined with r, w, or a)
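
The effect of w and a can be checked with a short sketch. The file name modes.txt is hypothetical and lives in a temporary directory.

```python
import os
import tempfile

# Hypothetical file used only to compare the modes.
path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(path, "w", encoding="utf-8") as f:   # creates the file
    f.write("first\n")

with open(path, "w", encoding="utf-8") as f:   # "w" clears the old content
    f.write("second\n")

with open(path, "a", encoding="utf-8") as f:   # "a" adds at the end
    f.write("third\n")

with open(path, "r", encoding="utf-8") as f:
    content = f.read()

print(content)
```

Only the second and third writes survive, because the second open in w mode discarded what the first one wrote.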

4. Common encoding methods

  • ASCII: 7 bits, can only represent 128 characters
  • ISO8859-1: 8 bits per character, can represent 256 characters, compatible with ASCII
  • GBK, GB2312, GB18030: national standard encodings for Chinese; a Chinese character usually takes 2 bytes
  • Unicode: the character set that Python 3 strings use by default; it assigns a number (code point) to each character and is stored through an encoding such as UTF-8 or UTF-16
  • UTF-8: variable-length encoding, 1-4 bytes per character; an English letter takes 1 byte and a Chinese character usually takes 3 bytes
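
The byte counts above can be verified with str.encode(). This is a small sketch; the sample characters are chosen only for illustration.

```python
# Encode one English letter and one Chinese character in several encodings
# and record how many bytes each result takes.
sizes = {}
for ch in ("A", "中"):
    for enc in ("utf-8", "gbk", "utf-16-le"):
        sizes[(ch, enc)] = len(ch.encode(enc))
        print(ch, enc, sizes[(ch, enc)])

# ASCII cannot represent Chinese characters at all.
try:
    "中".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print("ASCII can encode 中:", ascii_ok)
```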

5. File writing steps

  1. Create file object
  2. Write data
  3. Close file object

There are three common ways to do this, recorded as follows:

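# Method 1: open, write, and close explicitly
filename = "my01.txt"
f = open(filename, 'w', encoding='UTF-8')
s = "OLG\nHello\n小可爱\n"
f.write(s)
f.close()  # close the file stream

# Method 2: try/except/finally, so the file is closed even if writing fails
f = None
try:
    f = open(r"test/my02.txt", "a")
    text = "hello,my02"  # avoid the name str, which shadows the built-in type
    f.write(text)  # goes to the buffer first, then to the file on flush/close
except OSError as e:
    print(e)
finally:
    if f is not None:  # open() may have failed before f was assigned
        f.close()

# Method 3: with statement (context manager). It manages the resource automatically;
# however the with block is exited, the file is closed properly.
s = ["小可爱\n", "hello\n"]
with open(r"my03.txt", "a", encoding="UTF-8") as f:
    f.writelines(s)  # writes each string in the list; no newlines are added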

6. Reading of files

  • read([size]) reads size characters from the file and returns them. If size is omitted, the whole file is read. At the end of the file, an empty string is returned
  • readline() reads one line and returns it. At the end of the file, an empty string is returned
  • readlines() reads a text file into a list in which each line is one string, and returns the list
with open("test/my01.txt", 'r', encoding='UTF-8') as f:
    for a in f:  # a file object can be iterated line by line
        print(a, end="")  # each line already ends with "\n"
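
The three read methods can be compared on one small file. This is a minimal sketch; the file name and its three lines are made up for the demo.

```python
import os
import tempfile

# Hypothetical three-line file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "lines.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("line1\nline2\nline3\n")

with open(path, "r", encoding="utf-8") as f:
    first_three = f.read(3)       # the first 3 characters
    rest_of_line = f.readline()   # the rest of the current line
    remaining = f.readlines()     # all remaining lines as a list
    at_eof = f.read()             # nothing is left to read

print(repr(first_three), repr(rest_of_line), remaining, repr(at_eof))
```

Each call continues from where the previous one stopped, because they share one file pointer.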

Note: enumerate() adds a serial number (index) to each item

a = ['a\n','b\n','c\n']
b = enumerate(a)
print(a)
print(list(b))  # [(0, 'a\n'), (1, 'b\n'), (2, 'c\n')]

# Append the line number to each line
c = [temp.rstrip() + " # " + str(index) for index, temp in enumerate(a)]  # .rstrip() removes the trailing \n
# print(c)

with open('caotest/my01.txt','r',encoding='utf-8') as f:
    lines = f.readlines()
    lines = [line.rstrip() + '#' + str(index) + '\n' for index,line in enumerate(lines)]
    print(lines)

7. Reading and writing of binary files

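  • Use the modes wb, rb, and ab (binary write, read, and append). Data is read and written as bytes instead of str, and no encoding is given; everything else is the same as for text files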
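
A minimal sketch of binary reading and writing: copying a file in fixed-size chunks. The file names and the 64-byte chunk size are assumptions for the demo.

```python
import os
import tempfile

work_dir = tempfile.mkdtemp()
src = os.path.join(work_dir, "src.bin")   # hypothetical source file
dst = os.path.join(work_dir, "dst.bin")   # hypothetical copy

data = bytes(range(256))
with open(src, "wb") as f:
    f.write(data)   # write() takes bytes in binary mode

# Copy chunk by chunk, so a large file never has to fit in memory.
with open(src, "rb") as fin, open(dst, "wb") as fout:
    while True:
        chunk = fin.read(64)
        if not chunk:   # empty bytes means end of file
            break
        fout.write(chunk)

with open(dst, "rb") as f:
    copied = f.read()

print(len(copied))
```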

8. Common attributes and methods of file objects

  1. Attributes:
  • name: the file name
  • mode: the mode the file was opened with
  • closed: True if the file has been closed
  2. Open modes:
  • r
  • w
  • a
  • b
  • +
  3. Common methods for file objects:
  • read([size])

  • readline()

  • readlines()

  • write(str)

  • writelines(lines) writes a list of strings to the file; it does not add newlines

  • seek(offset[,whence]) moves the file pointer to a new position; offset is the position relative to whence

    • offset:
      • positive moves toward the end of the file; negative moves toward the start
    • whence:
      • 0 Calculate from the beginning of the file (the default)
      • 1 Calculate from the current position
      • 2 Calculate from the end of the file
  • tell() returns the current position of the file pointer

  • truncate([size]) keeps only the first size bytes of the file and deletes the rest, wherever the pointer is; if size is omitted, the file is cut at the current pointer position

  • flush() writes the contents of the buffer to the file, but does not close the file

  • close() Write the contents of the buffer to the file, close the file at the same time, and release related resources

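with open('caotest/my01.txt', 'r', encoding='utf-8') as f:
    print('filename:{0}'.format(f.name))
    print(f.tell())  # current position of the file pointer
    print('content read:{0}'.format(f.readline()))
    print(f.tell())
    # Move the file pointer. In text mode, only 0 or a value returned by
    # tell() is guaranteed to be a valid offset.
    f.seek(5)
    print('content read:{0}'.format(f.readline()))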
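
seek(), tell(), and truncate() are easiest to follow in binary mode, where positions are plain byte offsets. This is a minimal sketch; the file name and content are made up for the demo.

```python
import os
import tempfile

# Hypothetical ten-byte file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "digits.bin")
with open(path, "wb") as f:
    f.write(b"0123456789")

with open(path, "r+b") as f:      # read and write, binary
    f.seek(3)                     # whence defaults to 0: from the start
    middle = f.read(2)
    pos = f.tell()                # position after reading two bytes

    f.seek(-2, 2)                 # two bytes before the end of the file
    tail = f.read()

    f.truncate(4)                 # keep only the first four bytes
    f.seek(0)
    after = f.read()

print(middle, pos, tail, after)
```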

9. Serialize using pickle

In Python, everything is an object, which is essentially a "memory block for storing data".

  • Serialization: converts an object into "serialized" data so that it can be stored on disk or sent to another place over the network
  • Deserialization: the reverse process; the "serialized data" that is read back is converted into an object
  • pickle.dump(obj, file): obj is the object to serialize, and file is the file (opened in binary mode) to store it in
  • pickle.load(file): reads data from file and deserializes it into an object
import pickle

# Serialize
with open(r'caotest/my07.txt', 'wb') as f:
    a1 = 'caoyh'
    a2 = 235
    a3 = [20, 30, 50]
    a4 = '小可爱'
    pickle.dump(a1, f)
    pickle.dump(a2, f)
    pickle.dump(a3, f)
    pickle.dump(a4, f)

# Deserialize: read the same file, with one load() for each dump()
with open(r'caotest/my07.txt', 'rb') as f:
    for _ in range(4):
        a = pickle.load(f)
        print(a)
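
An alternative that avoids counting the dump() calls is to put the values into one container and pickle that. This is a sketch under assumptions: the dictionary keys and the temporary file name are invented for the demo. Only load pickle data from a source that is trusted, because loading can run arbitrary code.

```python
import os
import pickle
import tempfile

# Hypothetical record and file name, used only for illustration.
record = {"name": "caoyh", "score": 235, "values": [20, 30, 50]}
path = os.path.join(tempfile.mkdtemp(), "record.pkl")

with open(path, "wb") as f:
    pickle.dump(record, f)       # one dump for the whole container

with open(path, "rb") as f:
    restored = pickle.load(f)    # one load brings everything back

print(restored)
print(restored == record, restored is record)
```

The restored object is equal to the original but is a new object in memory.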

10. CSV file operation

  • CSV (comma-separated values) is a delimited text format that is often used for data exchange and for importing and exporting Excel files and database data
  • Unlike an Excel file, a CSV file:
    • Has no value types; all values are strings
    • Cannot specify styles such as font color
    • Cannot specify cell height and width, and cannot merge cells
    • Has no multiple worksheets
    • Cannot embed images
import csv
with open('example-write.csv','r',encoding='utf-8') as f:
    a_csv = csv.reader(f)
    # print(a_csv)
    # print('*'*20)
    # print(list(a_csv))
    # print('*' * 20)
    for row in a_csv:
        print(row)

# Write a CSV file (newline='' avoids extra blank rows on Windows)
with open('example.csv', 'w', encoding='utf-8', newline='') as f:
    b_csv = csv.writer(f)
    b_csv.writerow(["005","bb","18","1000"])
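
The point that every CSV value is a string can be seen in a round trip. This is a minimal sketch; the file name and the sample rows are made up for the demo.

```python
import csv
import os
import tempfile

# Hypothetical file and rows; note that 18 is written as an int.
path = os.path.join(tempfile.mkdtemp(), "people.csv")
rows = [["id", "name", "age"], ["005", "bb", 18]]

with open(path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerows(rows)

with open(path, "r", encoding="utf-8", newline="") as f:
    loaded = list(csv.reader(f))

print(loaded)
```

The number comes back as the string "18", so values must be converted by hand after reading.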

11. os and os.path modules

  • os.system() runs a system command directly, for example os.system("notepad.exe")
  • os.system() can also call the Windows ping command: os.system("ping www.baidu.com")

os module

import os
os.system('cmd')
# Open an application directly (os.startfile is available on Windows only)
os.startfile(r'C:\Program Files (x86)\Sangfor\SSL\SangforCSClient.exe')

'''
Get information about files and folders
'''
print(os.name)  # name of the operating system: Windows --> nt, Linux/Unix --> posix
print(os.sep)   # path separator: Windows --> \, Linux/Unix --> /
print(repr(os.linesep))  # line separator: Windows --> \r\n, Linux/Unix --> \n

print(os.stat('my01.txt'))  # get file information

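'''
Create directories, create multi-level directories, delete directories
'''
# Return the current working directory
print(os.getcwd())
# Create a subdirectory
# os.mkdir("bookcaoyh")
# Change to a directory first, then create a subdirectory in it
os.chdir("D:/")
os.mkdir("caoyh")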

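os.mkdir("小可爱")  # create a directory
os.rename("小可爱", "cao")  # rename the directory
os.rmdir("cao")  # delete the directory (it must be empty)
os.makedirs("cao/y/h")  # create multi-level directories
os.removedirs("cao/y/h")  # delete multi-level directories; each one must be empty

os.makedirs("../cao/y")  # ../ refers to the parent directory

dirs = os.listdir("caoyh")  # list the first-level subdirectories and files
print(dirs)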
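
The same directory functions can be tried safely inside a temporary directory, without changing the working directory. This is a sketch; all directory names are hypothetical.

```python
import os
import tempfile

base = tempfile.mkdtemp()   # throwaway directory for the demo

os.mkdir(os.path.join(base, "single"))            # one level
os.makedirs(os.path.join(base, "a", "b", "c"))    # several levels at once
os.rename(os.path.join(base, "single"), os.path.join(base, "renamed"))
entries = sorted(os.listdir(base))
print(entries)

# removedirs deletes c, then b, then a, and stops at base,
# because base still contains "renamed" and is not empty.
os.removedirs(os.path.join(base, "a", "b", "c"))
after_remove = sorted(os.listdir(base))
print(after_remove)

os.rmdir(os.path.join(base, "renamed"))           # must be empty
```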

os.path module

import os.path

# Paths without a drive or root, such as "my01.txt", are relative paths
# Checks: absolute path, is a directory, is a file, exists
print(os.path.isabs("d:/onedrive")) # True
print(os.path.isdir("d:/onedrive")) # True
print(os.path.isfile("d:/a.txt"))   # False
print(os.path.exists("d:/onedrive"))# True

# Get basic information about a file
print(os.path.getsize("my01.txt"))  # file size in bytes
print(os.path.abspath("my01.txt"))  # absolute path of the file
print(os.path.dirname("my01.txt"))  # directory part of the path (an empty string here)

# Path operations
path = os.path.abspath("my01.txt")
path2list = os.path.split(path)  # (directory, file name)
print(path2list)

print(os.path.splitext(path))  # (path without the extension, extension)

print(os.path.join('caoyh', 'join'))  # joined with the separator of the operating system
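
The path functions only manipulate strings, so they can be tried on a path that does not exist. This is a minimal sketch with a made-up path.

```python
import os.path

# Hypothetical relative path; nothing is read from disk.
path = os.path.join("caoyh", "notes", "my01.txt")

directory, filename = os.path.split(path)   # directory part, last component
stem, ext = os.path.splitext(filename)      # name, extension with the dot

print(directory)
print(filename, stem, ext)
print(os.path.isabs(path))                  # relative paths are not absolute
print(repr(os.path.dirname("my01.txt")))    # a bare file name has no directory part
```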

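'''
Exercise: list all .py files in the specified directory and print their names
'''
import os
path = os.getcwd()
file_list = os.listdir(path)
for filename in file_list:
    if filename.endswith(".py"):  # include the dot, so names like "copy" do not match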
        print(filename,end='\t')

print('\n'+"*"*20)

file_list2 = [filename for filename in os.listdir(path) if filename.endswith(".py")]
for f in file_list2:
    print(f,end='\t')

os.walk(): traverse all subdirectories and files recursively

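'''
Test os.walk(): recursively traverse all subdirectories and files
'''
import os

path = os.getcwd()  # absolute path of the current working directory
print(path + '\n')
list_files = os.walk(path)

# Each step yields the current directory, its subdirectory names, and its file names
for dirpath, dirnames, filenames in list_files:
    for dirname in dirnames:
        print(os.path.join(dirpath, dirname))
    # print('*'*20)
    for filename in filenames:
        print(os.path.join(dirpath, filename))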
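
Combining os.walk() with the earlier .py exercise gives a recursive search. This is a sketch under assumptions: the small directory tree is created in a temporary directory and its names are invented for the demo.

```python
import os
import tempfile

# Build a hypothetical tree: main.py, pkg/mod.py, pkg/readme.txt, pkg/sub/deep.py
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "pkg", "sub"))
for rel in ("main.py",
            os.path.join("pkg", "mod.py"),
            os.path.join("pkg", "readme.txt"),
            os.path.join("pkg", "sub", "deep.py")):
    with open(os.path.join(root, rel), "w", encoding="utf-8") as f:
        f.write("")

# Collect every .py file at any depth, as paths relative to root.
py_files = []
for dirpath, dirnames, filenames in os.walk(root):
    for name in filenames:
        if name.endswith(".py"):
            full = os.path.join(dirpath, name)
            py_files.append(os.path.relpath(full, root))
py_files.sort()

print(py_files)
```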

12. shutil module (copying and compression)

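## shutil: copying and compression

import shutil

# Copy a file
shutil.copyfile('1.txt','1_copy.txt')

# Copy a folder and its contents; the target must not exist yet, so running this twice raises an error
shutil.copytree('movie','example')

# Compress and decompress
shutil.make_archive('example/haha','zip','movie') # compress the files in the movie folder as zip, saved in the example folder as haha.zip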

import zipfile
# z1 = zipfile.ZipFile('z1.zip','w')
# z1.write('1.txt')
# z1.write('1_copy.txt')
# z1.close()

# Extract everything in z1.zip into the z2 folder; with closes the archive automatically
with zipfile.ZipFile('z1.zip', 'r') as z2:
    z2.extractall('z2')
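
The copy and archive steps can be run end to end in a temporary directory. This is a sketch under assumptions: the folder and file names mirror the ones above but are created only for the demo.

```python
import os
import shutil
import tempfile
import zipfile

work = tempfile.mkdtemp()
src_dir = os.path.join(work, "movie")      # hypothetical source folder
os.mkdir(src_dir)
original = os.path.join(src_dir, "1.txt")
with open(original, "w", encoding="utf-8") as f:
    f.write("hello")

# Copy a single file, then the whole folder.
shutil.copyfile(original, os.path.join(src_dir, "1_copy.txt"))
shutil.copytree(src_dir, os.path.join(work, "example"))

# Compress the folder; make_archive returns the path of the new archive.
archive = shutil.make_archive(os.path.join(work, "haha"), "zip", src_dir)

# Inspect and extract the archive with zipfile.
out_dir = os.path.join(work, "z2")
with zipfile.ZipFile(archive, "r") as z:
    names = z.namelist()
    z.extractall(out_dir)

print(archive)
print(names)
```

The archive is written outside the folder being compressed, so it does not end up inside itself.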

Origin blog.csdn.net/qq_44783177/article/details/108059388