1. Files
1.1 File types
- A file is a sequence of data stored on secondary storage
- A file is a form of data storage
- Files are presented in two forms: text files and binary files (underneath, all files are stored in binary form)
Text file
- A file made up of characters in a single, specific encoding, such as UTF-8
- Because of the encoding, it can also be treated as one long string
- Examples: .txt files, .py files, etc.
Binary file
- Made up directly of 0 and 1 bits, with no uniform character encoding
- The 0s and 1s usually follow a predefined organizational structure, i.e., the file format
- Examples: .png files, .avi files, etc.
# Open the file in text mode
tf = open("f.txt", "rt")
print(tf.readline())
tf.close()
# Open the same file in binary mode
bf = open("f.txt", "rb")
print(bf.readline())
bf.close()
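To make the difference between the two modes visible, here is a minimal sketch that first creates its own small file and then reads it back both ways. The file name `sample.txt`, the temporary directory, and the explicit UTF-8 encoding are assumptions made for illustration.

```python
import os
import tempfile

# Create a small UTF-8 text file in a temporary directory (hypothetical name).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("中国\n")

# Text mode decodes the stored bytes and returns a str.
with open(path, "rt", encoding="utf-8") as tf:
    text_line = tf.readline()

# Binary mode returns the raw bytes with no decoding applied.
with open(path, "rb") as bf:
    byte_line = bf.readline()

print(type(text_line), repr(text_line))
print(type(byte_line), repr(byte_line))
```

The text-mode result is a string of characters, while the binary-mode result is a `bytes` object showing the encoded form of the same content.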
1.2 Opening and closing files
A file on the hard disk is normally in the storage state; before it can be processed, it must be opened, which puts it in the occupied state.
Opening and closing a file switches it between these two states.
1.2.1 Open
<variable> = open(<filename>, <mode>)
- Variable name -> the file handle
- File name -> the path and name of the file (the path can be omitted when the file is in the same directory as the source file)
- Open mode -> text or binary, read or write
Open modes:
- 'r': read-only mode, the default; raises FileNotFoundError if the file does not exist
- 'w': overwrite mode; creates the file if it does not exist, otherwise replaces its contents completely
- 'x': exclusive creation mode; creates the file if it does not exist, raises FileExistsError if it does
- 'a': append mode; creates the file if it does not exist, otherwise adds content at the end of the file
- 'b': binary mode
- 't': text mode, the default
- '+': used together with r / w / x / a to add simultaneous reading and writing on top of the original mode
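The behavior of the modes listed above can be checked with a short sketch. The file name `modes.txt` and the temporary directory are assumptions made so the example does not touch existing files.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")  # hypothetical file name

# 'r' on a missing file raises FileNotFoundError.
try:
    open(path, "r")
    missing_error = False
except FileNotFoundError:
    missing_error = True

# 'x' creates the file; it must not exist yet.
with open(path, "x", encoding="utf-8") as f:
    f.write("first\n")

# A second 'x' on the same path raises FileExistsError.
try:
    open(path, "x")
    exists_error = False
except FileExistsError:
    exists_error = True

# 'a' keeps the existing content and writes at the end.
with open(path, "a", encoding="utf-8") as f:
    f.write("second\n")
with open(path, "r", encoding="utf-8") as f:
    after_append = f.read()

# 'w' discards the existing content before writing.
with open(path, "w", encoding="utf-8") as f:
    f.write("third\n")
with open(path, "r", encoding="utf-8") as f:
    after_overwrite = f.read()

print(missing_error, exists_error)
print(repr(after_append))
print(repr(after_overwrite))
```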
1.2.2 Close
<variable>.close()
Variable name -> the file handle
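Besides calling close() explicitly, Python also offers the `with` statement, which closes the file when the block ends. The sketch below compares the two patterns; the file name `handle.txt` is a hypothetical placeholder, and the `closed` attribute is used only to show the state of each handle.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "handle.txt")  # hypothetical file name

# Explicit open/close, as described above.
fo = open(path, "w", encoding="utf-8")
fo.write("hello\n")
fo.close()
closed_after_close = fo.closed

# Alternative pattern: the with block calls close() on exit,
# even if the body raises an exception.
with open(path, "r", encoding="utf-8") as fo2:
    content = fo2.read()
closed_after_with = fo2.closed

print(closed_after_close, closed_after_with, repr(content))
```

Either pattern returns the file to the storage state; the `with` form is simply harder to get wrong when an error interrupts the processing.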
1.3 Reading file contents
<f>.read(size=-1): reads the entire content; if size is given, reads at most the first size characters (bytes in binary mode)
<f>.readline(size=-1): reads one line; if size is given, reads at most the first size characters of that line
<f>.readlines(hint=-1): reads all lines of the file into a list, one line per element; if hint is given, stops reading further lines once the total size of the lines read so far exceeds hint
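The three read methods can be combined on one handle, because each call continues from the current file pointer. A minimal sketch, assuming a hypothetical three-line file `lines.txt` created just for the demonstration:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")  # hypothetical file name
with open(path, "w", encoding="utf-8") as f:
    f.write("alpha\nbeta\ngamma\n")

with open(path, "r", encoding="utf-8") as f:
    first_three = f.read(3)       # the first 3 characters
    rest_of_line = f.readline()   # the remainder of the current line
    remaining = f.readlines()     # every remaining line, as a list

print(repr(first_three))    # 'alp'
print(repr(rest_of_line))   # 'ha\n'
print(remaining)            # ['beta\n', 'gamma\n']
```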
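# Whole-text operations
# Method 1: read everything at once and process it in one pass; costly for large files
fname = input("Enter the name of the file to open: ")
fo = open(fname, "r")
txt = fo.read()
# process the full text in txt
fo.close()
# Method 2: read a fixed amount at a time and process it step by step
fname = input("Enter the name of the file to open: ")
fo = open(fname, "r")
txt = fo.read(2)
while txt != "":
    # process txt
    txt = fo.read(2)
fo.close()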
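# Line-by-line operations
# Method 1: read everything at once, then process line by line
fname = input("Enter the name of the file to open: ")
fo = open(fname, "r")
for line in fo.readlines():  # readlines() returns the list of all lines
    print(line)
fo.close()
# Method 2: read one line at a time and process it as it arrives
fname = input("Enter the name of the file to open: ")
fo = open(fname, "r")
for line in fo:
    print(line)
fo.close()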
1.4 Writing to files
<f>.write(s): writes a string or byte stream to the file
<f>.writelines(lines): writes a list whose elements are all strings to the file
<f>.seek(offset, whence=0): moves the file pointer to offset, measured from the reference point whence: 0 = start of the file (the default), 1 = current position, 2 = end of the file
fo = open("output.txt", "w+")
ls = ["中国", "法国", "美国"]
fo.writelines(ls)
fo.seek(0)  # without this line the file pointer stays at the end and the loop prints nothing
for line in fo:
    print(line)
fo.close()
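Note that writelines() writes the strings exactly as given and adds no separators, so the example above produces a single line. The following sketch, which assumes a temporary location for `output.txt` and an explicit UTF-8 encoding, adds the newlines itself and shows the effect of seek(0).

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "output.txt")  # hypothetical location
ls = ["中国", "法国", "美国"]

fo = open(path, "w+", encoding="utf-8")
# writelines() does not add separators, so append "\n" to each item.
fo.writelines(item + "\n" for item in ls)
at_end = fo.read()   # the pointer is at the end, so nothing is read
fo.seek(0)           # move the pointer back to the start of the file
lines_read = [line.rstrip("\n") for line in fo]
fo.close()

print(repr(at_end))
print(lines_read)
```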
2. Formatting and processing data
2.1 Dimensions of data organization
2.1.1 One-dimensional data
Consists of ordered or unordered data items that stand in a peer relationship, organized linearly.
Corresponds to concepts such as lists, arrays, and sets.
2.1.2 Two-dimensional data
Consists of several pieces of one-dimensional data; it is a combination of one-dimensional data.
A table is typical two-dimensional data (the header row may or may not be counted as part of the two-dimensional data).
2.1.3 Multi-dimensional data
Formed by extending one-dimensional or two-dimensional data along a new dimension.
For example, a university ranking table extended along the time dimension gives the rankings for different years.
2.1.4 High-dimensional data
Expresses complex structure among data items using only the most basic binary (key-value) relationship.
2.1.5 Data operation cycle
Storage <-> representation <-> operation
that is,
storage format <-> data type <-> operation methods
# High-dimensional data example: key-value pairs
{
    'firstname': 'Xinxin',
    'lastname': 'Wang',
    'address': {
        'stressAddr': '瀍河',
        'city': '洛阳',
        'zipcode': '471000'
    },
    'professional': ['Data', 'Analysis']
}
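The dimensions described in 2.1 map naturally onto Python's built-in types. The sketch below is one possible mapping, not the only one; all names and values in it (`countries`, `table`, `by_year`, `person`, the scores) are made up for illustration.

```python
# One-dimensional data: a list (ordered) or a set (unordered).
countries = ["China", "France", "USA"]

# Two-dimensional data: a list of rows; the first row here is a header.
table = [
    ["name", "score"],
    ["A", "90"],
    ["B", "85"],
]

# Multi-dimensional data: the same kind of table repeated along a year axis.
by_year = {
    2019: [["A", "90"], ["B", "85"]],
    2020: [["A", "88"], ["B", "91"]],
}

# High-dimensional data: nested key-value pairs.
person = {
    "firstname": "Xinxin",
    "address": {"city": "洛阳", "zipcode": "471000"},
    "professional": ["Data", "Analysis"],
}

print(table[1][1])                 # 90
print(by_year[2020][1][1])         # 91
print(person["address"]["city"])   # 洛阳
print(person["professional"][0])   # Data
```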
2.2 One-dimensional data
2.2.1 Representation
If the data items are ordered, use the list type.
If the data items are unordered, use the set type.
Either can be traversed with a for loop for further processing.
2.2.2 Storage
Separate the items with spaces, English (ASCII) commas, or other special characters.
Disadvantage: the delimiter has to be chosen to suit the data, so the approach is not very general (e.g., with space-delimited storage, the stored items cannot themselves contain spaces).
2.2.3 Processing
Processing means converting between the storage format and the representation (a list or a set): reading stored data from a file into the program, and writing the program's data out to a file.
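# Processing one-dimensional data
# 1. Read data from a space-delimited file
# File content: 中国 美国 日本 法国
f = open(fname)
txt = f.read()
ls = txt.split()
f.close()  # ls is now ['中国', '美国', '日本', '法国']
# 2. Write data to a file using a special delimiter
ls = ['中国', '美国', '日本']
f = open(fname, "w")
f.write('$'.join(ls))
f.close()  # the file now contains: 中国$美国$日本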
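The two directions of conversion can be wrapped in a pair of helpers and checked with a round trip. This is a minimal sketch: `save_items` and `load_items` are hypothetical names, the `$` separator follows the example above, and the check for separators inside items is an added safeguard reflecting the disadvantage noted in 2.2.2.

```python
import os
import tempfile

def save_items(path, items, sep="$"):
    """Write a one-dimensional list to a file, joined by sep."""
    for item in items:
        if sep in item:
            raise ValueError(f"item {item!r} contains the separator {sep!r}")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sep.join(items))

def load_items(path, sep="$"):
    """Read the file back and split it into a list."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read().split(sep)

path = os.path.join(tempfile.mkdtemp(), "items.txt")  # hypothetical file name
ls = ["中国", "美国", "日本"]
save_items(path, ls)
restored = load_items(path)
print(restored)
```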
2.3 Two-dimensional data
2.3.1 Representation
Use the list type
- A list of lists can express two-dimensional data
- In a two-dimensional list, each element of the outer list represents one row or one column of the data
2.3.2 Storage
The CSV data storage format
- CSV: Comma-Separated Values
- An internationally used, general-purpose format for storing one- and two-dimensional data, usually with the .csv extension
- Each line holds one piece of one-dimensional data, with items separated by commas and no blank lines
- Excel and most editing software can open CSV files or save data as CSV
Notes
- If an element is missing, its comma is still kept
- The header of two-dimensional data can be stored as data or stored separately
- Usual indexing convention: ls[row][column], row first, then column
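A short sketch of the notes above, using an in-memory string instead of a file. The city names and numbers are invented sample data; the second data row deliberately leaves one value out.

```python
# Three CSV lines: a header, a complete row, and a row with a missing value.
raw = "city,index,change\nBeijing,101.5,120.7\nShanghai,,99.3\n"

# splitlines() drops the line endings; split(",") keeps empty fields.
ls = [line.split(",") for line in raw.splitlines()]

header = ls[0]   # the header stored as the first row of data
rows = ls[1:]

print(header)      # ['city', 'index', 'change']
print(rows[1])     # ['Shanghai', '', '99.3']
print(ls[1][2])    # 120.7
```

Because the comma of the missing element is retained, every row still has three fields, and ls[row][column] keeps pointing at the same column in every row.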
2.3.3 Processing
Reading and writing files in CSV format:
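# Processing two-dimensional data
# 1. Read data from a CSV file
fo = open(fname)
ls = []
for line in fo:
    line = line.replace("\n", "")  # remove the trailing newline from each line
    ls.append(line.split(","))     # split each line on commas to form a list
fo.close()
# 2. Write data to a CSV file
ls = [[], [], []]  # two-dimensional list
f = open(fname, "w")
for item in ls:
    f.write(','.join(item) + "\n")
f.close()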
# Processing the data element by element
ls = [[1, 2], [3, 4], [5, 6]]
for row in ls:
    for column in row:
        print(column)
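The manual split(",") and join approach shown above stops working when a field itself contains a comma. As an alternative to the method in these notes, Python's standard csv module handles the quoting; the sketch below assumes a hypothetical file `table.csv` and invented sample rows.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "table.csv")  # hypothetical file name
ls = [["name", "note"], ["A", "likes tea, coffee"], ["B", ""]]

# newline="" lets the csv module control the line endings itself.
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(ls)

with open(path, "r", newline="", encoding="utf-8") as f:
    restored = list(csv.reader(f))

print(restored)
```

The field containing a comma is quoted when written and comes back as a single element, and the empty field in the last row is preserved as an empty string.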