Python: file and data formats

First, files

1.1

  • A file is a sequence of data stored on secondary storage
  • A file is a form of data storage
  • Files are presented in two forms: text files and binary files (in essence, all files are stored in binary form)

Text File

  • A file whose content uses a single specific character encoding, such as UTF-8
  • Because it is encoded, it can also be viewed as one long stored string
  • Examples: .txt files, .py files, etc.

Binary file

  • Composed directly of the bits 0 and 1, with no uniform character encoding
  • The 0s and 1s usually follow a predefined organizational structure, i.e., the file format
  • Examples: .png files, .avi files, etc.
# Open the file in text mode
tf = open("f.txt", "rt")
print(tf.readline())
tf.close()
# Open the file in binary mode
bf = open("f.txt", "rb")
print(bf.readline())
bf.close()
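
To make the difference between the two modes concrete, here is a minimal sketch that writes a small UTF-8 file and reads it back both ways. The temporary directory and the file name demo.txt are assumptions made purely for illustration.

```python
import os
import tempfile

# Hypothetical file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Write one line of text, encoded explicitly as UTF-8.
with open(path, "w", encoding="utf-8") as f:
    f.write("中国\n")

# Text mode decodes the stored bytes and returns a str.
with open(path, "rt", encoding="utf-8") as tf:
    text_line = tf.readline()

# Binary mode returns the raw, undecoded bytes.
with open(path, "rb") as bf:
    byte_line = bf.readline()

print(type(text_line), repr(text_line))
print(type(byte_line), repr(byte_line))
```

The same file yields a str in text mode and a bytes object in binary mode; only the interpretation of the stored bits changes.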

1.2 Opening and closing

A file stored on disk is normally in the storage state. Before a program can process it, the file must first be put into the occupied state.
Opening and closing the file convert it between these two states.

1.2.1 Open

<variable> = open(<filename>, <mode>)

  • Variable name -> the file handle
  • File name -> the path and file name (the path can be omitted when the file is in the current working directory, typically the directory of the source file)
  • Open mode -> text or binary, read or write

Open modes (note that the mode characters are lowercase):

  • 'r': read-only mode, the default; raises FileNotFoundError if the file does not exist
  • 'w': overwrite mode; creates the file if it does not exist, and completely overwrites it if it does
  • 'x': exclusive-creation write mode; creates the file if it does not exist, and raises FileExistsError if it does
  • 'a': append mode; creates the file if it does not exist, and appends new content to the end if it does
  • 'b': binary mode
  • 't': text mode, the default
  • '+': used together with r/w/x/a to add simultaneous reading and writing on top of the original mode
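
The behavior of the modes listed above can be checked with a short sketch. The temporary directory and the file name modes.txt are assumptions for illustration; the flag variables exist only so the results can be printed.

```python
import os
import tempfile

# Hypothetical file path used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "modes.txt")

# 'x' creates the file because it does not exist yet.
f = open(path, "x")
f.write("first\n")
f.close()

# Opening the same file with 'x' again raises FileExistsError.
try:
    open(path, "x")
    second_create_failed = False
except FileExistsError:
    second_create_failed = True

# 'a' appends to the end of the existing content.
f = open(path, "a")
f.write("second\n")
f.close()

# 'r' reads the file back.
f = open(path, "r")
after_append = f.read()
f.close()

# 'w' discards the old content before writing.
f = open(path, "w")
f.write("replaced\n")
f.close()

# No mode given: defaults to 'rt'.
f = open(path)
after_overwrite = f.read()
f.close()

# Reading a file that does not exist raises FileNotFoundError.
try:
    open(path + ".missing", "r")
    missing_failed = False
except FileNotFoundError:
    missing_failed = True

print(second_create_failed, repr(after_append), repr(after_overwrite), missing_failed)
```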

1.2.2 Close

<variable>.close()
    Variable name -> the file handle
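
Calling close() by hand is easy to forget, and the call is skipped if an exception is raised first. A common alternative, which the text above does not cover, is the with statement: it closes the file automatically when the block ends. The following is a minimal sketch; the temporary path is an assumption for illustration.

```python
import os
import tempfile

# Hypothetical file path used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "handle.txt")

with open(path, "w") as fo:
    fo.write("hello\n")
    closed_inside = fo.closed   # the file is still open inside the block

closed_after = fo.closed        # the with block has closed the file

print(closed_inside, closed_after)
```

The explicit open()/close() pairs used elsewhere in this article behave the same way as long as close() is actually reached.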

1.3 Reading content

  • <f>.read(size=-1): reads the entire content; if size is given, reads at most the first size characters (bytes in binary mode)
  • <f>.readline(size=-1): reads one line; if size is given, reads at most the first size characters of that line
  • <f>.readlines(hint=-1): reads all lines and returns a list with one line per element; if hint is given, reading stops once the total size of the lines read so far exceeds hint (it is a size limit, not a line count)
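# Processing the whole text
# Method 1: read everything at once and process it in one pass; costly for large files
fname = input("Enter the name of the file to open: ")
fo = open(fname, "r")
txt = fo.read()
# process the full text in txt here
fo.close()
# Method 2: read a fixed number of characters at a time and process them step by step
fname = input("Enter the name of the file to open: ")
fo = open(fname, "r")
txt = fo.read(2)
while txt != "":
    # process txt here
    txt = fo.read(2)
fo.close()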

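# Processing line by line
# Method 1: read everything at once, then process line by line
fname = input("Enter the name of the file to open: ")
fo = open(fname, "r")
for line in fo.readlines():  # readlines() returns the list of all lines
    print(line)
fo.close()
# Method 2: read and process one line at a time
fname = input("Enter the name of the file to open: ")
fo = open(fname, "r")
for line in fo:
    print(line)
fo.close()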
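
The three reading methods share one file pointer, so each call continues where the previous one stopped. The sketch below illustrates this on a small file; the temporary path and the three-line sample content are assumptions for illustration.

```python
import os
import tempfile

# Hypothetical file with made-up content, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "lines.txt")
with open(path, "w", encoding="utf-8") as fo:
    fo.write("alpha\nbeta\ngamma\n")

with open(path, "r", encoding="utf-8") as fo:
    first_two = fo.read(2)         # the first two characters
    rest_of_line = fo.readline()   # the remainder of the first line
    remaining = fo.readlines()     # every line that is left, as a list
    at_end = fo.read()             # nothing is left, so an empty string

print(repr(first_two), repr(rest_of_line), remaining, repr(at_end))
```

Each line keeps its trailing "\n", which is why print(line) in the loops above produces a blank line between entries; calling line.rstrip("\n") before printing avoids that.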

1.4 Writing to a file

  • <f>.write(s): writes a string (or bytes, in binary mode) to the file
  • <f>.writelines(lines): writes a list whose elements are all strings to the file; no separators or line breaks are added
  • <f>.seek(offset, whence=0): moves the file pointer to offset, measured from the position given by whence: 0 = beginning of the file (the default), 1 = current position, 2 = end of the file
fo = open("output.txt", "w+")
ls = ["中国", "法国", "美国"]
fo.writelines(ls)
fo.seek(0)  # without this line the file pointer stays at the end and the loop prints nothing
for line in fo:
    print(line)
fo.close()
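
Two details of the example above are easy to miss: writelines() does not insert line breaks, and seek() takes the reference point as a second argument. The following sketch shows both under stated assumptions (a temporary file and explicit UTF-8 encoding, neither of which appears in the original example).

```python
import os
import tempfile

# Hypothetical file path used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "output.txt")
ls = ["中国", "法国", "美国"]

# writelines() writes the strings back to back, with no separators.
with open(path, "w+", encoding="utf-8") as fo:
    fo.writelines(ls)
    fo.seek(0)                      # back to the beginning of the file
    joined = fo.read()

# Adding "\n" to each item gives one item per line.
with open(path, "w+", encoding="utf-8") as fo:
    fo.writelines(item + "\n" for item in ls)
    fo.seek(0)
    lines = [line.rstrip("\n") for line in fo]

# In binary mode, seek(0, 2) jumps to the end; tell() then gives the size.
with open(path, "rb") as fo:
    fo.seek(0, 2)
    size_in_bytes = fo.tell()

print(repr(joined), lines, size_in_bytes)
```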

Second, data formatting and processing

2.1 Dimensions of data organization

2.1.1 One-dimensional data

Consists of data items in an ordered or unordered peer relationship, organized linearly.
It corresponds to concepts such as lists, arrays, and sets.

2.1.2 Two-dimensional data

Consists of several pieces of one-dimensional data; it is a combination of one-dimensional data.
A table is typical two-dimensional data (the header row may or may not be counted as part of the two-dimensional data).

2.1.3 Multi-dimensional data

One-dimensional or two-dimensional data extended along a new dimension.
For example, a university ranking table extended along the time dimension gives the rankings for different years.

2.1.4 High-dimensional data

Uses only the most basic binary relationship, the key-value pair, to express complex structure among data items.

2.1.5 The data operation cycle

Storage <-> representation <-> operation
that is,
storage format <-> data type <-> operation method

# High-dimensional data example: key-value pairs
{
    'firstname': 'Xinxin',
    'lastname': 'Wang',
    'address': {
        'streetAddr': '瀍河',
        'city': '洛阳',
        'zipcode': '471000'
    },
    'professional': ['Data', 'Analysis']
}
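
In Python such a structure maps naturally onto nested dict and list objects. The sketch below uses a trimmed copy of the record, bound to the hypothetical name person, and shows how nested values are reached by chaining keys and indexes.

```python
# A trimmed copy of the record above, redefined so the sketch runs on its own.
person = {
    'firstname': 'Xinxin',
    'lastname': 'Wang',
    'address': {'city': '洛阳', 'zipcode': '471000'},
    'professional': ['Data', 'Analysis'],
}

# Chain keys to reach a nested value.
city = person['address']['city']

# Mix a key with a list index.
first_skill = person['professional'][0]

# get() returns a fallback instead of raising KeyError for a missing key.
email = person.get('email', 'n/a')

# Iterate over the top-level key-value pairs.
for key, value in person.items():
    print(key, '->', value)
```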

2.2 One-dimensional data

2.2.1 Representation

If the data is ordered, use the list type.
If the data is unordered, use the set type.
Either one can then be traversed with a for loop.
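
A minimal sketch of the two representations follows; the sample values are made up for illustration.

```python
# Ordered data: a list keeps insertion order and allows duplicates.
ordered = ["中国", "美国", "日本", "美国"]

# Unordered data: a set drops duplicates and guarantees no particular order.
unordered = set(ordered)

# Both can be traversed with a for loop.
for item in ordered:
    print("list item:", item)

for item in unordered:
    print("set item:", item)   # iteration order may differ between runs

second = ordered[1]            # lists support indexing; sets do not
print(second, len(ordered), len(unordered))
```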

2.2.2 Storage

Items are stored separated by spaces, commas (the ASCII comma, not the full-width one), or other special characters.
Disadvantage: the separator has to be chosen to suit the data, so the format is not very general (for example, with space-separated storage the stored items themselves cannot contain spaces).

2.2.3 Processing

Processing means converting between the storage format and the representation (a list or a set): the program reads stored data from a file into the program, and writes the program's data out to a file.

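# Processing one-dimensional data
# fname is assumed to hold the path of the data file
# 1. Read data from a space-separated file
# File content: 中国 美国 日本 法国
fo = open(fname)
txt = fo.read()
ls = txt.split()
fo.close()  # ls is now ['中国', '美国', '日本', '法国']
# 2. Write data to a file using a special separator
ls = ['中国', '美国', '日本']
f = open(fname, "w")
f.write('$'.join(ls))
f.close()  # the file now contains: 中国$美国$日本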
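
The two steps can be wrapped into a pair of helper functions that round-trip a list through a file. This is only a sketch: the names write_1d and read_1d, the default "$" separator, and the temporary path are assumptions for illustration. The check in write_1d makes the disadvantage from 2.2.2 explicit by rejecting items that contain the separator.

```python
import os
import tempfile


def write_1d(path, items, sep="$"):
    """Write a list of strings to path, joined by sep (hypothetical helper)."""
    for item in items:
        if sep in item:
            raise ValueError("item contains the separator: " + repr(item))
    with open(path, "w", encoding="utf-8") as f:
        f.write(sep.join(items))


def read_1d(path, sep="$"):
    """Read the items back as a list (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as f:
        txt = f.read()
    return txt.split(sep) if txt else []


# Hypothetical file path used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "countries.txt")
write_1d(path, ["中国", "美国", "日本"])
restored = read_1d(path)
print(restored)
```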

2.3 Two-dimensional data

2.3.1 Representation

Use the list type

  • The list type can express two-dimensional data
  • Use a two-dimensional list (each element of the outer list is itself a list that represents one row or one column of the data)

2.3.2 Storage

The CSV data storage format

  • CSV: Comma-Separated Values, i.e., values separated by commas
  • An internationally used storage format for two-dimensional data, usually with the .csv extension
  • Each line holds one piece of one-dimensional data, separated by commas, with no blank lines
  • Excel and most editing software can open CSV files or save data as CSV

Notes

  • If an element is missing, its comma is still kept
  • The header of two-dimensional data can be stored as data or stored separately
  • General indexing convention: ls[row][column], row first, then column
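
The notes above can be illustrated on a small in-memory table. The CSV text below is made-up sample data, and csv_text, header, and rows are hypothetical names chosen for this sketch.

```python
# Made-up CSV text: the first line is the header, and one value is missing.
csv_text = "city,2018,2019\nBeijing,101.5,\nShanghai,101.2,102.3\n"

# Split into rows, then split each row at the commas.
ls = [line.split(",") for line in csv_text.splitlines()]

header = ls[0]    # the header stored as the first row of data
rows = ls[1:]     # the remaining rows hold the values

# The missing element keeps its comma, so it shows up as an empty string.
missing = rows[0][2]

# Row first, then column: second data row, second column.
value = rows[1][1]

print(header, repr(missing), value)
```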

2.3.3 Processing

Using the CSV file format

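# Processing two-dimensional data
# fname is assumed to hold the path of the CSV file
# 1. Read data from a CSV file
fo = open(fname)
ls = []
for line in fo:
    line = line.replace("\n", "")  # remove the line break at the end of each line
    ls.append(line.split(","))     # split the line at the commas to form a list
fo.close()
# 2. Write data to a CSV file
ls = [[], [], []]  # a two-dimensional list whose elements are strings
f = open(fname, "w")
for item in ls:
    f.write(','.join(item) + "\n")
f.close()
# Processing the data element by element
ls = [[1, 2], [3, 4], [5, 6]]
for row in ls:
    for column in row:
        print(column)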
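
The split(",") approach above breaks as soon as a field itself contains a comma. As an alternative to the manual method, and not part of the original walkthrough, Python's standard csv module quotes such fields when writing and unquotes them when reading. The following is a minimal sketch with made-up sample data and a temporary file path.

```python
import csv
import os
import tempfile

# Hypothetical file path and made-up table; one field contains a comma.
path = os.path.join(tempfile.mkdtemp(), "table.csv")
table = [["name", "score"], ["Wang, Xinxin", "90"], ["Li", "85"]]

# newline="" lets the csv module control line endings itself.
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(table)

# csv.reader restores the quoted field as a single element.
with open(path, newline="", encoding="utf-8") as f:
    loaded = list(csv.reader(f))

# A plain split(",") on the same line cuts the quoted field in two.
with open(path, encoding="utf-8") as f:
    raw_lines = f.read().splitlines()
naive_fields = raw_lines[1].split(",")

print(loaded)
print(naive_fields)
```

For simple data without embedded commas or quotes, the manual split and join shown earlier gives the same result.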

Origin www.cnblogs.com/xxwang1018/p/11571595.html