Reading text file data in Python with pandas

 

This article covers the following points:

(1) Functions for reading data in text file format: read_csv, read_table

    1. To read text files with different delimiters, use the parameter sep

    2. To read a text file without field names (headers), use the parameter names

    3. To use a column as the index, use index_col

    4. To skip lines when reading a text file, use skiprows

    5. When the data is too large to read at once, read it block by block; chunksize sets the number of rows per block

(2) Function for writing data to a text file: to_csv

 

 

Examples are as follows:

(1) Read the dataset in text file format

1. The difference between read_csv and read_table:

 

# read_csv splits on commas by default, so sep is not needed for comma-separated files
import pandas as pd

pd.read_csv('C:\\Users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data.csv')

 

# When reading a file with a non-comma delimiter, read_csv needs sep; otherwise each line is read as-is into a single column and the data is not split.
import pandas as pd
pd.read_csv('C:\\Users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data.txt')

  

# Compare with the previous example: with sep='|' the fields are split into separate columns
import pandas as pd
pd.read_csv('C:\\Users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data.txt',sep='|')

  

# read_table splits on tabs by default, so for a comma-separated file read without sep, each line is read as-is into a single column and the data is not split.
import pandas as pd
pd.read_table('C:\\Users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data.csv')

 

# For a file that is not tab-delimited, read_table needs sep to specify the delimiter
import pandas as pd
pd.read_table('C:\\Users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data.txt',sep='|')
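The delimiter behavior above can be tried without the files on disk. The following is a minimal sketch that parses in-memory text through io.StringIO; the sample contents and the column names a, b, c are assumptions for illustration, not the actual contents of data.txt.

```python
import io
import pandas as pd

# Hypothetical pipe-delimited sample; not the article's actual data.txt.
pipe_text = "a|b|c\n1|2|3\n4|5|6\n"

# Without sep, read_csv splits on commas, so each line lands in one column.
unsplit = pd.read_csv(io.StringIO(pipe_text))

# With sep='|', the three fields are split into separate columns.
split = pd.read_csv(io.StringIO(pipe_text), sep="|")

# read_table splits on tabs by default, so it needs the same sep here.
split_table = pd.read_table(io.StringIO(pipe_text), sep="|")

print(unsplit.shape)
print(split.shape)
print(split.equals(split_table))
```

Once sep is given explicitly, read_csv and read_table produce the same result; the two functions differ only in their default delimiter.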

  

2. When a text file is read without header or names specified, the first line is used as the header by default

# header=None indicates that the dataset has no header; the column labels and the index are then filled with integers starting from 0
pd.read_table('C:\\Users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data.txt',sep='|',header=None)

  

# Use names to set custom column labels
pd.read_table('C:\\Users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data.txt',sep='|',
              names=['x1','x2','x3','x4','x5'])
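As a self-contained illustration of header=None versus names, here is a small sketch on in-memory text. The two sample rows are made up for the example and are not the contents of data.txt.

```python
import io
import pandas as pd

# Hypothetical headerless sample with five pipe-delimited fields per line.
raw = "1|2|3|4|hello\n5|6|7|8|world\n"

# header=None: the first line is treated as data; columns are labeled 0..4.
no_header = pd.read_csv(io.StringIO(raw), sep="|", header=None)

# names=[...]: the first line is still data, but columns get the given labels.
named = pd.read_csv(
    io.StringIO(raw), sep="|", names=["x1", "x2", "x3", "x4", "x5"]
)

print(list(no_header.columns))
print(list(named.columns))
print(len(named))
```

In both calls the first line of the text stays in the data, so neither frame loses a row to the header.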

 

3. By default the index is filled with integers starting from 0; use index_col to use a column as the index instead

names=['x1','x2','x3','x4','x0']
pd.read_table('C:\\Users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data.txt',sep='|',
                   names=names,index_col='x0')
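The effect of index_col can be checked on a small in-memory sample. This is only a sketch: the rows below are assumed for illustration, and x0 is simply the label given to the last column through names.

```python
import io
import pandas as pd

# Hypothetical headerless sample; the last field will become the index.
raw = "1|2|3|4|hello\n5|6|7|8|world\n"
names = ["x1", "x2", "x3", "x4", "x0"]

# index_col='x0' moves the x0 column out of the data and into the row index.
indexed = pd.read_csv(io.StringIO(raw), sep="|", names=names, index_col="x0")

print(indexed.index.tolist())
print(list(indexed.columns))

# Rows can now be looked up by their x0 label.
print(indexed.loc["world", "x1"])
```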

  

4. Use skiprows to skip specific lines, such as the row containing hello, and read the remaining rows. The line numbers in skiprows count from 0 at the first line of the file, whether or not that line is used as the header.


Compare the following examples to see the difference

pd.read_csv('C:\\Users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data1.txt')

names=['x1','x2','x3','x4','x0']
pd.read_csv('C:\\Users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data1.txt',names=names,
            skiprows = [0,3,6])

  

pd.read_csv('C:\\Users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data1.txt',
            skiprows = [0,3,6])

  

pd.read_csv('C:\\Users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data1.txt',header=None,
            skiprows = [0,3,6])
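To see how skiprows counts lines, here is a minimal sketch on an in-memory sample. The text below is an assumption made for the example and is not the actual data1.txt.

```python
import io
import pandas as pd

# Hypothetical sample: a header line followed by three data rows.
text = (
    "a,b,c,d,message\n"   # file line 0
    "1,2,3,4,hello\n"     # file line 1
    "5,6,7,8,world\n"     # file line 2
    "9,10,11,12,foo\n"    # file line 3
)

# Skip file line 1 (the "hello" row); line 0 still serves as the header.
without_hello = pd.read_csv(io.StringIO(text), skiprows=[1])

# Skip file line 0 (the header); the next line is then taken as the header.
promoted = pd.read_csv(io.StringIO(text), skiprows=[0])

print(without_hello["message"].tolist())
print(list(promoted.columns))
```

Because the numbering starts at the first line of the file, skipping line 0 removes the header itself, and the first data row is then promoted to column labels.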

  

5. Read in blocks with chunksize. data1.txt contains 8 lines in total. With 3 rows per block and the first line used as the header, the 7 remaining data rows are read in 3 passes: 3 rows, then 3 rows, then 1 row.


Note that, unlike skiprows, chunksize counts only data rows: the header line is not counted as a row of the first block. With header=None all 8 lines are data, so the blocks hold 3, 3, and 2 rows. Comparing the following two examples makes this clear.

chunker = pd.read_csv('C:\\Users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data1.txt',chunksize=3)
for m in chunker:
    print(len(m))
    print(m)

  

chunker = pd.read_csv('C:\\Users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data1.txt',header=None,
                      chunksize=3)
for m in chunker:
    print(len(m))
    print(m)
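The block sizes can be verified on an in-memory file of the same length. This sketch assumes an 8-line sample (one header line plus 7 data rows) built for the example; it is not the actual data1.txt.

```python
import io
import pandas as pd

# Hypothetical 8-line sample: one header line followed by 7 data rows.
text = "a,b\n" + "".join(f"{i},{i * 10}\n" for i in range(7))

# First line used as the header: only the 7 data rows are divided into blocks.
sizes_with_header = [
    len(chunk) for chunk in pd.read_csv(io.StringIO(text), chunksize=3)
]

# header=None: all 8 lines count as data rows.
sizes_no_header = [
    len(chunk)
    for chunk in pd.read_csv(io.StringIO(text), header=None, chunksize=3)
]

print(sizes_with_header)
print(sizes_no_header)
```

Each chunk is an ordinary DataFrame, so a large file can be processed one block at a time without holding all of it in memory.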

  

(2) Use to_csv to write data to a text file


Take data.txt as an example. Note that when the file is written out, the index is written as well.

data=pd.read_table('C:\\Users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data.txt',sep='|')
print(data)

  

# Use index=False to keep the index from being written.
data=pd.read_table('C:\\Users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data.txt',sep='|')
data.to_csv('C:\\Users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\outdata.txt',sep='!',index=False)

  

# Use columns to specify which columns are written
data=pd.read_table('C:\\Users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data.txt',sep='|')
data.to_csv('C:\\Users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\outdata2.txt',sep=',',index=False,
            columns=['a','c','d'])
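The to_csv options above can be inspected without writing files: when no path is given, to_csv returns the output as a string. The following sketch uses a small DataFrame whose values are assumed for illustration; only the column names a, c, d are taken from the example above.

```python
import pandas as pd

# Hypothetical data; the column names follow the example above.
data = pd.DataFrame({"a": [1, 5], "b": [2, 6], "c": [3, 7], "d": [4, 8]})

# Default: the index is written as an unnamed first column.
with_index = data.to_csv()

# index=False: only the data columns are written.
no_index = data.to_csv(index=False)

# sep and columns: custom delimiter and a subset of the columns.
subset = data.to_csv(sep="!", index=False, columns=["a", "c", "d"])

print(with_index)
print(no_index)
print(subset)
```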

  
