Importing CSV file content in Python with pandas

1. Import with default settings

pandas provides the read_csv() function for importing .csv files.

In the simplest case, just pass the file path to read_csv():

import pandas as pd
df = pd.read_csv(r'G:\test.csv')
print(df)
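
If you do not have a test.csv at hand, the same call can be tried on in-memory text. The snippet below is a minimal sketch: the sample rows and column names are made up for illustration and are not the contents of the original file.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the contents of test.csv.
csv_text = "name,age,city\nAlice,30,Beijing\nBob,25,Shanghai\n"

# read_csv() accepts a file path or a file-like object, so StringIO works too.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # the first line becomes the column names
```

By default the first line of the file is treated as the header row, and numeric columns such as age are converted to numbers.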


2. Specify the delimiter

By default, read_csv() assumes that the fields in the file are separated by commas. Some files use a different separator, and in that case you need to specify it manually. Otherwise the data is parsed incorrectly (typically each row ends up in a single column), or a parser error is raised.

The separator is specified with the sep parameter. Besides commas, common separators include spaces and tabs (\t).

import pandas as pd
df = pd.read_csv(r'G:\test.csv', sep=',')
print(df)
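
To see what the sep parameter changes, here is a small sketch using tab-separated text. The sample data is hypothetical. In this particular case the wrong separator does not raise an error; the whole line is simply read as one column.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample data.
tsv_text = "name\tage\nAlice\t30\nBob\t25\n"

# With the default comma separator, each line is read as a single field.
wrong = pd.read_csv(io.StringIO(tsv_text))

# Passing sep="\t" splits the fields on tabs.
right = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(wrong.shape)  # one column only
print(right.shape)  # two columns, as intended
```

Checking df.shape or df.columns right after importing is a quick way to notice a wrong separator.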

3. Specify the number of lines to read

Suppose you have a file of several hundred megabytes and only want to know what kind of data it contains. There is no need to import all of it; looking at the first few rows is enough, which you can do by setting the nrows parameter.

import pandas as pd
df = pd.read_csv(r'G:\test.csv', sep=',', nrows=2)
print(df)
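
The following sketch imitates a larger file with generated text, to show that nrows counts data rows and does not include the header line. The column names id and value are made up for this example.

```python
import io

import pandas as pd

# Hypothetical stand-in for a large file: a header plus 1000 data rows.
lines = ["id,value"] + [f"{i},{i * 2}" for i in range(1000)]
big_csv = "\n".join(lines)

# Only the first 5 data rows are parsed; the header is read in addition.
preview = pd.read_csv(io.StringIO(big_csv), nrows=5)

print(preview)
```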


4. Specify the encoding format

Every text file has an encoding; the most common ones are utf-8 and gbk. Two files can look identical, with the same name and the same extension, yet still differ on disk because they use different encodings. For example, when you choose Save As in Excel there are two CSV options. Both produce .csv files, but the two formats use different encodings.


The two encodings you will meet most often here are UTF-8 and gbk. Set the encoding parameter to match the encoding of the file being imported.

If the encoding parameter is not specified, read_csv() uses utf-8 by default.

import pandas as pd
df = pd.read_csv(r'G:\test.csv', sep=',', nrows=3, encoding='utf-8')
print(df)

If the file was saved from Excel in the CSV (comma-separated) (*.csv) format on a Chinese-language Windows system, you need to set the encoding to gbk when importing. Reading such a file as UTF-8 raises a UnicodeDecodeError when it contains Chinese characters.
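
The difference can be reproduced without Excel. The sketch below writes a small gbk-encoded file to a temporary directory (the file name and its contents are made up for illustration), then reads it once with the default encoding and once with encoding='gbk'.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name; the content is written with gbk encoding.
    path = os.path.join(tmp, "gbk_sample.csv")
    with open(path, "w", encoding="gbk", newline="") as f:
        f.write("姓名,年龄\n张三,30\n")

    # Reading with the default utf-8 encoding fails on the Chinese characters.
    try:
        pd.read_csv(path)
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True

    # Reading with the matching encoding works.
    df = pd.read_csv(path, encoding="gbk")

print(utf8_failed)
print(df)
```

A file that contains only ASCII characters reads the same under both encodings, so the error only appears once non-ASCII text such as Chinese is present.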

5. Align column headers with data

Because the table contains Chinese characters, which take up a different display width than English letters and digits, the column headers and the data may not line up when printed. Calling pd.set_option() as shown here makes the printed table align. If you run the code in Jupyter and display the DataFrame directly, Jupyter renders it as a table automatically and you do not need this setting.

import pandas as pd
# Handle the case where column headers and data do not line up
pd.set_option('display.unicode.ambiguous_as_wide', True)
# The misalignment is mainly caused by the Chinese column headers
pd.set_option('display.unicode.east_asian_width', True)
df = pd.read_csv(r'G:\test.csv', sep=',', nrows=3, encoding='utf-8')
print(df)
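
To compare the output with and without these options, one possible approach is pd.option_context(), which applies options only inside a with block. This is a sketch on a made-up DataFrame, not the original test.csv; how well the columns line up visually also depends on the font your terminal uses.

```python
import pandas as pd

# Hypothetical sample data with Chinese headers and values.
df = pd.DataFrame({"姓名": ["张三", "李四"], "年龄": [30, 25]})

# Text rendering with the default settings.
plain = df.to_string()

# Text rendering with the two unicode display options switched on temporarily.
with pd.option_context(
    "display.unicode.east_asian_width", True,
    "display.unicode.ambiguous_as_wide", True,
):
    wide = df.to_string()

print(plain)
print(wide)
```

Unlike pd.set_option(), the options return to their previous values after the with block ends, which is handy if only one table needs the adjustment.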

[Figure: the printed table after alignment]

Origin blog.csdn.net/hubing_hust/article/details/128410816