Importing the contents of Excel .xlsx files into Python with pandas

1. Basic import

pandas imports .xlsx files with the read_excel() function. Reading .xlsx files also requires the openpyxl package to be installed.

# coding=utf-8
import pandas as pd

df = pd.read_excel(r'G:\test.xlsx')
print(df)
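
If you have no test.xlsx at hand, the following minimal sketch creates one in a temporary directory and reads it back. The sample values are hypothetical (the original file's contents are not reproduced here); the column names mirror the ones used later in this article. It assumes openpyxl is installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; only the column names come from this article.
sample = pd.DataFrame({
    '姓名': ['张三', '李四', '王五'],
    '性别': ['男', '女', '男'],
    '年龄': [28, 34, 22],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'test.xlsx')
    # index=False keeps the DataFrame's row labels out of the sheet.
    sample.to_excel(path, index=False)
    # Read the workbook back while the temporary file still exists.
    df = pd.read_excel(path)

print(df)
```

Passing index=False when writing matters here: without it, the row labels would be written as an extra first column and show up as a column when the file is read back.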

Windows file paths use \ as the separator. Prefix the path string with r to make it a raw string, so that the \ characters in the path are not interpreted as the start of an escape sequence (in 'G:\test.xlsx', for instance, \t would otherwise become a tab character). You can also omit the r, but then every \ in the path must be written as / or as \\. The same rule applies when importing files in other formats. Adding r in front of the path is usually the simplest choice.
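
The difference between these spellings can be checked without touching any file. The sketch below uses only the standard library; the variable names are illustrative.

```python
from pathlib import PureWindowsPath

raw = r'G:\test.xlsx'         # raw string: backslashes are kept literally
escaped = 'G:\\test.xlsx'     # each backslash written as \\
forward = 'G:/test.xlsx'      # forward slashes instead of backslashes
unescaped = 'G:\test.xlsx'    # no r prefix: \t is read as a tab character

# The raw and the escaped spelling produce the same string.
same_string = raw == escaped

# Windows treats / and \ as equivalent separators, so the paths compare equal.
same_path = PureWindowsPath(raw) == PureWindowsPath(forward)

# The unescaped spelling silently contains a tab and is one character shorter.
has_tab = '\t' in unescaped

print(same_string, same_path, has_tab, len(raw), len(unescaped))
```

The last case is the reason for the r prefix: the path used throughout this article happens to contain \t, so leaving out the r would point pandas at a file name that contains a tab.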

2. Align column headers with data

Our table contains Chinese text, and Chinese characters occupy a different display width than English letters and digits, so the printed column headers and data do not line up by default. Calling pd.set_option() with the options below aligns them. If you run the code in Jupyter, Jupyter renders the DataFrame as a table automatically and you don't need this setting.

import pandas as pd
# Handle the case where column headers and data do not line up
pd.set_option('display.unicode.ambiguous_as_wide', True)
# The misalignment is mainly because the column headers are in Chinese
pd.set_option('display.unicode.east_asian_width', True)
df = pd.read_excel(r'G:\test.xlsx')
print(df)
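
To see what the option changes without a file, here is a small sketch on an in-memory DataFrame with hypothetical values. It uses pd.option_context(), which applies an option only inside the with block, as a scoped alternative to pd.set_option().

```python
import pandas as pd

# Hypothetical data mixing Chinese and ASCII text in one column.
df = pd.DataFrame({'姓名': ['张三', 'Tom'], '年龄': [28, 34]})

# Default rendering: every character is counted as one column wide.
plain = df.to_string()

# With the option enabled, East Asian wide characters count as two columns.
with pd.option_context('display.unicode.east_asian_width', True):
    wide = df.to_string()

print(plain)
print(wide)
```

The two renderings differ only in padding. Whether the second one looks aligned on screen also depends on the terminal font drawing Chinese characters at double width.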

3. Specify to import a sheet

The sheet_name parameter specifies which sheet to import. Note that sheet names are case-sensitive.

import pandas as pd
pd.set_option('display.unicode.ambiguous_as_wide', True)
pd.set_option('display.unicode.east_asian_width', True)
df = pd.read_excel(r'G:\test.xlsx', sheet_name='Sheet1')
print(df)

Instead of a sheet name, you can also pass the position of the sheet, counting from 0. For example:

# coding=utf-8
import pandas as pd
pd.set_option('display.unicode.ambiguous_as_wide', True)
pd.set_option('display.unicode.east_asian_width', True)
df = pd.read_excel(r'G:\test.xlsx', sheet_name=0)
print(df)

If the sheet_name parameter is not specified, the content of the first sheet is imported by default.
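
The sheet selection rules can be tried on a throwaway workbook. The sketch below writes two sheets with hypothetical contents, lists the sheet names with pd.ExcelFile, and then reads by name, by position, and with sheet_name=None, which returns all sheets as a dict keyed by sheet name. It assumes openpyxl is installed.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'test.xlsx')

    # Write a workbook with two sheets (hypothetical contents).
    with pd.ExcelWriter(path) as writer:
        pd.DataFrame({'姓名': ['张三'], '年龄': [28]}).to_excel(
            writer, sheet_name='Sheet1', index=False)
        pd.DataFrame({'姓名': ['李四'], '年龄': [34]}).to_excel(
            writer, sheet_name='Sheet2', index=False)

    # List the sheet names to get their exact spelling and order.
    with pd.ExcelFile(path) as workbook:
        names = workbook.sheet_names

    by_name = pd.read_excel(path, sheet_name='Sheet2')   # select by name
    by_position = pd.read_excel(path, sheet_name=1)      # select by 0-based position
    all_sheets = pd.read_excel(path, sheet_name=None)    # dict of every sheet

print(names)
print(by_name)
print(list(all_sheets))
```

Checking sheet_names first is a cheap way to avoid errors caused by a name that differs only in case.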

4. Specify row index

When a DataFrame is imported from a local file, the row index defaults to integers starting from 0. Set the index_col parameter to use one of the columns as the row index instead; index_col=0 uses the first column.

# coding=utf-8
import pandas as pd
pd.set_option('display.unicode.ambiguous_as_wide', True)
pd.set_option('display.unicode.east_asian_width', True)
df = pd.read_excel(r'G:\test.xlsx', sheet_name=0, index_col=0)
print(df)
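
As a sketch of the effect, the example below reads the same hypothetical workbook with and without index_col and then looks up a row by its label. The sample values are made up and openpyxl is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data.
sample = pd.DataFrame({
    '姓名': ['张三', '李四', '王五'],
    '年龄': [28, 34, 22],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'test.xlsx')
    sample.to_excel(path, index=False)

    df_default = pd.read_excel(path)               # row index 0, 1, 2
    df_indexed = pd.read_excel(path, index_col=0)  # first column becomes the index

# With the names as the index, rows can be selected by label.
age = df_indexed.loc['李四', '年龄']

print(df_default)
print(df_indexed)
print(age)
```

Note that the column used as the index is no longer an ordinary column: df_indexed has only the 年龄 column left.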

5. Specify column index

When a local file is imported into a DataFrame, the first row of the source table is used as the column index by default. The header parameter controls this. Its default value is 0, meaning the first row is used as the column index; pass another row number, counting from 0, to use that row instead; or pass header=None to use default integers starting from 0 as the column index.

Use the default number starting from 0 as the column index:

# coding=utf-8
import pandas as pd
pd.set_option('display.unicode.ambiguous_as_wide', True)
pd.set_option('display.unicode.east_asian_width', True)
df = pd.read_excel(r'G:\test.xlsx', sheet_name=0, header=None)
print(df)
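
A common reason to change header is a title row above the real column names. The sketch below builds such a sheet (the title text and the values are hypothetical) and reads it once with header=1 and once with header=None. It assumes openpyxl is installed.

```python
import os
import tempfile

import pandas as pd

# Raw grid as it should appear in the sheet: a title row, then the
# column names, then the data. All contents are hypothetical.
grid = pd.DataFrame([
    ['Staff list', None, None],
    ['姓名', '性别', '年龄'],
    ['张三', '男', 28],
    ['李四', '女', 34],
])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'test.xlsx')
    # Write the grid as-is, without pandas' own header or index.
    grid.to_excel(path, index=False, header=False)

    with_header = pd.read_excel(path, header=1)    # second row holds the names
    no_header = pd.read_excel(path, header=None)   # columns are numbered 0, 1, 2

print(with_header)
print(no_header)
```

With header=1 the title row above the header is skipped; with header=None every row, including the title and the names, is kept as data.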

6. Specify import columns

Sometimes the local file has more columns than we need. The usecols parameter specifies which columns to import.

The usecols parameter can be given in the following forms:

  • A list of column positions, counting from 0.
  • A list of column names.
  • A tuple of column names.

Examples are as follows:

df = pd.read_excel(r'G:\test.xlsx', sheet_name=0, usecols=[0,1])
print(df)

df = pd.read_excel(r'G:\test.xlsx', sheet_name=0, usecols=['姓名','性别'])
print(df)

df = pd.read_excel(r'G:\test.xlsx', sheet_name=0, usecols=('姓名','年龄'))
print(df)
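
For comparison, the sketch below applies several usecols forms to one hypothetical workbook. Besides the list forms above, read_excel() also accepts a string of Excel column letters (such as 'A,C' or 'A:C') and a function that is called with each column name. The sample values are made up and openpyxl is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data.
sample = pd.DataFrame({
    '姓名': ['张三', '李四', '王五'],
    '性别': ['男', '女', '男'],
    '年龄': [28, 34, 22],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'test.xlsx')
    sample.to_excel(path, index=False)

    by_position = pd.read_excel(path, usecols=[0, 1])          # first two columns
    by_name = pd.read_excel(path, usecols=['姓名', '年龄'])     # columns by name
    by_letter = pd.read_excel(path, usecols='A,C')             # Excel column letters
    # Keep every column whose name is not 性别.
    by_rule = pd.read_excel(path, usecols=lambda name: name != '性别')

print(list(by_position.columns))
print(list(by_name.columns))
print(list(by_letter.columns))
print(list(by_rule.columns))
```

The function form is convenient when the set of columns to drop is known but the rest of the layout may change.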

7. Specify the number of rows to import

If the file is very large and we only need the first few rows for analysis, the nrows parameter specifies how many rows of data to import.

df = pd.read_excel(r'G:\test.xlsx', sheet_name=0, nrows=2)
print(df)
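
The following sketch shows that nrows counts data rows only; the header row is not included in the count. The sample values are hypothetical and openpyxl is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data with three data rows.
sample = pd.DataFrame({
    '姓名': ['张三', '李四', '王五'],
    '年龄': [28, 34, 22],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'test.xlsx')
    sample.to_excel(path, index=False)

    # Import the header row plus the first two data rows.
    head = pd.read_excel(path, nrows=2)

print(head)
```

Reading a small sample like this is a quick way to inspect the column names and types before importing a large file in full.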

8. More parameters

For the remaining parameters, refer to the read_excel() entry in the official pandas documentation.

Origin blog.csdn.net/hubing_hust/article/details/128412197