Importing Excel (.xlsx) files in Python with pandas
1. Basic import
pandas imports .xlsx files with the read_excel() function. Reading .xlsx files also requires the openpyxl package to be installed.
# coding=utf-8
import pandas as pd
df = pd.read_excel(r'G:\test.xlsx')
print(df)
File paths on Windows use \ as the separator by default. Because \ also starts an escape sequence in a Python string literal (the \t in G:\test.xlsx would be read as a tab), put an r in front of the path to make it a raw string, in which backslashes are kept literally. You can also leave out the r, but then every \ in the path must be written as / (or doubled as \\). The same rule applies when importing files in other formats. We generally choose to add r in front of the path.
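To make the difference concrete, here is a minimal sketch that compares the possible spellings of the same path. It uses only the standard library, and the file G:\test.xlsx does not need to exist; the variable names are chosen just for this illustration.

```python
from pathlib import PureWindowsPath

plain = 'G:\test.xlsx'      # "\t" is interpreted as a tab character
raw = r'G:\test.xlsx'       # raw string: backslashes are kept literally
doubled = 'G:\\test.xlsx'   # each backslash escaped by hand
forward = 'G:/test.xlsx'    # forward slashes need no escaping

print('\t' in plain)        # True: the path has been corrupted
print(raw == doubled)       # True: both spell the same path
# PureWindowsPath treats / and \ as the same separator
print(PureWindowsPath(raw) == PureWindowsPath(forward))  # True
```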
2. Align column headers with data
Our table contains Chinese text, and a Chinese character takes up a different display width from an English letter or a digit, so we need to call pd.set_option() to make the printed table line up. If you run the code in Jupyter, Jupyter renders the DataFrame as a table automatically when it displays it, so you don't need this setting.
import pandas as pd
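# Handle the case where the column headers do not line up with the data
pd.set_option('display.unicode.ambiguous_as_wide', True)
# The misalignment is mainly because the column headers are in Chinese
pd.set_option('display.unicode.east_asian_width', True)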
df = pd.read_excel(r'G:\test.xlsx')
print(df)
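The reason for the misalignment can be seen without pandas. The sketch below uses the standard library's unicodedata module to estimate how many terminal columns a string occupies. display_width is a hypothetical helper written only for this illustration; it is not part of pandas and only approximates the idea behind the two options.

```python
import unicodedata

def display_width(text):
    """Approximate terminal width: wide and fullwidth characters count as 2."""
    return sum(2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
               for ch in text)

print(len('姓名'), display_width('姓名'))   # 2 4
print(len('name'), display_width('name'))   # 4 4
```

A header such as 姓名 is two characters long but takes four columns on screen, so padding based on the character count alone leaves the columns out of line.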
3. Import a specific sheet
Use the sheet_name parameter to choose which sheet to import. Note that sheet names are case-sensitive.
import pandas as pd
pd.set_option('display.unicode.ambiguous_as_wide', True)
pd.set_option('display.unicode.east_asian_width', True)
df = pd.read_excel(r'G:\test.xlsx', sheet_name='Sheet1')
print(df)
Instead of a sheet name, you can also pass the position of the sheet, counting from 0. For example:
# coding=utf-8
import pandas as pd
pd.set_option('display.unicode.ambiguous_as_wide', True)
pd.set_option('display.unicode.east_asian_width', True)
df = pd.read_excel(r'G:\test.xlsx', sheet_name=0)
print(df)
If the sheet_name parameter is not specified, the content of the first sheet is imported by default.
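sheet_name also accepts None (all sheets) or a list of names and positions; in both cases read_excel() returns a dict of DataFrames rather than a single DataFrame. The following is a minimal sketch, assuming openpyxl is installed. It writes a throwaway two-sheet workbook to a temporary folder so that it can run on its own; the file name demo.xlsx and the sample values are made up.

```python
import os
import tempfile

import pandas as pd

# Build a small two-sheet workbook so the example is self-contained.
# The sheet contents are placeholder data.
path = os.path.join(tempfile.mkdtemp(), 'demo.xlsx')
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({'姓名': ['张三', '李四']}).to_excel(writer, sheet_name='Sheet1', index=False)
    pd.DataFrame({'姓名': ['王五']}).to_excel(writer, sheet_name='Sheet2', index=False)

# sheet_name=None loads every sheet into a dict keyed by sheet name
sheets = pd.read_excel(path, sheet_name=None)
print(list(sheets))           # ['Sheet1', 'Sheet2']
print(len(sheets['Sheet2']))  # 1

# A list may mix positions and names; the dict keys are the values passed in
picked = pd.read_excel(path, sheet_name=[0, 'Sheet2'])
print(list(picked))           # [0, 'Sheet2']
```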
4. Specify the row index
By default, a DataFrame imported from a local file gets a row index that counts up from 0. To use one of the file's columns as the row index instead, pass that column's position to the index_col parameter.
# coding=utf-8
import pandas as pd
pd.set_option('display.unicode.ambiguous_as_wide', True)
pd.set_option('display.unicode.east_asian_width', True)
df = pd.read_excel(r'G:\test.xlsx', sheet_name=0, index_col=0)
print(df)
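The effect of index_col is easiest to see by reading the same file with and without it. This is a minimal sketch, assuming openpyxl is installed; the temporary file and the sample rows are made up so that the example can run on its own.

```python
import os
import tempfile

import pandas as pd

# Write a throwaway workbook with placeholder data
path = os.path.join(tempfile.mkdtemp(), 'demo.xlsx')
pd.DataFrame({'姓名': ['张三', '李四'], '年龄': [20, 25]}).to_excel(path, index=False)

default = pd.read_excel(path)               # row index 0, 1, ...
indexed = pd.read_excel(path, index_col=0)  # first column becomes the row index

print(list(default.index))         # [0, 1]
print(list(indexed.index))         # ['张三', '李四']
print(indexed.loc['李四', '年龄'])  # 25
```

With the names as the row index, a row can be looked up by label through .loc, and the index column no longer appears among the data columns.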
5. Specify the column index
By default, the first row of the source table is used as the column index when a local file is imported into a DataFrame. The header parameter controls this. Its default value is 0, meaning the first row becomes the column index. You can pass another row number to use that row instead, or pass header=None to use default integer labels starting from 0.
Use default integer labels starting from 0 as the column index:
# coding=utf-8
import pandas as pd
pd.set_option('display.unicode.ambiguous_as_wide', True)
pd.set_option('display.unicode.east_asian_width', True)
df = pd.read_excel(r'G:\test.xlsx', sheet_name=0, header=None)
print(df)
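A common case for a non-default header value is a sheet with a title line above the real column names. The sketch below, assuming openpyxl is installed, builds such a sheet in a temporary file (the title and the data row are made up) and reads it with header=1 and with header=None.

```python
import os
import tempfile

import pandas as pd

# Row 0 is a title line, row 1 holds the real column names (placeholder data)
rows = [['成绩表', None], ['姓名', '年龄'], ['张三', 20]]
path = os.path.join(tempfile.mkdtemp(), 'demo.xlsx')
pd.DataFrame(rows).to_excel(path, header=False, index=False)

# header=1: the second row becomes the column index, the title row is skipped
df = pd.read_excel(path, header=1)
print(list(df.columns))   # ['姓名', '年龄']
print(len(df))            # 1

# header=None: every row is data, columns are labelled 0, 1, ...
raw = pd.read_excel(path, header=None)
print(list(raw.columns))  # [0, 1]
print(len(raw))           # 3
```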
6. Specify the columns to import
Sometimes the local file contains more columns than we need. In that case, we can choose the columns to import with the usecols parameter.
The parameter can be given in the following forms:
- A list of column positions, counting from 0.
- A list of column names.
- A tuple of column names.
Examples are as follows:
df = pd.read_excel(r'G:\test.xlsx', sheet_name=0, usecols=[0,1])
print(df)
df = pd.read_excel(r'G:\test.xlsx', sheet_name=0, usecols=['姓名','性别'])
print(df)
df = pd.read_excel(r'G:\test.xlsx', sheet_name=0, usecols=('姓名','年龄'))
print(df)
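Besides the forms listed above, the pandas documentation for read_excel() also describes two further forms of usecols: a string of Excel column letters and ranges, and a function that is called with each column name. The sketch below shows both, assuming openpyxl is installed; the temporary file and the single sample row are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Write a throwaway workbook with three columns (placeholder data)
path = os.path.join(tempfile.mkdtemp(), 'demo.xlsx')
pd.DataFrame({'姓名': ['张三'], '性别': ['男'], '年龄': [20]}).to_excel(path, index=False)

# Excel column letters: 'A,C' picks the first and third columns
by_letters = pd.read_excel(path, usecols='A,C')
print(list(by_letters.columns))  # ['姓名', '年龄']

# A function: keep every column for which it returns True
by_func = pd.read_excel(path, usecols=lambda name: name != '性别')
print(list(by_func.columns))     # ['姓名', '年龄']
```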
7. Specify the number of rows to import
If the file is very large and we only need the first few rows for analysis, we can use the nrows parameter to specify how many rows of data to import.
df = pd.read_excel(r'G:\test.xlsx', sheet_name=0, nrows=2)
print(df)
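Note that nrows counts data rows; the header row is not included in the count. A minimal sketch, assuming openpyxl is installed and using a made-up five-row table written to a temporary file:

```python
import os
import tempfile

import pandas as pd

# Write a throwaway workbook with five data rows (placeholder data)
path = os.path.join(tempfile.mkdtemp(), 'demo.xlsx')
pd.DataFrame({'年龄': [20, 25, 30, 35, 40]}).to_excel(path, index=False)

first_two = pd.read_excel(path, nrows=2)
print(len(first_two))             # 2
print(first_two['年龄'].tolist())  # [20, 25]
```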
8. More parameters
For the remaining parameters, please refer to the read_excel() entry in the official pandas documentation.