When reading Excel data into Python with pandas, convert the "student number" (学号) and "ID" columns to strings (str) so that later operations treat them as text:
import pandas as pd
# path points to the .xlsx file; 学号 (student number) and ID are read as str
df = pd.read_excel(path, converters={'学号': str, 'ID': str})
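A quick way to see why the converter matters is a minimal sketch on made-up data. To keep it runnable without an Excel engine, it uses pd.read_csv on in-memory CSV text (an assumption, not the author's setup); read_csv accepts the same converters mapping as read_excel.

```python
import io

import pandas as pd

# Made-up sample data: student numbers and IDs with leading zeros
csv_text = "学号,ID,姓名\n00123,0042,Alice\n00456,0077,Bob\n"

# Without converters, pandas infers integer columns and the leading zeros are lost
as_numbers = pd.read_csv(io.StringIO(csv_text))

# With converters, the raw text of each cell is passed to str(), so "00123" stays "00123"
as_strings = pd.read_csv(io.StringIO(csv_text), converters={"学号": str, "ID": str})

print(as_numbers["学号"].tolist())  # [123, 456]
print(as_strings["学号"].tolist())  # ['00123', '00456']
```

Keeping identifiers as strings also stops pandas from treating them as numbers to be summed or averaged.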
Here is what I ran into along the way:
When I first read the Excel file into Python, the result came back full of null values:
import pandas as pd
df = pd.read_excel("D:/Python/05DataMineML/2022STU(1).xlsx")  # no sheet_name: reads the first sheet
df
But the file clearly contains data, so the most likely culprit is sheet_name (the name of the worksheet).
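One way to check the sheet_name theory is to list the workbook's sheets before reading. The sketch below builds a made-up in-memory workbook whose data sits on the second sheet (sheet names and contents are hypothetical, and openpyxl is assumed to be installed), then uses pd.ExcelFile(...).sheet_names to find the right sheet.

```python
import io

import pandas as pd

# Made-up workbook: the first sheet holds only a note, the data is on the second.
# Sheet names are illustrative; writing and reading .xlsx here needs openpyxl.
buf = io.BytesIO()
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    pd.DataFrame({"note": ["data is on the other sheet"]}).to_excel(
        writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"学号": ["00123", "00456"], "姓名": ["Alice", "Bob"]}).to_excel(
        writer, sheet_name="2022STUMOOC", index=False)
buf.seek(0)

xls = pd.ExcelFile(buf)
print(xls.sheet_names)  # ['Sheet1', '2022STUMOOC']

# Without sheet_name, read_excel reads the first sheet (sheet_name=0)
first = pd.read_excel(xls)
print(first.columns.tolist())  # ['note']

# Naming the sheet explicitly reaches the actual data
data = pd.read_excel(xls, sheet_name="2022STUMOOC", converters={"学号": str})
print(data)
```

With a real file, pd.ExcelFile(path).sheet_names gives the exact spelling to pass as sheet_name.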
So I tried other approaches:
The figure below shows the Excel header row; the sheet holds 115 rows of data in total.
Method 1: usecols with a list of column names
# Selecting columns, approach 1: a list of column names
import pandas as pd
df = pd.read_excel('../05DataMineML/2022STU(1).xlsx',
                   usecols=['学号', '姓名', '20220101', '20220125', '20220202', '20220208', '20220213', '20220220', '20220226', '20220311', '20220320', '20220327', '20220403', 'randscore'],
                   index_col='姓名',
                   sheet_name='2022STUMOOC')
df.info()
index_col: the column to use as the DataFrame's row index (here 姓名, the name column)
usecols: read only the specified columns of the sheet, via the usecols parameter of read_excel()
sheet_name: the name of the worksheet to read
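Important: in this file, sheet_name has to be given explicitly. Without it, read_excel reads the first sheet (sheet_name=0) and applies usecols to that sheet, which is most likely why the first attempt came back empty.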
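To see the three parameters working together, here is a minimal sketch on a made-up sheet (hypothetical data and an extra "class" column we do not want; openpyxl assumed installed). It shows which columns survive usecols and what index_col turns into the index.

```python
import io

import pandas as pd

# Made-up sheet with one column ("class") that we will skip; needs openpyxl
buf = io.BytesIO()
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    pd.DataFrame({
        "学号": ["00123", "00456"],
        "姓名": ["Alice", "Bob"],
        "class": ["A", "B"],
        "randscore": [88, 92],
    }).to_excel(writer, sheet_name="2022STUMOOC", index=False)
buf.seek(0)

df = pd.read_excel(
    buf,
    sheet_name="2022STUMOOC",              # which worksheet to read
    usecols=["学号", "姓名", "randscore"],  # keep only these columns
    index_col="姓名",                       # use the name column as the row index
    converters={"学号": str},               # keep student numbers as text
)
print(df.index.tolist())    # ['Alice', 'Bob']
print(df.columns.tolist())  # ['学号', 'randscore']
```

Note that the index_col column must also appear in usecols; otherwise it is not read at all.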
Method 2: usecols with column positions generated by NumPy
# Selecting columns, approach 2: column positions generated with NumPy
import pandas as pd
import numpy as np
df = pd.read_excel('../05DataMineML/2022STU(1).xlsx',
                   converters={'学号': str},
                   usecols=np.arange(3, 16),
                   index_col='姓名',
                   sheet_name='2022STU')
df.head()
This version also uses converters: converters={'学号': str} converts the student-number column to strings for later processing.
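usecols=np.arange(3, 16) selects the columns at 0-based positions 3 through 15 (the stop value 16 is excluded), which are Excel columns D to P.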
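To double-check that np.arange(3, 16) lines up with columns D to P, here is a small hypothetical helper, column_letter (not part of pandas), that turns a 0-based column position into an Excel column letter:

```python
import numpy as np


def column_letter(pos: int) -> str:
    """Convert a 0-based column position to an Excel column letter (0 -> 'A')."""
    letters = ""
    n = pos + 1  # Excel letters are a bijective base-26 numbering starting at 1
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


positions = np.arange(3, 16)  # 3, 4, ..., 15 (16 is excluded)
letters = [column_letter(int(p)) for p in positions]
print(letters[0], letters[-1], len(letters))  # D P 13
```

The same helper also covers columns past Z, e.g. position 26 maps to "AA".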
Method 3: usecols with an Excel column-letter range
# Selecting columns, approach 3: an Excel column-letter range
import pandas as pd
df = pd.read_excel('../05DataMineML/2022STUMOOC (1).xlsx',
                   converters={'学号': str},
                   usecols='D:P',
                   index_col='姓名',
                   sheet_name='2022STUMOOC')
df
Here usecols='D:P' selects columns by the Excel column letters shown in the figure below. The range includes both ends, so columns D through P are read.
Summary:
- converters: convert a column's type while reading, e.g. turn a column of Excel data from int into str
- usecols: select columns in one of three ways
  - by column name: usecols=['学号', '姓名']
  - by 0-based position: usecols=np.arange(3, 16)
  - by Excel column letters: usecols='D:P'
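As a sanity check that the three usecols forms are interchangeable, this sketch reads the same made-up sheet three ways and compares the results. The data, the sheet layout and the read_sheet helper are hypothetical, the range is shrunk to B:C to keep the sheet small, and openpyxl is assumed to be installed.

```python
import io

import numpy as np
import pandas as pd

# Made-up sheet: columns A..D are id, 学号, 姓名, randscore; needs openpyxl
buf = io.BytesIO()
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    pd.DataFrame({
        "id": [1, 2],
        "学号": ["00123", "00456"],
        "姓名": ["Alice", "Bob"],
        "randscore": [88, 92],
    }).to_excel(writer, sheet_name="2022STUMOOC", index=False)


def read_sheet(usecols):
    """Hypothetical helper: read the sheet with the given usecols value."""
    buf.seek(0)  # rewind the in-memory file before each read
    return pd.read_excel(buf, sheet_name="2022STUMOOC",
                         converters={"学号": str}, usecols=usecols)


by_name = read_sheet(["学号", "姓名"])      # column names
by_position = read_sheet(np.arange(1, 3))  # 0-based positions 1 and 2
by_letters = read_sheet("B:C")             # Excel letters, inclusive range

print(by_name.equals(by_position) and by_name.equals(by_letters))  # True
```

Names are the most robust choice when columns may be inserted or reordered later; positions and letters break silently if the layout shifts.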