converters and usecols usage in pandas read_excel

Open Excel data with Python. When reading, convert the "student number" (学号) and "ID" columns to strings so that later operations treat them as text rather than numbers.

import pandas as pd
df = pd.read_excel(path, converters={'学号': str, 'ID': str})
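
To see what the conversion changes, here is a minimal sketch. It uses pd.read_csv on an in-memory string instead of read_excel so that it runs without a workbook; converters plays the same role in both functions, although in read_excel the converter receives the cell value as Excel stored it. The column names and values are made up for illustration.

```python
import io
import pandas as pd

# Made-up sample data; the IDs carry leading zeros.
raw = "ID,score\n00123,90\n00456,85\n"

# Without converters, pandas infers an integer column and the zeros are lost.
as_int = pd.read_csv(io.StringIO(raw))

# With converters, every raw cell in 'ID' is passed through str and stays text.
as_str = pd.read_csv(io.StringIO(raw), converters={"ID": str})

print(as_int["ID"].tolist())
print(as_str["ID"].tolist())
```

The first list holds integers and the second holds the original strings, which is why identifiers such as student numbers are usually read as str.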



Here is what I ran into:

When I read the data from Excel into Python, I found that it came back as null values:

import pandas as pd 
df=pd.read_excel("D:/Python/05DataMineML/2022STU(1).xlsx")
df


But the file clearly contains data. The most likely cause is a problem with sheet_name (the name of the worksheet): by default read_excel reads only the first sheet, which may not be the one that holds the data.
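
One way to check which sheet holds the data is to load every sheet and see which ones are non-empty. The sketch below works on the dict that pd.read_excel(path, sheet_name=None) returns (sheet name mapped to DataFrame). The helper name non_empty_sheets, the sheet name "Sheet1", and the sample frames are hypothetical stand-ins for a real workbook.

```python
import pandas as pd

def non_empty_sheets(sheets):
    """Return the names of sheets that contain at least one non-null cell."""
    return [name for name, frame in sheets.items()
            if not frame.dropna(how="all").empty]

# Stand-in for: sheets = pd.read_excel(path, sheet_name=None)
sheets = {
    "Sheet1": pd.DataFrame({"A": [None, None]}),                     # nulls only
    "2022STUMOOC": pd.DataFrame({"学号": ["001"], "姓名": ["x"]}),    # real rows
}

print(non_empty_sheets(sheets))
```

Any name the helper reports can then be passed as sheet_name in the actual read_excel call.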

Then try other methods:

The Excel sheet has a header row and a total of 115 rows of data.

Method 1: Use usecols

# First way to select columns: pass a list of column names
import pandas as pd
df = pd.read_excel('../05DataMineML/2022STU(1).xlsx',
                   usecols=['学号', '姓名', '20220101', '20220125', '20220202', '20220208', '20220213',
                            '20220220', '20220226', '20220311', '20220320', '20220327', '20220403', 'randscore'],
                   index_col='姓名', sheet_name='2022STUMOOC')
df.info()

index_col: the column to use as the index (row labels) of the DataFrame
usecols: the columns to read; read_excel() loads only the columns specified here
sheet_name: the name of the worksheet to read
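
How these parameters interact can be sketched on a small in-memory table. read_csv stands in for read_excel here so that the example runs without a file (both accept usecols, index_col and converters; sheet_name exists only in read_excel). The data and the extra 序号 column are invented.

```python
import io
import pandas as pd

# Invented sample with the same kind of columns as the sheet above.
raw = "序号,学号,姓名,randscore\n1,2022001,Zhang,88\n2,2022002,Li,92\n"

df = pd.read_csv(
    io.StringIO(raw),
    usecols=["学号", "姓名", "randscore"],  # only these columns are loaded
    index_col="姓名",                        # 姓名 becomes the row index
    converters={"学号": str},                # keep student numbers as text
)

print(df)
```

The 序号 column is skipped entirely, and 姓名 no longer appears among the columns because it has become the index.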

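Important: usecols is applied to whichever sheet is read, and the default is the first sheet. When the data is on another sheet, as seems to be the case here, sheet_name must be written explicitly or the listed columns will not be found.
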
Method 2: Use numpy

# Second way to select columns: column positions generated with numpy
import pandas as pd
import numpy as np
df = pd.read_excel('../05DataMineML/2022STU(1).xlsx',
                   converters={'学号': str},  # read student numbers as strings
                   usecols=np.arange(3, 16), index_col='姓名', sheet_name='2022STU')
df.head()

Converters are involved here:

converters={'学号':str}: converts the student number to a string for subsequent operations.

usecols=np.arange(3,16) is used here: it selects the columns at zero-based positions 3 through 15, which correspond to Excel columns D to P.

Method 3: Use a column letter range

# Third way to select columns: an Excel column-letter range
import pandas as pd
df = pd.read_excel('../05DataMineML/2022STUMOOC (1).xlsx',
                   converters={'学号': str},  # read student numbers as strings
                   usecols="D:P", index_col='姓名', sheet_name='2022STUMOOC')
df

usecols="D:P" is used here, that is, the columns are selected by their Excel column letters, from D through P with both ends included.
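
That the letter range covers the same columns as np.arange(3,16) in Method 2 can be checked with a few lines of plain Python. The helper names below are hypothetical and not part of pandas, which performs this conversion internally.

```python
def column_index(letters):
    """Convert an Excel column label such as 'D' or 'AA' to a zero-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1

def expand_range(spec):
    """Expand a range such as 'D:P' into zero-based indices, both ends inclusive."""
    start, end = spec.split(":")
    return list(range(column_index(start), column_index(end) + 1))

print(expand_range("D:P"))
print(expand_range("D:P") == list(range(3, 16)))
```

Column D maps to position 3 and column P to position 15, so the two forms select the same thirteen columns.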


Summary:

  • converters usage: type conversion while reading. For example, read a column of Excel data as str instead of int
  • usecols usage:
    1. usecols=['学号','姓名'] (a list of column names)
    2. usecols=np.arange(3,16) (zero-based column positions)
    3. usecols="D:P" (a range of Excel column letters)


Origin blog.csdn.net/wxfighting/article/details/123953013