A summary of several ways to check for and fill missing values in pandas



1. Build sample data

import pandas as pd
import numpy as np
# Scores for ten students; np.nan marks a missing score
data = {
    "ID": [202001, 202002, 202003, 202004, 202005, 202006, 202007, 202008, 202009, 202010],
    "Chinese": [98, 67, 84, 88, 78, 90, 93, np.nan, 82, 87],
    "Math": [92, 80, 73, np.nan, 88, 78, 90, 82, 77, 69],
    "English": [88, 79, 90, 73, 79, 83, 81, np.nan, 71, np.nan],
}
df = pd.DataFrame(data)
df


2. Several ways to check for missing values

2.1 Two ways to confirm whether any values are missing

df.isnull().values.any()

True

df.isnull().sum().any()

True
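
Both checks above return a single True/False for the whole table. To see which columns or rows are affected, `any()` can be applied along one axis instead. The sketch below uses a small stand-in frame named `demo` with made-up values (an assumption for illustration, not the article's data).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data, not the article's DataFrame
demo = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

# One boolean per column: does the column contain any NaN?
cols_with_na = demo.isnull().any()

# Keep only the rows that contain at least one NaN
rows_with_na = demo[demo.isnull().any(axis=1)]

print(cols_with_na)
print(rows_with_na)
```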

2.2 View the number and percentage of missing values in each column

df.isnull().sum()


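# Percentage of missing values in each column
all_data_na = (df.isnull().sum() / len(df)) * 100
# Drop columns with nothing missing, then sort from most to least missing
all_data_na = all_data_na.drop(all_data_na[all_data_na == 0].index).sort_values(ascending=False)
missing_data = pd.DataFrame({'Missing rate (%)': all_data_na})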
missing_data
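
A shorter way to get the same rate, shown here as a sketch: the mean of a boolean column is the fraction of True values, so `isnull().mean()` gives the missing fraction directly. The frame `demo` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data, not the article's DataFrame
demo = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [4.0, 5.0, 6.0, 7.0],
})

# Mean of the boolean mask = fraction missing; multiply by 100 for a percentage
rate = demo.isnull().mean() * 100

# Keep only columns that actually have missing values, largest rate first
rate = rate[rate > 0].sort_values(ascending=False)
print(rate)
```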


2.3 View the number of non-missing values

df.info()


df.shape[0] - df.isnull().sum()


df.notnull().sum()
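
As a further option, `DataFrame.count()` also returns the number of non-missing values per column. A minimal sketch on made-up data (the frame `demo` is an assumption for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data, not the article's DataFrame
demo = pd.DataFrame({"a": [1.0, np.nan, np.nan], "b": [4.0, 5.0, 6.0]})

# count() skips NaN, so it matches notnull().sum()
counts = demo.count()
print(counts)
```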


3. Three examples of filling missing values

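# Fill English by linear interpolation: a single gap gets the mean of the values above and below it.
# A trailing NaN (the last row here) has no value below it, so it takes the last valid value instead.
df['English'] = df['English'].fillna(df['English'].interpolate())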
df.head(10)
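
To make the interpolation behavior concrete, here is a small sketch on a made-up Series `s` (an assumption for illustration). With the defaults, an interior gap becomes the midpoint of its neighbors and a trailing NaN is filled with the last valid value; passing `limit_area="inside"` leaves the trailing NaN untouched if that carry-forward is not wanted.

```python
import numpy as np
import pandas as pd

# Hypothetical scores: one interior gap and one trailing gap
s = pd.Series([81.0, np.nan, 71.0, np.nan])

# Default linear interpolation
filled = s.interpolate()

# Only fill gaps that are surrounded by valid values
inside = s.interpolate(limit_area="inside")

print(filled.tolist())
print(inside.tolist())
```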


# Fill the Math column with its median:
df['Math'] = df['Math'].fillna(df['Math'].median())
df.head(10)
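
If several numeric columns should each be filled with their own median, one possible approach is to pass the Series of medians to `fillna`, which matches values to columns by label. This is a sketch on made-up data (the frame `demo` is an assumption), not a step from the original walkthrough.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data, not the article's DataFrame
demo = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4.0, 5.0, np.nan],
})

# Per-column medians (NaN is skipped when computing them)
medians = demo.median(numeric_only=True)

# fillna with a Series fills each column with the value under its own label
filled_all = demo.fillna(medians)
print(filled_all)
```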


# Fill the Chinese column with -1:
df['Chinese'] = df['Chinese'].fillna(-1)
df.head(10)



Origin blog.csdn.net/zzpl139/article/details/128613459