[Problem Solving] pandas counts the wrong number of null values after reading an Excel file

Problem: After reading Excel data with pd.read_excel(), counting null values with pd.isnull().sum() gives the wrong result. The code is as follows:

import pandas as pd

df = pd.read_excel('test.xlsx', dtype=str)
# Strip whitespace from every cell
df = df.applymap(lambda x: str(x).strip())
print("Number of nulls in column b: " + str(pd.isnull(df['b']).sum()))  # prints 0 nulls in column b

The Excel data looks like this. Cell B2 contains a single space, and I expect the null count for column b to be 2.
[image: the Excel data]
Problem analysis:

  1. The df right after reading the Excel file:
    [image: df after read_excel]
    The df after using applymap to strip whitespace:
    [image: df after applymap]
  2. The values of df (df.values) right after reading the Excel file:
    [image: df.values after read_excel]
    The values of df (df.values) after using applymap to strip whitespace:
    [image: df.values after applymap]

After applymap runs, the values have changed: the space has become '' and the former NaN has become 'nan' (the string 'nan', which of course is not a null value).
This happens because str(x) is applied to every cell, and str() of a NaN returns the text 'nan'.
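
The conversion can be reproduced without the Excel file. The sketch below builds a small DataFrame by hand, assuming it resembles what read_excel returns for this sheet. The cell values are made up for illustration (column b holds one single-space string and one real NaN), and the per-cell function is the same str(x).strip() as in the code above, applied column by column instead of through applymap.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the sheet: column b holds a single space,
# one missing cell (a real NaN) and one ordinary value.
df = pd.DataFrame({"a": ["1", "2", "3"], "b": [" ", np.nan, "x"]}, dtype=object)

nulls_before = int(df["b"].isnull().sum())

# Same per-cell transformation as in the post.
# str(np.nan) is the string 'nan', so the missing cell stops being null.
stripped = df.apply(lambda col: col.map(lambda x: str(x).strip()))

nulls_after = int(stripped["b"].isnull().sum())

print(stripped["b"].tolist())     # ['', 'nan', 'x']
print(nulls_before, nulls_after)  # 1 0
```

Before stripping, isnull sees one null in column b; afterwards it sees none, because both "empty" cells are now ordinary strings.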

pd.isnull() only counts real NaN values. Here one cell is the string '' and the other is the string 'nan', so neither is counted.
Solution: replace every '' and 'nan' with np.nan, and then isnull can count them:

import numpy as np
import pandas as pd

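df = pd.read_excel('test.xlsx', dtype=str)
# Strip whitespace from every cell (this also turns NaN into the string 'nan')
df = df.applymap(lambda x: str(x).strip())
# Turn empty strings and the string 'nan' back into real NaN values
df[df == ''] = np.nan
df[df == 'nan'] = np.nan
print("Number of nulls in column b: " + str(pd.isnull(df['b']).sum()))  # prints 2 nulls in column b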
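
A possible variation, not the method used in this post, is to avoid creating the string 'nan' in the first place by stripping only the cells that really are strings. The helper name blank_to_nan and the sample data below are hypothetical and only serve as a minimal sketch of that idea.

```python
import numpy as np
import pandas as pd


def blank_to_nan(df: pd.DataFrame) -> pd.DataFrame:
    """Strip string cells and turn blank strings into NaN (hypothetical helper)."""

    def clean(x):
        if isinstance(x, str):
            x = x.strip()
            return np.nan if x == "" else x
        return x  # non-string cells, including real NaN, pass through unchanged

    return df.apply(lambda col: col.map(clean))


# Made-up data standing in for the sheet: one space and one missing cell in column b.
df = pd.DataFrame({"a": ["1", "2", "3"], "b": [" ", np.nan, "x"]}, dtype=object)

cleaned = blank_to_nan(df)
null_count = int(cleaned["b"].isnull().sum())
print("Number of nulls in column b: " + str(null_count))  # 2
```

Because str() is never called on a NaN, there is no 'nan' string to convert back afterwards, and a cell that genuinely contains the text 'nan' in the DataFrame would be left as it is.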

Origin blog.csdn.net/qq_33218097/article/details/129978507