Problem: after reading Excel data with pd.read_excel(), counting null values with pd.isnull().sum() gives the wrong result. The code is as follows:
import pandas as pd
df = pd.read_excel('test.xlsx',dtype=str)
# Strip leading/trailing whitespace
df = df.applymap(lambda x: str(x).strip())
print("Number of empty values in column b: " + str(pd.isnull(df['b']).sum()))  # prints 0 for column b
In the Excel file, cell B2 contains a single space and another cell in column b is empty, so I expect a null count of 2 for column b.
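The problem can be reproduced without an Excel file. This is a minimal sketch that assumes a hypothetical in-memory DataFrame standing in for `pd.read_excel('test.xlsx', dtype=str)`, with column b holding one space-only cell and one genuinely empty cell. It applies the same `str(x).strip()` step through `DataFrame.apply` plus `Series.map` instead of `applymap`, which pandas 2.1 deprecated in favour of `DataFrame.map`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_excel('test.xlsx', dtype=str):
# column b has one space-only cell and one genuinely empty cell (NaN).
df = pd.DataFrame({"a": ["1", "2", "3"], "b": ["x", " ", np.nan]})

# Same element-wise str(x).strip() step as in the question.
df = df.apply(lambda col: col.map(lambda x: str(x).strip()))

null_count = int(pd.isnull(df["b"]).sum())
print(null_count)  # prints 0, not the expected 2
```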
Problem analysis:
- Print df right after read_excel, then again after applymap has stripped the whitespace.
- Inspect df.values at both stages to see the raw cell values.
This shows that applymap changes the values: the single space becomes the empty string '', and the original NaN becomes 'nan' (the string "nan" produced by str(x), which is of course not null).
pd.isnull() only detects genuine missing values such as NaN and None. Here one cell holds the empty string '' and the other holds the string 'nan', so neither is counted.
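The two conversions and the way pd.isnull() treats them can be checked directly on scalars; this small sketch uses only the values discussed above.

```python
import numpy as np
import pandas as pd

# str() turns a missing value into the literal text 'nan' ...
as_text = str(np.nan)
print(repr(as_text))  # 'nan'

# ... and stripping a space-only cell leaves an empty string.
print(repr(" ".strip()))  # ''

# pd.isnull is True only for real missing values, not for these strings.
print(pd.isnull(np.nan), pd.isnull(""), pd.isnull("nan"))  # True False False
```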
Solution: replace every '' and 'nan' with np.nan, after which isnull() can count them:
import numpy as np
import pandas as pd
df = pd.read_excel('test.xlsx',dtype=str)
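# Strip leading/trailing whitespace
df = df.applymap(lambda x: str(x).strip())
# Turn the empty strings and the 'nan' strings back into real NaN
df[df == ''] = np.nan
df[df == 'nan'] = np.nan
print("Number of empty values in column b: " + str(pd.isnull(df['b']).sum()))  # prints 2 for column b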
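Replacing 'nan' after the fact also converts any cell whose text really is "nan". A possible alternative, sketched here on the same hypothetical in-memory data rather than test.xlsx, is to strip only values that are actually strings, so NaN is never turned into 'nan', and then map the empty strings to NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_excel('test.xlsx', dtype=str).
df = pd.DataFrame({"a": ["1", "2", "3"], "b": ["x", " ", np.nan]})

# Strip only real strings; NaN passes through untouched.
df = df.apply(lambda col: col.map(lambda x: x.strip() if isinstance(x, str) else x))

# Cells that contained only whitespace are now '' and become NaN.
df = df.replace("", np.nan)

print(int(df["b"].isnull().sum()))  # prints 2
```

With this ordering the original NaN never passes through str(), so no 'nan' strings need to be cleaned up.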