Missing value judgment
There are generally four types of missing values in Python:
1. None
2. False
3. ''
4. nan
The first three are easy to judge: just compare with == directly. The fourth cannot be judged with ==, because nan == nan evaluates to False.
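A minimal sketch of that behaviour, using only the standard library. The variable names are illustrative and not from the original post:

```python
nan = float("nan")  # a NaN value built without any third-party library

# The first three values compare equal to themselves with ==.
none_ok = (None == None)    # idiomatic code would usually write `x is None`
false_ok = (False == False)
empty_ok = ("" == "")

# NaN does not: comparing it with itself yields False.
nan_ok = (nan == nan)

print(none_ok, false_ok, empty_ok, nan_ok)  # True True True False
```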
NaN (not a number) represents a value that is undefined or cannot be represented as a number. There is a related value, inf. The difference between inf and nan is that inf stands for a number beyond the range of floating-point representation (in essence it is still a number, but an infinite one, such as the IEEE 754 result of 1/0 or the overflow of 1e308 * 10; note that plain Python raises ZeroDivisionError for 1/0), while nan stands for the result of an undefined operation (such as 0/0 or inf - inf).
Under IEEE 754 floating-point rules, inf == inf and inf == inf + X (X is any finite floating-point number), whereas nan != nan. So to judge whether a number is nan in Python, you can directly check whether it is unequal to itself.
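These rules can be checked in a few lines. The helper name is_nan below is hypothetical, introduced only for illustration:

```python
import math

inf = math.inf
nan = math.nan

def is_nan(x):
    """Hypothetical helper: NaN is the only float that is not equal to itself."""
    return x != x

print(inf == inf)          # True
print(inf == inf + 1e308)  # True: adding a finite number leaves inf unchanged
print(nan == nan)          # False
print(is_nan(nan), is_nan(inf), is_nan(1.0))  # True False False
print(is_nan(inf - inf))   # True: inf - inf is undefined, so the result is nan
```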
In this case, we can also use the np.isnan() function to judge.
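A short sketch of np.isnan() on a scalar and on an array, assuming NumPy is installed. The sample values are made up for illustration. Note that np.isnan() only accepts numeric input, so it cannot be used to test None:

```python
import numpy as np

values = np.array([1.0, np.nan, np.inf])  # illustrative sample values

mask = np.isnan(values)          # element-wise check, returns a boolean array
scalar_check = bool(np.isnan(np.nan))

print(mask.tolist())   # [False, True, False]
print(scalar_check)    # True

# np.isnan() is not defined for non-numeric objects such as None.
try:
    np.isnan(None)
    none_supported = True
except TypeError:
    none_supported = False

print(none_supported)  # False
```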
Missing value filling
Once you know how to judge missing values, you can fill them. Here is how to fill the missing values in each column of a DataFrame with that column's mean.
Data:
Python implementation
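def fillna_mean(df):
    # Visit only the columns that contain at least one missing value
    for col in df.columns[df.isnull().sum() > 0]:
        mean_val = df[col].mean()  # change the formula here if needed
        # Assign back instead of chained inplace=True, which may not update df
        df[col] = df[col].fillna(mean_val)
    return df

fillna_mean(df)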
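The same column-mean fill can also be sketched without an explicit loop, by passing the Series of column means to DataFrame.fillna(). The sample DataFrame below is hypothetical and is not the data used in the original post:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4.0, 5.0, np.nan],
    "c": [7.0, 8.0, 9.0],
})

# mean() skips NaN by default, so the column means are a=2.0, b=4.5, c=8.0.
# fillna() with a Series matches each mean to its column by label.
filled = df.fillna(df.mean(numeric_only=True))

print(filled["a"].tolist())  # [1.0, 2.0, 3.0]
print(filled["b"].tolist())  # [4.0, 5.0, 4.5]
```

This version returns a new DataFrame and leaves df unchanged, which may be preferable when the original data is still needed.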