Detecting and filling missing values in a DataFrame

Detecting missing values

Four values are commonly used to mean "missing" or "empty" in Python:
1. None
2. False
3. '' (the empty string)
4. nan

The first three are easy to test: just compare with == directly. The fourth cannot be tested this way, because comparing nan with == always yields False, even when nan is compared with itself.
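A minimal sketch of this comparison behaviour, using only built-in Python. The variable names are illustrative and not taken from the original post:

```python
nan = float("nan")

# None, False and '' each compare equal to themselves with ==.
simple_values = [None, False, ""]
simple_results = [value == value for value in simple_values]
print(simple_results)  # [True, True, True]

# nan never compares equal, not even to itself.
nan_equals_itself = nan == nan
print(nan_equals_itself)  # False
```

For None, `value is None` is the more idiomatic test, although == gives the same answer here.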

NaN (not a number) is a special floating-point value that represents an undefined or unrepresentable result, such as 0/0 or inf - inf. A related special value is inf. The difference is that inf stands for a value beyond the range that floating-point numbers can represent: it is still a number, just an infinitely large one, as produced by overflow or by 1/0 in floating-point arithmetic (plain Python raises ZeroDivisionError for 1/0, while NumPy floating-point division returns inf with a warning). nan, in contrast, is not a number at all.

In floating-point arithmetic, inf == inf, and inf == inf + X (where X is any finite floating-point number), whereas nan != nan. So to test whether a number x is nan in Python, you can check whether it is unequal to itself (x != x).
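The self-comparison trick can be sketched as follows. The helper name is_nan is hypothetical, and the sketch assumes plain Python floats (not arrays or custom objects):

```python
inf = float("inf")
nan = float("nan")

def is_nan(x):
    """Return True if the float x is nan, relying on nan != nan."""
    return x != x

print(inf == inf)        # True
print(inf == inf + 1.0)  # True: adding a finite number leaves inf unchanged
print(is_nan(inf - inf)) # True: inf - inf is undefined, so the result is nan
print(is_nan(nan))       # True
print(is_nan(inf))       # False
print(is_nan(1.5))       # False
```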

NumPy also provides the np.isnan() function for this check.
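A short sketch of np.isnan() on a scalar and on an array, with invented sample values. It also shows pd.isna() as one possible alternative when the input may contain None, since np.isnan() only accepts numeric input:

```python
import numpy as np
import pandas as pd

arr = np.array([1.0, np.nan, 3.0])
nan_mask = np.isnan(arr)      # element-wise test, returns a boolean array
print(nan_mask)
print(np.isnan(np.nan))       # True

# np.isnan raises TypeError for non-numeric input such as None.
try:
    np.isnan(None)
    isnan_accepts_none = True
except TypeError:
    isnan_accepts_none = False
print(isnan_accepts_none)     # False

# pd.isna treats both None and nan as missing, but not the empty string.
print(pd.isna(None))    # True
print(pd.isna(np.nan))  # True
print(pd.isna(""))      # False
```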


Filling missing values

Once you know how to detect missing values, you can fill them. The following shows how to fill the missing values in each column of a DataFrame with that column's mean.
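To experiment with the filling step, a small hypothetical DataFrame can be used. The column names and values below are invented for illustration and are not the original data. The sketch also counts the missing values per column with isnull().sum():

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4.0, 5.0, np.nan],
    "c": [7.0, 8.0, 9.0],
})

# Number of missing values in each column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Names of the columns that contain at least one missing value
cols_with_missing = list(df.columns[missing_per_column > 0])
print(cols_with_missing)  # ['a', 'b']
```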

Python implementation

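import pandas as pd

def fillna_mean(df):
    # Loop over the columns that contain at least one missing value.
    for col in df.columns[df.isnull().any()]:
        mean_val = df[col].mean()  # swap in another statistic here if needed, e.g. median()
        # Assign the result back: calling fillna(inplace=True) on df[col]
        # may act on a temporary copy and leave df unchanged.
        df[col] = df[col].fillna(mean_val)

    return df

fillna_mean(df)  # df is the DataFrame to clean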

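As a possible alternative to the loop, pandas can fill all numeric columns in one call by passing the column means to DataFrame.fillna. The function name fillna_mean_vectorized and the sample data are hypothetical. Unlike the loop version, this sketch returns a new DataFrame and leaves the input unchanged:

```python
import numpy as np
import pandas as pd

def fillna_mean_vectorized(df):
    """Return a copy of df with nan in numeric columns replaced by the column mean."""
    # df.mean(numeric_only=True) yields one mean per numeric column (nan is skipped).
    # fillna with a Series fills each column using the value at the matching label.
    return df.fillna(df.mean(numeric_only=True))

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4.0, 5.0, np.nan],
    "name": ["x", None, "z"],
})

filled = fillna_mean_vectorized(df)
print(filled)
```

The text column name is left untouched, since it has no mean to fill with.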



Origin blog.csdn.net/weixin_46599926/article/details/127944796