Missing values in Python




1. Scenario description


When using the Pandas file-reading API to read files such as Excel, CSV, or TXT, we often need to do some simple cleaning of the loaded data, for example replacing strings that represent null values with true missing values.

Let’s look at an example:

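import numpy as np
import pandas as pd
data = pd.read_table(path, sep=sep)  # path and sep are defined elsewhere
data.replace(['Null', 'None', 'NaN'], np.nan, inplace=True)  # np.NaN was removed in NumPy 2.0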

Null, None, and NaN in the code above are all strings that represent null values in the file. We use the replace() method to replace them with missing values.

In fact, the Pandas reading API automatically parses some strings that it recognizes as null markers into the missing value np.nan, so the replace() call above may be partly redundant.
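As a minimal sketch of this automatic parsing, the snippet below reads a small in-memory CSV instead of a real file. The column names and values are made up for illustration, and `io.StringIO` only stands in for a file path.

```python
import io

import pandas as pd

# Hypothetical in-memory file standing in for a real CSV on disk.
csv_text = "name,score\nalice,NaN\nbob,NULL\ncarol,3.5\n"

data = pd.read_csv(io.StringIO(csv_text))

# "NaN" and "NULL" are parsed as missing values without any replace() call.
print(data["score"].isna().tolist())  # [True, True, False]
```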

So, which strings does Pandas treat as missing values?

2. Missing values in Python


Missing values in Python include, but are not limited to, the following:

import numpy as np
import pandas as pd

# np.NaN was an alias of np.nan and was removed in NumPy 2.0; use np.nan
print(np.nan)             # nan
print(type(np.nan))       # <class 'float'>
print(pd.isnull(np.nan))  # True
print(pd.isna(np.nan))    # True

print(pd.NA)              # <NA>
print(type(pd.NA))        # <class 'pandas._libs.missing.NAType'>
print(pd.isnull(pd.NA))   # True
print(pd.isna(pd.NA))     # True

# Missing value for datetime types
print(pd.NaT)             # NaT
print(type(pd.NaT))       # <class 'pandas._libs.tslibs.nattype.NaTType'>
print(pd.isnull(pd.NaT))  # True
print(pd.isna(pd.NaT))    # True

print(None)               # None
print(type(None))         # <class 'NoneType'>
print(pd.isnull(None))    # True
print(pd.isna(None))      # True

# An empty string is not a missing value
print('')                 # (prints an empty line)
print(type(''))           # <class 'str'>
print(pd.isnull(''))      # False
print(pd.isna(''))        # False

Testing shows that the missing-value strings Pandas recognizes automatically when reading a file include None, NA, nan, NaN, null, NULL, N/A, <NA>, and the empty string '', among others. Strings that are not recognized automatically include na, Na, none, and Null.
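One way to check such a list yourself is sketched below. The helper `is_parsed_as_missing` is a hypothetical name introduced here, not part of Pandas; it writes each candidate string into a one-row in-memory CSV and asks `pd.read_csv` whether the cell came back as missing. The result for some tokens, such as None, may depend on the installed Pandas version.

```python
import io

import pandas as pd


def is_parsed_as_missing(token: str) -> bool:
    """Return True if pd.read_csv turns `token` into a missing value."""
    # Two columns so that an empty token still leaves a non-blank line.
    csv_text = f"id,value\n1,{token}\n"
    frame = pd.read_csv(io.StringIO(csv_text))
    return bool(frame["value"].isna().iloc[0])


candidates = ["None", "NA", "nan", "NaN", "null", "NULL", "N/A", "<NA>", "",
              "na", "Na", "none", "Null"]

for token in candidates:
    print(repr(token), is_parsed_as_missing(token))
```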

Therefore, the None and NaN strings in the scenario description do not need to be replaced with replace(), but Null does. The all-uppercase NULL and all-lowercase null are recognized automatically, so they do not need to be replaced either.
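As an alternative to calling replace() afterwards, the reading functions accept an `na_values` parameter that adds extra strings to the default list. The sketch below uses made-up data and assumes the unrecognized markers in the file are Null and none.

```python
import io

import pandas as pd

# Hypothetical file content containing markers Pandas does not recognize by default.
csv_text = "name,score\nalice,Null\nbob,none\ncarol,3.5\n"

# Declare the extra markers while reading; the default markers still apply.
data = pd.read_csv(io.StringIO(csv_text), na_values=["Null", "none"])

print(data["score"].isna().tolist())  # [True, True, False]
print(data["score"].dtype)            # float64
```

Because the markers are handled during parsing, the column is read as numeric directly. With replace() after reading, the column would typically have been parsed as strings first and may need an extra type conversion.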

If you are interested, try it yourself.


Reposted from: blog.csdn.net/weixin_55629186/article/details/134783987