Missing values in Python
1. Scenario description
When reading Excel, CSV or TXT files with the Pandas file-reading API, we often need to do some light cleaning of the data. A typical example is replacing strings that represent null values with real missing values.
Let’s look at an example:
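import numpy as np
import pandas as pd
data = pd.read_table(path, sep=sep)  # path/sep: your file path and its delimiter
data.replace(['Null', 'None', 'NaN'], np.nan, inplace=True)  # np.NaN was removed in NumPy 2.0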
The strings 'Null', 'None' and 'NaN' in the code above all represent null values in the file, and the replace() method converts them into real missing values.
In fact, the Pandas reading functions automatically parse a number of strings that commonly represent null values as missing values (np.nan), so the replace() call above may be partly redundant.
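The following is a minimal sketch of that behavior. It assumes a small tab-separated snippet held in memory with io.StringIO in place of a real file at `path` (the data is hypothetical). 'NaN' is already parsed as missing on read, while 'Null' survives as a string and still needs replace().

```python
import io

import numpy as np
import pandas as pd

# Hypothetical tab-separated content standing in for the file at `path`
tsv_text = "id\tvalue\n1\tNaN\n2\tNull\n3\t7"

data = pd.read_table(io.StringIO(tsv_text), sep="\t")
# 'NaN' is already a missing value here; 'Null' is still a plain string
print(data["value"].tolist())

# Only the token pandas did not recognize on read still needs replace()
data = data.replace("Null", np.nan)
print(data["value"].isna().tolist())
```

Because the column still holds a string ('Null') after reading, Pandas keeps it as object dtype, so '7' also stays a string rather than becoming a number.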
So, what strings can represent missing values in Python?
2. Missing values in Python
Missing values in Python include but are not limited to the following:
import numpy as np
import pandas as pd
# np.NaN was an alias of np.nan and was removed in NumPy 2.0, so use np.nan
print(np.nan) # nan
print(type(np.nan)) # <class 'float'>
print(pd.isnull(np.nan)) # True
print(pd.isna(np.nan)) # True
print(pd.NA) # <NA>
print(type(pd.NA)) # <class 'pandas._libs.missing.NAType'>
print(pd.isnull(pd.NA)) # True
print(pd.isna(pd.NA)) # True
# Missing value for datetime-like data
print(pd.NaT) # NaT
print(type(pd.NaT)) # <class 'pandas._libs.tslibs.nattype.NaTType'>
print(pd.isnull(pd.NaT)) # True
print(pd.isna(pd.NaT)) # True
print(None) # None
print(type(None)) # <class 'NoneType'>
print(pd.isnull(None)) # True
print(pd.isna(None)) # True
# An empty string is not a missing value
print('') #
print(type('')) # <class 'str'>
print(pd.isnull('')) # False
print(pd.isna('')) # False
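The same checks also work element-wise on a whole Series, which is how they are usually applied to real data. The sketch below puts all the values from above into one object-dtype Series (an illustrative choice). It also shows why pd.isna() is used instead of ==: NaN never compares equal to itself.

```python
import numpy as np
import pandas as pd

# Every missing-value flavor from above, plus the empty string
values = [np.nan, None, pd.NA, pd.NaT, ""]
s = pd.Series(values, dtype=object)

mask = s.isna()
print(mask.tolist())  # [True, True, True, True, False]

# NaN is not equal to itself, so == cannot be used to detect it
print(np.nan == np.nan)  # False
```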
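Testing shows that the missing-value strings Pandas recognizes automatically when reading files include None, NA, nan, NaN, null, NULL, N/A, <NA> and '' (an empty field), among others. The full default list is documented under the na_values parameter of read_csv(). Note that an empty field in a file is parsed as missing, even though pd.isna('') on an in-memory empty string returns False. Strings that are not recognized by default include na, Na, none, Null and so on.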
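You can check these lists yourself with a sketch like the following. It writes each token into an in-memory two-column CSV and reports which cells come back as missing. The helper name parse_tokens is hypothetical. Two columns are used because read_csv skips completely blank lines by default, which would swallow the empty-string case in a one-column file. 'None' is left out because whether it is on the default list may depend on your pandas version.

```python
import io

import pandas as pd

# Tokens the article reports as recognized / not recognized by default
recognized = ["NA", "nan", "NaN", "null", "NULL", "N/A", "<NA>", ""]
not_recognized = ["na", "Na", "none", "Null"]


def parse_tokens(tokens):
    """Hypothetical helper: write each token into a CSV and return the parsed 'value' column."""
    lines = ["id,value"] + [f"{i},{tok}" for i, tok in enumerate(tokens)]
    return pd.read_csv(io.StringIO("\n".join(lines)))["value"]


print(parse_tokens(recognized).isna().tolist())      # all True
print(parse_tokens(not_recognized).isna().tolist())  # all False
```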
Therefore, the 'None' and 'NaN' strings in the scenario description do not need to be replaced with replace(), and neither do the all-uppercase NULL or the all-lowercase null. Only the mixed-case 'Null' needs to be replaced with replace().
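Another option is to avoid the replace() step altogether. read_csv() and read_table() accept an na_values argument that adds extra missing-value markers on top of the default list (as long as keep_default_na is left at its default of True). This is a swapped-in technique, not the approach used above, and the CSV content below is hypothetical.

```python
import io

import pandas as pd

# Hypothetical content; only "Null" is outside pandas' default NA list
csv_text = "id,value\n1,NaN\n2,Null\n3,NULL\n4,null\n5,42"

# na_values extends the defaults, so "Null" is parsed as missing on read
data = pd.read_csv(io.StringIO(csv_text), na_values=["Null"])
print(data["value"].isna().tolist())  # [True, True, True, True, False]
print(data["value"].dtype)            # float64
```

Because every marker is recognized during parsing, the column comes out as numeric (float64) instead of object, which a later replace() call would not give you automatically.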
If you are interested, please go and try it.