BC Smith :
I am trying to filter a dataframe to keep rows whose value is less than a certain threshold. When there are no NaN values it works fine, but when a value is NaN, that row is dropped. I want NaN rows to always be included, regardless of how they compare to the threshold.
import pandas as pd
import numpy as np
df = pd.DataFrame(
    {
        'index': [1, 2, 3, 4, 5, 6, 7, 8, 9],
        'value': [5, 6, 7, np.nan, 9, 3, 11, 34, 78]
    }
)
df_chunked = df[(df['index'] >= 1) & (df['index'] <= 5)]
print('df_chunked')
print(df_chunked)
df_result = df_chunked[(df_chunked['value'] < 10)]
# df_result = df_chunked[(df_chunked['value'] < 10) | (df_chunked['value'] == np.isnan(df_chunked['value']))]
print('df_result')
print(df_result)
The result above shows 5, 6, 7 and 9, but I also want the NaN row to be included. I tried:
df_result = df_chunked[(df_chunked['value'] < 10) | (df_chunked['value'] == np.isnan(df_chunked['value']))]
But that does not work.
How can I do this?
ansev :
Use the NOT operator, ~, to negate the opposite condition (value >= 10):
df_chunked[~(df_chunked['value'].ge(10))]
# df_chunked[~(df_chunked['value'] >= 10)]  # the same, written with the >= operator
   index  value
0      1    5.0
1      2    6.0
2      3    7.0
3      4    NaN
4      5    9.0
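For comparison, the same rows can also be selected by naming the NaN case explicitly with Series.isna. This is a minimal sketch that reuses the data from the question; the names mask_inverse and mask_explicit are only illustrative, and it assumes the column has the default float64 dtype.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'index': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'value': [5, 6, 7, np.nan, 9, 3, 11, 34, 78],
})
# Same chunk as in the question: rows whose 'index' is between 1 and 5 inclusive.
df_chunked = df[df['index'].between(1, 5)]

# Inverse logic: "not (value >= 10)" is True for small values and for NaN.
mask_inverse = ~df_chunked['value'].ge(10)

# Explicit logic: "value < 10 or value is NaN".
mask_explicit = df_chunked['value'].lt(10) | df_chunked['value'].isna()

# Both masks select the same rows for this data.
assert mask_inverse.equals(mask_explicit)

df_result = df_chunked[mask_explicit]
print(df_result)
```

The explicit form is longer, but it states the intent ("keep NaN") directly, which may be easier to read later.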
Why?
Because a comparison against NaN evaluates to False (only != gives True). As the following data frame shows, both value > 5 and value <= 5 are False in the NaN row. So value < 10 drops the NaN row, whereas value >= 10 is also False there, and negating it with ~ keeps the row. If you want to avoid Series.isna (and the extra code it needs), simply use the inverse logic with ~.
print(df.assign(greater_than_5 = df['value'].gt(5),
not_greater_than_5 = df['value'].le(5)))
   index  value  greater_than_5  not_greater_than_5
0      1    5.0           False                True
1      2    6.0            True               False
2      3    7.0            True               False
3      4    NaN           False               False
4      5    9.0            True               False
5      6    3.0           False                True
6      7   11.0            True               False
7      8   34.0            True               False
8      9   78.0            True               False
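The table above also explains why the attempt in the question returns nothing extra. A small sketch on a hypothetical three-element series (not the original data): np.isnan returns a boolean mask, so comparing the values to it with == tests each float against True or False instead of testing for NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-series: one small value, one NaN, one large value.
s = pd.Series([5.0, np.nan, 11.0], name='value')

# Ordered and equality comparisons are False wherever the value is NaN.
print(s.lt(10).tolist())   # [True, False, False]
print(s.ge(10).tolist())   # [False, False, True]
print(s.eq(s).tolist())    # [True, False, True]

# The one exception: NaN != NaN is True.
print(s.ne(s).tolist())    # [False, True, False]

# The question's attempt compares each value with a boolean mask:
# 5.0 == False, NaN == True, 11.0 == False, all of which are False here.
attempt = s == np.isnan(s)
print(attempt.tolist())    # [False, False, False]
```

Using the mask itself, as in (s < 10) | np.isnan(s), or the equivalent s.isna(), is what the attempt was presumably aiming for.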