Alex Poca :
A pandas Series can contain invalid values:
a b c d e f g
1 "" "a3" np.nan "\n" "6" " "
df = pd.DataFrame([{"a":1, "b":"", "c":"a3", "d":np.nan, "e":"\n", "f":"6", "g":" "}])
row = df.iloc[0]
I want to produce a clean Series keeping only the columns that contain a numeric value or a non-empty non-space-only alphanumeric string:
b should be dropped because it is an empty string; d because it is np.nan; e and g because they are whitespace-only strings.
The expected result:
a c f
1 "a3" "6"
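The keep/drop rules above amount to a small predicate on each value (a minimal sketch; the helper name `keep` is hypothetical, not part of pandas):

```python
import numpy as np
import pandas as pd

def keep(value):
    # Strings are kept only if alphanumeric, which rejects
    # "", " ", and "\n"; non-strings are kept unless they are NaN.
    if isinstance(value, str):
        return value.isalnum()
    return bool(pd.notna(value))
```

Applied to the sample row, `keep` is True exactly for the a, c and f values.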
How can I filter the columns that contain numeric values or valid alphanumeric strings?
row.str.isalnum() returns NaN for a, instead of the True I would expect. row.astype(str).str.isalnum() changes d's np.nan to the string "nan" and then considers it a valid string. row.dropna() of course drops only d (np.nan).
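The astype(str) pitfall is easy to reproduce: stringifying the row turns np.nan into the literal string "nan", which isalnum() then accepts (a minimal sketch using the question's data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([{"a": 1, "b": "", "c": "a3", "d": np.nan,
                    "e": "\n", "f": "6", "g": " "}])
row = df.iloc[0]

# astype(str) renders np.nan as the string "nan" ...
mask = row.astype(str).str.isalnum()
# ... so the NaN column now looks "valid":
print(mask["d"])  # True
```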
I don't see many other possibilities in the Series API reference: https://pandas.pydata.org/pandas-docs/stable/reference/series.html
As a workaround I can loop over items(), checking type and content, and create a new Series from the values I want to keep, but this approach is inefficient (and ugly):
for index, value in row.items():
    print(index, value, type(value))
# a 1 <class 'numpy.int64'>
# b  <class 'str'>
# c a3 <class 'str'>
# d nan <class 'numpy.float64'>
# e
#  <class 'str'>
# f 6 <class 'str'>
# g   <class 'str'>
Is there any boolean filter that can help me to single out the good columns?
jezrael :
Convert the values to strings and chain another mask from Series.notna with bitwise AND (&):
row = row[row.astype(str).str.isalnum() & row.notna()]
print (row)
a 1
c a3
f 6
Name: 0, dtype: object
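Put together as a self-contained script (same sample data as the question), the combined mask keeps exactly a, c and f:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([{"a": 1, "b": "", "c": "a3", "d": np.nan,
                    "e": "\n", "f": "6", "g": " "}])
row = df.iloc[0]

# isalnum() on the stringified values rejects "", " " and "\n";
# notna() removes the NaN that astype(str) would otherwise turn
# into the valid-looking string "nan".
clean = row[row.astype(str).str.isalnum() & row.notna()]
print(clean)
```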