Pandas: How to remove non-alphanumeric columns in Series

Alex Poca :

A pandas Series can contain invalid values, for example:

a    b     c     d       e      f     g
1    ""    "a3"  np.nan  "\n"   "6"   " "
import numpy as np
import pandas as pd

df = pd.DataFrame([{"a": 1, "b": "", "c": "a3", "d": np.nan, "e": "\n", "f": "6", "g": " "}])
row = df.iloc[0]

I want to produce a clean Series that keeps only the columns containing either a numeric value or a non-empty alphanumeric string that is not whitespace-only:

  • b should be dropped because it is an empty string;
  • d because it is np.nan;
  • e and g because they are whitespace-only strings ("\n" and " ").

The expected result:

a    c      f
1    "a3"   "6"

How can I filter for the columns that contain a numeric value or a valid alphanumeric string?

  • row.str.isalnum() returns NaN for a (the integer 1) instead of the True I would expect, because the .str accessor returns NaN for non-string values.
  • row.astype(str).str.isalnum() converts d's np.nan to the string "nan", which is then considered a valid string.
  • row.dropna(), of course, drops only d (np.nan).
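A minimal sketch that reproduces these three attempts on the example row, for anyone who wants to check them. The variable names str_mask and astype_mask are illustrative. Note that the "nan" conversion in the second attempt describes the pandas versions the question was written against; builds that use the newer default string dtype may keep d as a missing value instead.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([{"a": 1, "b": "", "c": "a3", "d": np.nan, "e": "\n", "f": "6", "g": " "}])
row = df.iloc[0]

# Attempt 1: the .str accessor yields NaN for non-string values (a is an integer, d is NaN).
str_mask = row.str.isalnum()
print(str_mask)

# Attempt 2: astype(str) stringifies every value; in older pandas NaN becomes "nan",
# which isalnum() accepts.
astype_mask = row.astype(str).str.isalnum()
print(astype_mask)

# Attempt 3: dropna() removes only the genuinely missing value in d.
print(row.dropna())
```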

I don't see many other options listed at https://pandas.pydata.org/pandas-docs/stable/reference/series.html

As a workaround, I can loop over items(), check each value's type and content, and create a new Series from the values I want to keep, but this approach is inefficient (and ugly). Looping over the items shows the type of each value:

for index, value in row.items():
    print(index, value, type(value))

# a 1 <class 'numpy.int64'>
# b  <class 'str'>
# c a3 <class 'str'>
# d nan <class 'numpy.float64'>
# e 
#  <class 'str'>
# f 6 <class 'str'>
# g   <class 'str'>
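For comparison, a minimal sketch of that loop-based workaround, completed so that it actually builds the filtered Series. The rules are an assumption drawn from the requirements above: strings are kept only if str.isalnum() is true (which rejects "", "\n" and " "), numbers are kept unless they are NaN, and anything else is dropped. The names kept and clean are illustrative.

```python
import numbers

import numpy as np
import pandas as pd

df = pd.DataFrame([{"a": 1, "b": "", "c": "a3", "d": np.nan, "e": "\n", "f": "6", "g": " "}])
row = df.iloc[0]

kept = {}
for index, value in row.items():
    if isinstance(value, str):
        # Strings must be non-empty and purely alphanumeric ("", "\n", " " fail).
        keep = value.isalnum()
    elif isinstance(value, numbers.Number):
        # Numbers (including numpy scalars such as numpy.int64) are kept unless NaN.
        keep = not pd.isna(value)
    else:
        keep = False
    if keep:
        kept[index] = value

# Preserve the original row label (0) as the Series name.
clean = pd.Series(kept, name=row.name)
print(clean)
```

It keeps a, c and f, but it is still a plain Python loop, so it does nothing for efficiency.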

Is there a boolean filter that can help me single out the good columns?

jezrael :

Convert the values to strings, test them with str.isalnum, and combine that mask with a second mask from Series.notna using bitwise AND (&):

row = row[row.astype(str).str.isalnum() & row.notna()]
print(row)
a     1
c    a3
f     6
Name: 0, dtype: object
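One caveat, offered as a hedged sketch rather than as part of the original answer: because the mask tests the string form of every value, numbers whose string form contains "." or "-" (for example 1.5 or -2) are dropped even though they are numeric. The columns h and i below are hypothetical additions used only to show this. The variant checks numbers with numbers.Number and strings with str.isalnum() separately.

```python
import numbers

import numpy as np
import pandas as pd

# Hypothetical columns h and i (not in the question) add a float and a negative integer.
df = pd.DataFrame([{"a": 1, "b": "", "c": "a3", "d": np.nan, "e": "\n",
                    "f": "6", "g": " ", "h": 1.5, "i": -2}])
row = df.iloc[0]

# The answer's mask: "1.5" and "-2" contain "." and "-", so isalnum() rejects them.
answer_mask = row.astype(str).str.isalnum() & row.notna()
print(row[answer_mask])

# Variant: keep non-missing numbers, plus strings that are non-empty and alphanumeric.
is_number = row.map(lambda v: isinstance(v, numbers.Number))
is_alnum_str = row.map(lambda v: isinstance(v, str) and v.isalnum())
clean = row[(is_number & row.notna()) | is_alnum_str]
print(clean)
```

The first print shows only a, c and f; the variant also keeps h and i. Series.map with a Python function runs element by element, so it is more compact than the loop but not faster.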
