How to remove non-ASCII characters and space from column names

Shailaja Gupta Kapoor :

I have a dataframe . Many column names have non ASCII characters and special characters like (), /, +, . (non ascii dots in the middle ) etc and non ascii spaces. This did not happen while reading csv. This happened due to one - hot encoding. (when i converted my categorical variable to numeric columns & category values had non ascii values)

df

Col1/name   Col 2() name    Col3 + name    Col4 ^¨ name   etc...

Expected output

I want only numbers, underscores and characters in my column names (I only want to change column names not any value in dataframe or rows). This is necessary because some Machine learning algorithms such as lightGBM dont work with non ASCII characters or non ASCII spaces in column names.

Expected output df:

Col1name   Col_2_name    Col3__name    Col4__name   etc...

So replacing space with underscores and removing any non- numeric and non-character in column names .

Chris :

One way using pandas.Series.str.replace and findall:

df.columns = ["".join(l) for l in df.columns.str.replace("\s", "_").str.findall("[\w\d]+")]
print(df)

Output:

Empty DataFrame
Columns: [Col1name, Col_2_name, Col3__name, Col4__name]
Index: []

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=27792&siteId=1