Logan Wlv :
I am starting out with pandas categorical dataframes.
Let's say I have (1):
A B C
-------------
3 Z M
O X T
4 A B
I filtered the dataframe like this: df[df['B'] != "X"]
So I get as a result (2):
A B C
-------------
3 Z M
4 A B
In (1): df['B'].cat.categories # equals ['Z', 'X', 'A']
In (2): df['B'].cat.categories # still equals ['Z', 'X', 'A']
How can I update the categories of all columns in the DataFrame after this kind of filtering operation?
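A minimal reproduction of the behaviour described above, assuming the frame was built by converting every column with astype("category") (the question does not say how it was created). Note that astype infers the categories in sorted order, so they come out as ['A', 'X', 'Z'] rather than the ['Z', 'X', 'A'] order given in the question:

```python
import pandas as pd

# Rebuild the example frame; astype("category") converts every column.
df = pd.DataFrame(
    {"A": ["3", "O", "4"], "B": ["Z", "X", "A"], "C": ["M", "T", "B"]}
).astype("category")

# Boolean filtering drops the row but keeps each column's full category list.
filtered = df[df["B"] != "X"]

print(list(filtered["B"]))                 # ['Z', 'A']  (the 'X' row is gone)
print(list(filtered["B"].cat.categories))  # ['A', 'X', 'Z']  ('X' is still a category)
```

Filtering only selects rows; the categorical dtype of each column, including its category list, is carried over unchanged.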
ALollz :
Call remove_unused_categories on the columns after filtering.
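For a single column, a minimal sketch (the Series here is an illustrative stand-in for df['B'] after filtering, with its category order set explicitly): remove_unused_categories returns a new Series instead of modifying the original, which is why the result has to be assigned back to the column, and it keeps the surviving categories in their existing order.

```python
import pandas as pd

# Stand-in for a filtered column: 'X' is a category but no longer appears in the data.
s = pd.Series(["Z", "A"], dtype=pd.CategoricalDtype(["Z", "X", "A"]))

trimmed = s.cat.remove_unused_categories()

print(list(s.cat.categories))        # ['Z', 'X', 'A']  (the original is unchanged)
print(list(trimmed.cat.categories))  # ['Z', 'A']  (kept categories stay in order)
```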
As piRSquared points out, you can do this succinctly if every column has a categorical dtype:
df = df.query('B != "X"').apply(lambda s: s.cat.remove_unused_categories())
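Alternatively, loop over only the categorical columns after filtering (unlike the one-liner, this also works when some columns are not categorical, since s.cat raises an AttributeError on a non-categorical column):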
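import pandas as pd
# Assumed setup (not shown in the question): every column is categorical
df = pd.DataFrame({'A': ['3', 'O', '4'], 'B': ['Z', 'X', 'A'], 'C': ['M', 'T', 'B']}).astype('category')
print(df)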
# A B C
#0 3 Z M
#1 O X T
#2 4 A B
df['B'].cat.categories
#Index(['A', 'X', 'Z'], dtype='object')
# .copy() avoids a SettingWithCopyWarning when columns are reassigned below
df = df[df['B'] != 'X'].copy()
# Update all category columns
for col in df.dtypes.loc[lambda x: x == 'category'].index:
    df[col] = df[col].cat.remove_unused_categories()
df['B'].cat.categories
#Index(['A', 'Z'], dtype='object')
df['C'].cat.categories
#Index(['B', 'M'], dtype='object')
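The loop can be wrapped in a reusable helper. This is a sketch under assumptions: the name drop_unused_categories is hypothetical, it selects columns with DataFrame.select_dtypes instead of filtering df.dtypes, and the example frame is a hypothetical mixed-dtype variant of the question's data in which A is a plain integer column.

```python
import pandas as pd


def drop_unused_categories(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of `frame` with unused categories
    removed from every categorical column; other columns are left as-is."""
    out = frame.copy()  # work on a copy so the caller's frame is not modified
    for col in out.select_dtypes(include="category").columns:
        out[col] = out[col].cat.remove_unused_categories()
    return out


# Hypothetical mixed-dtype frame: A is an integer column, B and C are categorical.
df = pd.DataFrame({"A": [3, 0, 4], "B": ["Z", "X", "A"], "C": ["M", "T", "B"]})
df = df.astype({"B": "category", "C": "category"})

filtered = drop_unused_categories(df[df["B"] != "X"])

print(list(filtered["B"].cat.categories))  # ['A', 'Z']
print(list(filtered["C"].cat.categories))  # ['B', 'M']
```

Returning a new frame keeps the unfiltered df and its categories intact, and the integer column A passes through untouched instead of raising.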