Get column names whose number of distinct values exceeds a specified threshold in Python

Shailaja Gupta Kapoor :

Dataframe X:

A   B    C    D
V1  V2   V3   V4
V1  V3   V4   V5
V1  V4   V5   V5
V1  V5   V9   V5
V1  V2   V3   V4
V1  V10  V11  V12
V1  V10  V6   V8
V1  V12  V7   V8

Here Col A has 1 unique value, Col B has 6 unique values, Col C has 7 unique values, Col D has 4 unique values.

I need a list of all columns that have more than a given number of unique values, say more than 4. I tried:

X.columns[(X.nunique() > 4).any()]

I expect to get only columns B and C here, but I get all columns. How can I achieve the desired output?

jezrael :

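You are really close. Just remove .any(): it collapses the per-column boolean mask into a single True, so the information about which columns pass the test is lost. Index the columns with the mask itself: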

c = X.columns[(X.nunique() > 4)]
print(c)
Index(['B', 'C'], dtype='object')
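To see why the mask works and .any() does not, it can help to look at the intermediate values. The following is a minimal sketch that rebuilds X from the table in the question; the variable names counts, mask and collapsed are only illustrative and do not appear in the original post.

```python
import pandas as pd

# Rebuild the example DataFrame from the question
X = pd.DataFrame({
    "A": ["V1"] * 8,
    "B": ["V2", "V3", "V4", "V5", "V2", "V10", "V10", "V12"],
    "C": ["V3", "V4", "V5", "V9", "V3", "V11", "V6", "V7"],
    "D": ["V4", "V5", "V5", "V5", "V4", "V12", "V8", "V8"],
})

counts = X.nunique()     # Series: distinct-value count per column
mask = counts > 4        # boolean Series, one entry per column
collapsed = mask.any()   # a single boolean: is ANY column above 4?

print(counts.to_dict())  # {'A': 1, 'B': 6, 'C': 7, 'D': 4}
print(mask.to_dict())    # {'A': False, 'B': True, 'C': True, 'D': False}
print(collapsed)         # True
print(list(X.columns[mask]))  # ['B', 'C']
```

The mask keeps one True/False per column, which is what column indexing needs; collapsed holds only one value for the whole frame.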

If you need to select the columns themselves rather than just their names, use DataFrame.loc with the same mask:

df = X.loc[:, (X.nunique() > 4)]
print(df)
     B    C
0   V2   V3
1   V3   V4
2   V4   V5
3   V5   V9
4   V2   V3
5  V10  V11
6  V10   V6
7  V12   V7
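If the threshold changes often, the same idea can be wrapped in a small helper. This is only a sketch: the function name columns_above_distinct and its parameters are hypothetical, not part of pandas. It relies on the dropna argument of DataFrame.nunique, which controls whether missing values are counted as a distinct value (they are ignored by default).

```python
import pandas as pd

def columns_above_distinct(df, threshold, dropna=True):
    """Return names of columns with more than `threshold` distinct values.

    Hypothetical helper for illustration; `dropna` is passed through
    to DataFrame.nunique.
    """
    counts = df.nunique(dropna=dropna)
    # Filter the counts Series, then keep only the column labels
    return counts[counts > threshold].index.tolist()

# Example data taken from the question
X = pd.DataFrame({
    "A": ["V1"] * 8,
    "B": ["V2", "V3", "V4", "V5", "V2", "V10", "V10", "V12"],
    "C": ["V3", "V4", "V5", "V9", "V3", "V11", "V6", "V7"],
    "D": ["V4", "V5", "V5", "V5", "V4", "V12", "V8", "V8"],
})

print(columns_above_distinct(X, 4))  # ['B', 'C']
```

Returning a plain list instead of an Index makes the result easy to pass on, for example X[columns_above_distinct(X, 4)] to select those columns.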
