Python: the unique() and nunique() functions

Reference: https://www.cnblogs.com/xxswkl/p/11009059.html

1. unique()

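np.unique() finds the distinct values in a list and returns them as a sorted array. It has three optional parameters (return_index, return_inverse, return_counts). Each one produces a different statistic, and each statistic is also returned as an array.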

When the elements of the list are themselves lists, try not to use this method.

import numpy as np
a = [1, 5, 4, 2, 3, 3, 5]
# Returns a sorted array of the unique values
print(np.unique(a))
# [1 2 3 4 5]

# Also returns the index in the original list where each unique value first occurs
print(np.unique(a, return_index=True))
# (array([1, 2, 3, 4, 5]), array([0, 3, 4, 2, 1]))

# Also returns, for each element of the original list, its index in the unique array
print(np.unique(a, return_inverse=True))
# (array([1, 2, 3, 4, 5]), array([0, 4, 3, 1, 2, 2, 4]))

# Also returns the number of times each unique value appears in the list
print(np.unique(a, return_counts=True))
# (array([1, 2, 3, 4, 5]), array([1, 1, 2, 1, 2]))

# With extra parameters, unique() returns a tuple, so it can be unpacked into the matching number of variables
p, q, m, n = np.unique(a, return_index=True, return_inverse=True, return_counts=True)
print(p, q, m, n)
# [1 2 3 4 5] [0 3 4 2 1] [0 4 3 1 2 2 4] [1 1 2 1 2]
 
# Note: when the elements of the list are lists rather than numbers, the output data type depends on the lengths of those inner lists
# Using this method to count or deduplicate elements that are themselves lists is not a good approach; it is error-prone
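
The caveat about lists of lists can be made concrete with a small sketch. The sample data (`rows`, `ragged`) is made up for illustration. Without the `axis` argument, np.unique() flattens a nested list before deduplicating. `axis=0` compares whole rows, but it only applies when every inner list has the same length. For inner lists of different lengths, a plain Python loop is one possible fallback.

```python
import numpy as np

# Hypothetical sample data: inner lists of equal length
rows = [[1, 2], [3, 4], [1, 2]]

# Without axis, the nested list is flattened before deduplication
flat_unique = np.unique(rows)
print(flat_unique)

# With axis=0, whole rows are compared instead of single numbers
row_unique = np.unique(rows, axis=0)
print(row_unique)

# Inner lists of different lengths: skip NumPy and deduplicate in plain Python,
# keeping the order of first appearance
ragged = [[1, 2], [3], [1, 2]]
seen = []
for item in ragged:
    if item not in seen:
        seen.append(item)
print(seen)
```

The loop is quadratic in the number of elements, so it is only a reasonable choice for short lists.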

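The unique() method of a pandas Series also returns an array of the distinct values. It takes no other parameters, and the values are returned in order of first appearance rather than sorted.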

import pandas as pd
se = pd.Series([1,3,4,5,2,2,3])
print(se.unique())
# [1 3 4 5 2]
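
As a quick comparison of the two functions, the following sketch runs both on the same Series. Series.unique() keeps the order of first appearance, while np.unique() sorts its result.

```python
import numpy as np
import pandas as pd

se = pd.Series([1, 3, 4, 5, 2, 2, 3])

# pandas: distinct values in order of first appearance
pandas_result = se.unique()
print(pandas_result)

# NumPy: distinct values in sorted order
numpy_result = np.unique(se)
print(numpy_result)
```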

2. nunique()

nunique() counts the distinct values in each column of a DataFrame directly. It can also be used on a Series, but not on a list. It returns the number of distinct values.

df=pd.DataFrame({'A':[0,1,1],'B':[0,5,6]})
print(df)
print(df.nunique())
#    A  B
# 0  0  0
# 1  1  5
# 2  1  6
# A    2
# B    3
# dtype: int64
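
A few variations on the example above, as a sketch. The `axis` and `dropna` arguments are part of pandas' nunique(). The Series `s` and the list `values` are made-up sample data. Since a plain list has no nunique(), wrapping it in a Series or taking the length of a set are two possible workarounds.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 1], 'B': [0, 5, 6]})

# axis=1 counts distinct values across each row instead of each column
per_row = df.nunique(axis=1)
print(per_row)

# NaN is ignored by default; dropna=False counts it as a value
s = pd.Series([1.0, None, 1.0, 2.0])
without_nan = s.nunique()
with_nan = s.nunique(dropna=False)
print(without_nan, with_nan)

# A plain list has no nunique(); wrap it in a Series or use a set
values = [1, 3, 4, 5, 2, 2, 3]
n_via_series = pd.Series(values).nunique()
n_via_set = len(set(values))
print(n_via_series, n_via_set)
```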

It can also be combined with groupby to count the number of distinct values within each group.

all_user_repay = all_user_repay.groupby(['user_id'])['listing_id'].agg(['nunique']).reset_index()
#    user_id  nunique
# 0       40        1
# 1       56        1
# 2       98        1
# 3      103        1
# 4      122        1
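
The line above relies on an existing all_user_repay table. Here is a self-contained sketch of the same idea, using a small made-up DataFrame (the values are hypothetical, not the original data). It calls nunique() directly on the grouped column, which should give the same layout as the agg(['nunique']) form.

```python
import pandas as pd

# Hypothetical repayment records: one row per (user, listing) event
all_user_repay = pd.DataFrame({
    'user_id':    [40, 40, 56, 56, 98],
    'listing_id': [11, 11, 21, 22, 31],
})

# Count the distinct listing_id values within each user_id group
per_user = (
    all_user_repay.groupby('user_id')['listing_id']
    .nunique()
    .reset_index(name='nunique')
)
print(per_user)
```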

  

Origin www.cnblogs.com/jiaxinwei/p/11982192.html