Reference: https://www.cnblogs.com/xxswkl/p/11009059.html
1. unique()
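np.unique() finds the distinct values in a list (or array) and returns them as a sorted NumPy array. It has three optional flags, return_index, return_inverse and return_counts. Each flag set to True adds one more array to the result, which is then returned as a tuple.
Avoid this function when the elements of the list are themselves lists.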
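To show why nested lists are risky, here is a minimal sketch. It assumes the inner lists all have the same length; `pairs` is a hypothetical example. By default np.unique() flattens its input, so you get unique scalars rather than unique sublists. Passing `axis=0` treats each inner list as a row and deduplicates whole rows instead.

```python
import numpy as np

# Hypothetical nested list; the inner lists must all have the same length
pairs = [[1, 2], [3, 4], [1, 2]]

# Default (axis=None): the input is flattened, so the result holds unique scalars
flat_unique = np.unique(pairs)

# axis=0: each inner list is treated as a row, and duplicate rows are removed
row_unique = np.unique(pairs, axis=0)

print(flat_unique)  # [1 2 3 4]
print(row_unique)   # [[1 2]
                    #  [3 4]]
```

Ragged inner lists (different lengths) cannot be turned into a regular 2-D array, so neither form is a reliable way to deduplicate them.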
```python
import numpy as np

a = [1, 5, 4, 2, 3, 3, 5]

# Returns a sorted array of the distinct values
print(np.unique(a))
# [1 2 3 4 5]

# return_index: index of the first occurrence of each unique value in the original list
print(np.unique(a, return_index=True))
# (array([1, 2, 3, 4, 5]), array([0, 3, 4, 2, 1]))

# return_inverse: for each element of the original list, its index in the unique array
print(np.unique(a, return_inverse=True))
# (array([1, 2, 3, 4, 5]), array([0, 4, 3, 1, 2, 2, 4]))

# return_counts: number of times each unique value appears in the list
print(np.unique(a, return_counts=True))
# (array([1, 2, 3, 4, 5]), array([1, 1, 2, 1, 2]))

# With extra flags set, unique() returns a tuple, so it can be unpacked
# into as many variables as there are returned arrays
p, q, m, n = np.unique(a, return_index=True, return_inverse=True, return_counts=True)
print(p, q, m, n)
# [1 2 3 4 5] [0 3 4 2 1] [0 4 3 1 2 2 4] [1 1 2 1 2]

# Note: when the list's elements are lists rather than numbers, the result
# depends on the inner lists' lengths (equal-length lists are flattened into
# unique scalars), so this is an error-prone way to deduplicate or count
# list elements
```
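The inverse and count arrays can be hard to picture from the printed output alone. The sketch below reuses the same list `a` and illustrates two properties. First, indexing the unique values with the inverse indices rebuilds the original data. Second, zipping the values with their counts gives a frequency table. The variable names `uniq`, `inv`, `counts` and `freq` are only illustrative.

```python
import numpy as np

a = [1, 5, 4, 2, 3, 3, 5]
# Returned in the order: unique values, inverse indices, counts
uniq, inv, counts = np.unique(a, return_inverse=True, return_counts=True)

# Indexing the unique values with the inverse indices rebuilds the original list
rebuilt = uniq[inv]
print(rebuilt.tolist())  # [1, 5, 4, 2, 3, 3, 5]

# Pairing each unique value with its count gives a value -> frequency mapping
freq = dict(zip(uniq.tolist(), counts.tolist()))
print(freq)  # {1: 1, 2: 1, 3: 2, 4: 1, 5: 2}
```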
For a pandas Series, Series.unique() returns the distinct values as an array in the order they first appear (it does not sort them). It takes no parameters.
```python
import pandas as pd

se = pd.Series([1, 3, 4, 5, 2, 2, 3])
print(se.unique())
# [1 3 4 5 2]
```
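The two functions differ in ordering: np.unique() sorts, while Series.unique() keeps the order of first appearance. This sketch uses the same values as the Series above. It shows that sorting the `return_index` output of np.unique() recovers the appearance order. This is one possible approach, not something the pandas or NumPy docs prescribe.

```python
import numpy as np
import pandas as pd

values = [1, 3, 4, 5, 2, 2, 3]

# pandas keeps the order in which values first appear
se_order = pd.Series(values).unique()

# np.unique sorts its output and can also report first-occurrence indices
sorted_uniq, first_idx = np.unique(values, return_index=True)

# Sorting those indices and reading the original values restores appearance order
appearance_order = np.asarray(values)[np.sort(first_idx)]

print(sorted_uniq.tolist())       # [1, 2, 3, 4, 5]
print(se_order.tolist())          # [1, 3, 4, 5, 2]
print(appearance_order.tolist())  # [1, 3, 4, 5, 2]
```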
2. nunique()
nunique() counts the distinct values. On a DataFrame it returns the count for every column at once. It also works on a Series (returning a single number), but not on a plain Python list.
```python
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 1], 'B': [0, 5, 6]})
print(df)
#    A  B
# 0  0  0
# 1  1  5
# 2  1  6

print(df.nunique())
# A    2
# B    3
# dtype: int64
```
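DataFrame.nunique() also accepts `dropna` (default True, so NaN is not counted) and `axis` (default 0, per column). The sketch below uses a hypothetical frame with one missing value to show how each parameter changes the counts.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a missing value in column B
df = pd.DataFrame({'A': [0, 1, 1], 'B': [0, np.nan, 6]})

# Default dropna=True: NaN is ignored, so B has only 0 and 6
print(df.nunique().tolist())              # [2, 2]

# dropna=False: NaN counts as one more distinct value
print(df.nunique(dropna=False).tolist())  # [2, 3]

# axis=1: count distinct values across each row instead of each column
print(df.nunique(axis=1).tolist())        # [1, 1, 2]
```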
It can also be combined with groupby() to count the distinct values within each group.
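```python
# all_user_repay is the author's repayment DataFrame (with user_id and listing_id columns);
# count the distinct listing_id values for each user_id
all_user_repay = all_user_repay.groupby(['user_id'])['listing_id'].agg(['nunique']).reset_index()
print(all_user_repay.head())
#    user_id  nunique
# 0       40        1
# 1       56        1
# 2       98        1
# 3      103        1
# 4      122        1
```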
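The author's all_user_repay data is not included. This sketch therefore uses a small hypothetical `repay` frame with the same two columns so that the groupby pattern can be run end to end. It also shows the shorter `groupby(...)['listing_id'].nunique()` form, which returns a Series indexed by user_id instead of a DataFrame.

```python
import pandas as pd

# Hypothetical toy data standing in for all_user_repay
repay = pd.DataFrame({
    'user_id':    [40, 40, 56, 56, 56],
    'listing_id': [1, 1, 7, 8, 7],
})

# Same pattern as above: distinct listing_id values per user_id, as a DataFrame
per_user = repay.groupby(['user_id'])['listing_id'].agg(['nunique']).reset_index()
print(per_user['user_id'].tolist(), per_user['nunique'].tolist())  # [40, 56] [1, 2]

# Shorter equivalent returning a Series indexed by user_id
per_user_series = repay.groupby('user_id')['listing_id'].nunique()
print(per_user_series.index.tolist(), per_user_series.tolist())   # [40, 56] [1, 2]
```

The `agg(['nunique'])` form is convenient when you want several aggregations side by side (for example `agg(['nunique', 'count'])`). The direct `.nunique()` call is simpler when you only need one.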