Grouping works much like an Excel PivotTable and is usually done by rows, using the following method:
df.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True,
squeeze=False, observed=False, **kwargs)
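To make two of the parameters in the signature concrete, here is a small sketch of `as_index` and `sort`. The two-column DataFrame is a reduced version of the example data used in this section and is only an illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['zhao', 'li', 'wang', 'li', 'zhao'],
                   'C': np.arange(1, 6)})

# Default: the group keys become the (sorted) index of the result
default = df.groupby('A').sum()
# as_index=False keeps 'A' as an ordinary column instead of the index
flat = df.groupby('A', as_index=False).sum()
# sort=False keeps the groups in order of first appearance
unsorted = df.groupby('A', sort=False).sum()

print(default.index.tolist())   # ['li', 'wang', 'zhao']
print(flat.columns.tolist())    # ['A', 'C']
print(unsorted.index.tolist())  # ['zhao', 'li', 'wang']
```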
Calling groupby directly returns a DataFrameGroupBy object.
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['zhao', 'li', 'wang', 'li', 'zhao'],
                   'B': ['one', 'one', 'two', 'three', 'two'],
                   'C': np.arange(1, 6),
                   'D': np.arange(6, 11)})
print(df)
print(df.groupby('A'))
print(type(df.groupby('A')))
#       A      B  C   D
# 0  zhao    one  1   6
# 1    li    one  2   7
# 2  wang    two  3   8
# 3    li  three  4   9
# 4  zhao    two  5  10
# <pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000000001E6C550>
# <class 'pandas.core.groupby.generic.DataFrameGroupBy'>
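Printing the DataFrameGroupBy object only shows its type and address. A quick way to inspect it, sketched below on the same example data, is the `ngroups` attribute; selecting one column from it yields a SeriesGroupBy, while selecting a list of columns keeps a DataFrameGroupBy. No aggregation runs until a method such as `sum()` is called.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['zhao', 'li', 'wang', 'li', 'zhao'],
                   'B': ['one', 'one', 'two', 'three', 'two'],
                   'C': np.arange(1, 6),
                   'D': np.arange(6, 11)})

grouped = df.groupby('A')
# Number of distinct groups found in column A
print(grouped.ngroups)                      # 3
# A single column selection gives a SeriesGroupBy
print(type(grouped['D']).__name__)          # SeriesGroupBy
# A list of columns keeps a DataFrameGroupBy
print(type(grouped[['C', 'D']]).__name__)   # DataFrameGroupBy
```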
The GroupBy object is iterable: each element is a tuple whose first value is the group name and whose second value is that group's DataFrame. It can also be converted to a list with list(), after which each tuple can be inspected directly.
for n, p in df.groupby('A'):
    print(type(p))
    print(n)
    print(p)
    print('-------------------------')
# <class 'pandas.core.frame.DataFrame'>
# li
#     A      B  C  D
# 1  li    one  2  7
# 3  li  three  4  9
# -------------------------
# <class 'pandas.core.frame.DataFrame'>
# wang
#       A    B  C  D
# 2  wang  two  3  8
# -------------------------
# <class 'pandas.core.frame.DataFrame'>
# zhao
#       A    B  C   D
# 0  zhao  one  1   6
# 4  zhao  two  5  10
# -------------------------
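The list conversion mentioned above can be sketched like this: `list()` materialises the (name, DataFrame) tuples, and `dict()` over the same tuples gives a lookup table keyed by group name. The variable names `pairs` and `by_name` are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['zhao', 'li', 'wang', 'li', 'zhao'],
                   'B': ['one', 'one', 'two', 'three', 'two'],
                   'C': np.arange(1, 6),
                   'D': np.arange(6, 11)})

# Each element is a (group name, sub-DataFrame) tuple, groups sorted by name
pairs = list(df.groupby('A'))
print(len(pairs))                 # 3
name, part = pairs[0]
print(name)                       # li
print(part.index.tolist())        # [1, 3]

# dict() turns the same tuples into a name -> DataFrame mapping
by_name = dict(list(df.groupby('A')))
print(by_name['zhao']['C'].tolist())  # [1, 5]
```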
Use get_group('group name') to obtain the contents of a single group.
The groups attribute is a dictionary whose keys are the group names and whose values are the index labels of the rows belonging to each group, so groups['group name'] shows which rows make up a given group.
print(df.groupby('A').get_group('zhao'))  # get the 'zhao' group
#       A    B  C   D
# 0  zhao  one  1   6
# 4  zhao  two  5  10
print(df.groupby(['A', 'B']).groups)
print(df.groupby(['A', 'B']).groups[('li', 'one')])
# {('li', 'one'): Int64Index([1], dtype='int64'), ('li', 'three'): Int64Index([3], dtype='int64'), ('wang', 'two'): Int64Index([2], dtype='int64'), ('zhao', 'one'): Int64Index([0], dtype='int64'), ('zhao', 'two'): Int64Index([4], dtype='int64')}
# Int64Index([1], dtype='int64')
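When grouping by several columns, the group name is a tuple, and get_group accepts that tuple as well. The sketch below also converts the groups dictionary into plain lists of row labels for easier reading; the name `label_lists` is illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['zhao', 'li', 'wang', 'li', 'zhao'],
                   'B': ['one', 'one', 'two', 'three', 'two'],
                   'C': np.arange(1, 6),
                   'D': np.arange(6, 11)})

grouped = df.groupby(['A', 'B'])
# With two grouping columns, the group name is an (A, B) tuple
part = grouped.get_group(('li', 'one'))
print(part.index.tolist())            # [1]

# groups maps each name to the index labels of its rows
label_lists = {k: v.tolist() for k, v in grouped.groups.items()}
print(label_lists[('zhao', 'two')])   # [4]
```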
size() counts the number of rows in each group.
print(df.groupby('A').size())
print(type(df.groupby('A').size()))
# A
# li      2
# wang    1
# zhao    2
# dtype: int64
# <class 'pandas.core.series.Series'>
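A related method worth contrasting with size() is count(): size() counts every row in a group, while count() counts only the non-missing values. The NaN placed in column C below is an assumption added purely to show the difference.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['zhao', 'li', 'wang', 'li', 'zhao'],
                   'C': [1.0, np.nan, 3.0, 4.0, 5.0]})

# size(): number of rows per group, NaN rows included
sizes = df.groupby('A').size()
# count(): number of non-NaN values per group in column C
counts = df.groupby('A')['C'].count()

print(sizes.tolist())   # [2, 1, 2]
print(counts.tolist())  # [1, 1, 2]
```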
You can group by a single column or by several columns; to group by more than one column, pass the column names as a list.
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['zhao', 'li', 'wang', 'li', 'zhao'],
                   'B': ['one', 'one', 'two', 'three', 'two'],
                   'C': np.arange(1, 6),
                   'D': np.arange(6, 11)})
# Group by column A and sum; numeric_only=True skips the string column B
print(df.groupby('A').sum(numeric_only=True))
print('---------------------')
# Group by columns A and B, then sum the remaining numeric columns
print(df.groupby(['A', 'B']).sum())
print('---------------------')
# Group by column A, then sum only column D
print(df.groupby('A')['D'].sum())
#       C   D
# A
# li    6  16
# wang  3   8
# zhao  6  16
# ---------------------
#             C   D
# A    B
# li   one    2   7
#      three  4   9
# wang two    3   8
# zhao one    1   6
#      two    5  10
# ---------------------
# A
# li      16
# wang     8
# zhao    16
# Name: D, dtype: int32
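Beyond a single sum, a grouped object can apply a different function to each column. The sketch below uses named aggregation (the output names `C_sum` and `D_max` are arbitrary) and then, to connect back to the PivotTable comparison at the start, computes the same per-group sums with pd.pivot_table. It is an illustration on the example data, not the only way to do this.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['zhao', 'li', 'wang', 'li', 'zhao'],
                   'B': ['one', 'one', 'two', 'three', 'two'],
                   'C': np.arange(1, 6),
                   'D': np.arange(6, 11)})

# Named aggregation: new_column=(source column, function)
summary = df.groupby('A').agg(C_sum=('C', 'sum'), D_max=('D', 'max'))
print(summary['C_sum'].tolist())  # [6, 3, 6]
print(summary['D_max'].tolist())  # [9, 8, 10]

# The same sums as groupby('A')[['C', 'D']].sum(), written as a pivot table
pivot = pd.pivot_table(df, index='A', values=['C', 'D'], aggfunc='sum')
print(pivot['D'].tolist())        # [16, 8, 16]
```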