Grouping data with pandas groupby

Similar to an Excel PivotTable, data is generally grouped by row, using the following method.

df.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True,
        squeeze=False, observed=False, **kwargs)

The direct result of grouping is a DataFrameGroupBy object.

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':['zhao','li','wang','li','zhao'],
                   'B':['one','one','two','three','two'],
                   'C':np.arange(1,6),
                   'D':np.arange(6,11)})
print(df)
print(df.groupby('A'))
print(type(df.groupby('A')))
#       A      B  C   D
# 0  zhao    one  1   6
# 1    li    one  2   7
# 2  wang    two  3   8
# 3    li  three  4   9
# 4  zhao    two  5  10
# <pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000000001E6C550>
# <class 'pandas.core.groupby.generic.DataFrameGroupBy'>
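Two of the parameters in the signature above, sort and as_index, change the shape of the result rather than its values. The following is a minimal sketch of their effect, assuming the same sample DataFrame as above; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['zhao', 'li', 'wang', 'li', 'zhao'],
                   'B': ['one', 'one', 'two', 'three', 'two'],
                   'C': np.arange(1, 6),
                   'D': np.arange(6, 11)})

# Default: group keys are sorted and become the index of the result.
default_sum = df.groupby('A')['D'].sum()

# sort=False keeps the groups in order of first appearance.
unsorted_sum = df.groupby('A', sort=False)['D'].sum()

# as_index=False returns the group key as an ordinary column.
flat_sum = df.groupby('A', as_index=False)['D'].sum()

print(default_sum.index.tolist())
print(unsorted_sum.index.tolist())
print(flat_sum.columns.tolist())
```

With as_index=False the result is a plain DataFrame, which can be convenient when the grouped output is going to be merged or exported.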

 

The grouping result is an iterable object. Each element is a tuple whose first value is the group name and whose second value is a DataFrame. It can be traversed with a for loop, or converted to a list to view each tuple.

for n,p in df.groupby('A'):
    print(type(p))
    print(n)
    print(p)
    print('-------------------------')
# <class 'pandas.core.frame.DataFrame'>
# li
#     A      B  C  D
# 1  li    one  2  7
# 3  li  three  4  9
# -------------------------
# <class 'pandas.core.frame.DataFrame'>
# wang
#       A    B  C  D
# 2  wang  two  3  8
# -------------------------
# <class 'pandas.core.frame.DataFrame'>
# zhao
#       A    B  C   D
# 0  zhao  one  1   6
# 4  zhao  two  5  10
# -------------------------
Viewing the grouping result
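The conversion to a list mentioned above can be sketched as follows. This assumes the same sample DataFrame and a single grouping column, so each group name is a plain string; the dict conversion is an extra convenience, not something the original describes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['zhao', 'li', 'wang', 'li', 'zhao'],
                   'B': ['one', 'one', 'two', 'three', 'two'],
                   'C': np.arange(1, 6),
                   'D': np.arange(6, 11)})

# Each element of the list is a (group name, DataFrame) tuple.
pairs = list(df.groupby('A'))
names = [name for name, _ in pairs]
first_name, first_frame = pairs[0]

# The same pairs can be turned into a dict keyed by group name.
by_name = dict(pairs)

print(names)
print(first_name, first_frame.index.tolist())
print(by_name['zhao'])
```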

 

Use get_group('group name') to obtain the contents of a group.

groups is a dictionary: each key is a group name and each value holds the index labels of the rows in that group. Use groups['group name'] to view the rows that belong to a group.

print(df.groupby('A').get_group('zhao'))   # get the group named zhao
#       A    B  C   D
# 0  zhao  one  1   6
# 4  zhao  two  5  10

print(df.groupby(['A','B']).groups)
print(df.groupby(['A','B']).groups[('li','one')])
# {('li', 'one'): Int64Index([1], dtype='int64'), ('li', 'three'): Int64Index([3], dtype='int64'), ('wang', 'two'): Int64Index([2], dtype='int64'), ('zhao', 'one'): Int64Index([0], dtype='int64'), ('zhao', 'two'): Int64Index([4], dtype='int64')}
# Int64Index([1], dtype='int64')
Using get_group to view a group's contents and groups to view a group's rows
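To make the relationship between get_group and groups concrete, here is a small sketch, again assuming the sample DataFrame from above. It compares get_group with an ordinary boolean filter and uses the labels stored in groups to select rows with df.loc; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['zhao', 'li', 'wang', 'li', 'zhao'],
                   'B': ['one', 'one', 'two', 'three', 'two'],
                   'C': np.arange(1, 6),
                   'D': np.arange(6, 11)})

grouped = df.groupby('A')

# get_group returns the same rows as filtering on the group key.
zhao = grouped.get_group('zhao')
zhao_filtered = df[df['A'] == 'zhao']

# groups maps each group name to the row labels of that group,
# which can be passed to df.loc to fetch the rows.
li_labels = grouped.groups['li']
li_rows = df.loc[li_labels]

# With several grouping columns the group name is a tuple.
li_one = df.groupby(['A', 'B']).get_group(('li', 'one'))

print(zhao.equals(zhao_filtered))
print(list(li_labels))
print(li_one)
```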

 

Use size() to count the number of rows in each group.

print(df.groupby('A').size())
print(type(df.groupby('A').size()))
# A
# li      2
# wang    1
# zhao    2
# dtype: int64
# <class 'pandas.core.series.Series'>
Counting the size of each group
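size() is easy to confuse with count(). The sketch below uses a hypothetical variation of the sample data, with one missing value added to column C, to show the difference: size() counts every row in a group, while count() counts only non-missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second row has a missing value in C.
df = pd.DataFrame({'A': ['zhao', 'li', 'wang', 'li', 'zhao'],
                   'C': [1.0, np.nan, 3.0, 4.0, 5.0]})

# size() counts all rows in each group, including rows with NaN.
sizes = df.groupby('A').size()

# count() counts only the non-missing values of the selected column.
counts = df.groupby('A')['C'].count()

print(sizes.to_dict())
print(counts.to_dict())
```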

 

Grouping can be done on a single column or on multiple columns. To group by more than one column, pass the column names as a list.

df = pd.DataFrame({'A':['zhao','li','wang','li','zhao'],
                   'B':['one','one','two','three','two'],
                   'C':np.arange(1,6),
                   'D':np.arange(6,11)})
print(df.groupby('A').sum())    # group by column A and sum the other columns; non-numeric columns are ignored (newer pandas versions may need sum(numeric_only=True) for this)
print('---------------------')
print(df.groupby(['A','B']).sum())    # group by columns A and B and sum the remaining columns
print('---------------------')
print(df.groupby('A')['D'].sum())    # group by column A, then sum only column D
#       C   D
# A
# li    6  16
# wang  3   8
# zhao  6  16
# ---------------------
#             C   D
# A    B
# li   one    2   7
#      three  4   9
# wang two    3   8
# zhao one    1   6
#      two    5  10
# ---------------------
# A
# li      16
# wang     8
# zhao    16
# Name: D, dtype: int32
Grouping by a single column and by multiple columns
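As a follow-up to the single-column and multi-column examples, here is a hedged sketch of a few common variations on the same sample data: restricting the sum to numeric columns explicitly, flattening a multi-column grouping back into ordinary columns, and applying more than one aggregation at once. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['zhao', 'li', 'wang', 'li', 'zhao'],
                   'B': ['one', 'one', 'two', 'three', 'two'],
                   'C': np.arange(1, 6),
                   'D': np.arange(6, 11)})

# Sum only the numeric columns, stated explicitly.
totals = df.groupby('A').sum(numeric_only=True)

# Group by two columns; the result has a two-level (A, B) index.
multi = df.groupby(['A', 'B'])[['C', 'D']].sum()

# reset_index turns the group keys back into ordinary columns.
flat = multi.reset_index()

# agg with a list applies several aggregations to one column.
stats = df.groupby('A')['D'].agg(['sum', 'mean'])

print(totals)
print(flat)
print(stats)
```

Selecting the columns before aggregating, as in the second and fourth statements, keeps the result independent of how a given pandas version treats non-numeric columns.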

 


Origin www.cnblogs.com/Forever77/p/11288682.html