pandas Learning 10: DataFrame data processing (apply, grouping and aggregation, windows, append, join, merge)

These DataFrame operations are very similar to their Series counterparts, so they are covered only briefly here.

First, apply

apply() applies a function along an axis (to each column or each row), while applymap() applies a function to each element:

DataFrame.apply(self, func, axis=0, raw=False, result_type=None, args=(), **kwds)
DataFrame.applymap(self, func)

Define a function f and use apply() to apply it to each one-dimensional array (a column by default, or a row with axis=1) of the DataFrame object. f is typically an aggregation function:

f=lambda x: x.max()-x.min()
df.apply(f)
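
To make the effect of the axis parameter concrete, here is a minimal sketch. The DataFrame contents and the names col_range and row_range are made up for illustration:

```python
import pandas as pd

# Illustrative data, not from the original article
df = pd.DataFrame({"a": [1, 4, 9], "b": [10, 20, 40]})

# Range (max - min) of whatever one-dimensional slice is passed in
f = lambda x: x.max() - x.min()

col_range = df.apply(f)          # axis=0 (default): f receives each column
row_range = df.apply(f, axis=1)  # axis=1: f receives each row

print(col_range)  # a -> 8, b -> 30
print(row_range)  # 0 -> 9, 1 -> 16, 2 -> 31
```

Because f reduces each slice to a single value, the result is a Series indexed by column names (axis=0) or by row labels (axis=1).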

Define a function foo and use applymap() to apply it to every individual element of the DataFrame object:

foo=lambda x: '%.2f' % x

df.applymap(foo)
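
A small runnable sketch of element-wise formatting follows. The data is invented for illustration. Note that applymap was renamed to DataFrame.map in pandas 2.1, so the sketch picks whichever method the installed version provides:

```python
import pandas as pd

# Illustrative data, not from the original article
df = pd.DataFrame({"x": [1.0, 2.5], "y": [3.14159, 10.0]})

# Format a single value as a string with two decimal places
foo = lambda v: "%.2f" % v

# pandas 2.1+ offers DataFrame.map as the new name for applymap;
# fall back to applymap on older versions.
elementwise = df.map if hasattr(df, "map") else df.applymap
formatted = elementwise(foo)

print(formatted)
```

The result has the same shape as df, but every cell is now a string.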

Transforming data: transform() calls a function on the data and returns a result with the same axis length as the original:

DataFrame.transform(self, func, axis=0, *args, **kwargs)
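
As a hedged illustration of transform(), the sketch below centers each column on its mean. The data and the name centered are assumptions made for the example:

```python
import pandas as pd

# Illustrative data, not from the original article
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Subtract each column's mean; the function must return something
# with the same length as its input, so the shape is preserved.
centered = df.transform(lambda col: col - col.mean())

print(centered)
```

Unlike an aggregating apply(), transform() rejects functions that change the length of the axis.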

Second, grouping and aggregation

groupby() splits the data into groups, after which an aggregation function can be called directly on the groups; agg() takes one or more aggregation functions and applies them in a single call:

DataFrame.groupby(self, by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, observed=False, **kwargs)
DataFrame.agg(self, func, axis=0, *args, **kwargs)
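
The following sketch shows both styles on a made-up table: calling an aggregation directly on the groups, and passing several aggregations to agg(). The column names team and score are hypothetical:

```python
import pandas as pd

# Illustrative data, not from the original article
df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "score": [10, 20, 5, 15, 25],
})

# Call an aggregation function directly on the grouped data
total = df.groupby("team")["score"].sum()

# Apply several aggregation functions in one call
summary = df.groupby("team")["score"].agg(["min", "max", "mean"])

# agg() also works without grouping, on the whole DataFrame
overall = df[["score"]].agg(["min", "max"])

print(total)
print(summary)
print(overall)
```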

Third, window functions

rolling() evaluates over a sliding window of fixed size; expanding() computes cumulative values over a window that grows from the first row to the current one; ewm() computes exponentially weighted values, such as an exponentially weighted moving average:

DataFrame.rolling(self, window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None)
DataFrame.expanding(self, min_periods=1, center=False, axis=0)
DataFrame.ewm(self, com=None, span=None, halflife=None, alpha=None, min_periods=0, adjust=True, ignore_na=False, axis=0)
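
A minimal sketch comparing the three window types on one small column. The data, the window size, and alpha=0.5 are arbitrary choices for the example:

```python
import pandas as pd

# Illustrative data, not from the original article
df = pd.DataFrame({"v": [1.0, 2.0, 3.0, 4.0]})

# Sliding window of 2 rows; the first row has no full window, so it is NaN
rolling_sum = df.rolling(window=2).sum()

# Window grows from the first row to the current row (a running total here)
expanding_sum = df.expanding().sum()

# Exponentially weighted mean; with adjust=False the recursion is
# y[t] = (1 - alpha) * y[t-1] + alpha * x[t]
ewm_mean = df.ewm(alpha=0.5, adjust=False).mean()

print(rolling_sum)
print(expanding_sum)
print(ewm_mean)
```

Each call returns a window object; the actual computation happens when an aggregation such as sum() or mean() is called on it.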

Fourth, appending rows

Append rows to the end of a DataFrame:

DataFrame.append(self, other, ignore_index=False, verify_integrity=False, sort=None)
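
DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the sketch below uses pd.concat, which gives the same row-appending result on both old and new versions. This swaps in a different function from the one in the signature above; the data is invented for illustration:

```python
import pandas as pd

# Illustrative data, not from the original article
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
extra = pd.DataFrame({"id": [3], "name": ["c"]})

# Equivalent of df.append(extra, ignore_index=True):
# stack the rows and renumber the index from 0
combined = pd.concat([df, extra], ignore_index=True)

print(combined)
```

Without ignore_index=True the original row labels are kept, which would leave two rows labeled 0 here.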

Fifth, join

Join two DataFrames on an equality condition, matching either on the index or on a column:

DataFrame.join(self, other, on=None, how='left', lsuffix='', rsuffix='', sort=False)

Parameter notes:

  • on: if set to None, rows are matched on the row index; if set to a column name, that column of the calling DataFrame is matched against the index of other
  • how: the type of join, { 'left', 'right', 'outer', 'inner'}, default 'left'
  • lsuffix: suffix added to the left table's columns whose names also appear in the right table
  • rsuffix: suffix added to the right table's columns whose names also appear in the left table
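
The parameters above can be sketched as follows. The tables left and right and the suffixes "_l" and "_r" are hypothetical names chosen for the example:

```python
import pandas as pd

# Illustrative data, not from the original article
left = pd.DataFrame({"key": ["a", "b", "c"], "val": [1, 2, 3]})
right = pd.DataFrame({"val": [10, 20]}, index=["a", "b"])

# Match left's "key" column against right's index. Both tables have a
# "val" column, so suffixes are needed to tell them apart.
joined = left.join(right, on="key", how="left", lsuffix="_l", rsuffix="_r")

print(joined)
```

With how='left' every row of left is kept; key "c" has no match in right, so its val_r is NaN.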

Sixth, merge

merge() is similar to a join in a relational database. Like join(), it matches rows on equality conditions, but it is more flexible than join():

DataFrame.merge(self, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)

Parameter notes:

  • right: the right table
  • how: { 'left', 'right', 'outer', 'inner'}, default 'inner'
  • on: the join key; the column must exist under the same name in both tables
  • left_on, right_on: the columns of the left and right tables to join on, used when the key columns have different names; the columns are paired in the order given
  • left_index, right_index: use the index of the left or right table as its join key
  • suffixes: tuple (str, str), the suffixes added to columns of the left and right tables that share the same name
  • indicator: if set to True, adds a "_merge" column showing whether each row comes from the left table only, the right table only, or both
  • validate: checks the type of the merge ("one_to_one" or "1:1", "one_to_many" or "1:m", "many_to_one" or "m:1", and "many_to_many" or "m:m")
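
A short sketch of several of these parameters working together. The tables orders and customers and their columns are made up for illustration:

```python
import pandas as pd

# Illustrative data, not from the original article
orders = pd.DataFrame({"cust_id": [1, 2, 4], "amount": [100, 200, 400]})
customers = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})

# Default inner join on differently named key columns
inner = orders.merge(customers, left_on="cust_id", right_on="id")

# Outer join that records where each row came from and checks that
# the keys are unique on both sides
outer = orders.merge(
    customers,
    how="outer",
    left_on="cust_id",
    right_on="id",
    indicator=True,
    validate="one_to_one",
)

print(inner)
print(outer)
```

If a key were duplicated in either table, validate="one_to_one" would raise a MergeError instead of silently multiplying rows.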

Origin www.cnblogs.com/ljhdo/p/11592735.html