The purpose of data processing is data analysis. This post collects the pandas functions that are commonly used during analysis.
First, grouping and aggregation
groupby() splits the data into groups, and an aggregation function can then be called directly on the groups. agg() applies one or more aggregation functions to the groups in a single call:
DataFrame.groupby(self, by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, observed=False, **kwargs)
DataFrame.agg(self, func, axis=0, *args, **kwargs)
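As a minimal sketch of grouping followed by aggregation, the example below uses a small made-up DataFrame; the column names `team` and `score` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "b"],
    "score": [1, 3, 2, 4, 6],
})

# Call an aggregation function directly on the grouped column.
mean_by_team = df.groupby("team")["score"].mean()

# Apply several aggregation functions in one call with agg().
summary = df.groupby("team")["score"].agg(["min", "max", "sum"])

print(mean_by_team)
print(summary)
```

Passing a list of function names to agg() yields one output column per function, which is usually more convenient than calling each aggregation separately.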
Second, window functions
rolling() evaluates over a sliding window; expanding() evaluates cumulatively, over all rows from the first up to the current one; ewm() computes exponentially weighted statistics, such as an exponentially weighted moving average:
DataFrame.rolling(self, window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None)
DataFrame.expanding(self, min_periods=1, center=False, axis=0)
DataFrame.ewm(self, com=None, span=None, halflife=None, alpha=None, min_periods=0, adjust=True, ignore_na=False, axis=0)
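The three window types can be compared on a short Series. This is only a sketch: the data, the window size of 2, and `alpha=0.5` with `adjust=False` are assumptions picked so the results are easy to verify by hand.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Sliding window of 2 rows; the first row has no full window, so it is NaN.
rolling_sum = s.rolling(window=2).sum()

# Expanding window: every row from the first up to the current one.
expanding_sum = s.expanding().sum()

# Exponentially weighted mean; with adjust=False the recursion is
# y[t] = (1 - alpha) * y[t-1] + alpha * x[t], starting from y[0] = x[0].
ewm_mean = s.ewm(alpha=0.5, adjust=False).mean()

print(rolling_sum.tolist())    # first value is NaN, then 3.0, 5.0, 7.0
print(expanding_sum.tolist())  # 1.0, 3.0, 6.0, 10.0
print(ewm_mean.tolist())       # 1.0, 1.5, 2.25, 3.125
```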
For more information, refer to: pandas Learning 4: series processing (apply, aggregate, transform, map, group, rolling, expanding, exponentially weighted moving average)
Third, correlation
Compute the pairwise correlation between columns:
DataFrame.corr(self, method='pearson', min_periods=1)
method: the method used to compute the correlation; valid values are 'pearson', 'kendall', 'spearman', or a callable
min_periods: the minimum number of observations required per pair of columns to have a valid result; currently only available for Pearson and Spearman correlation.
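A small sketch of how `method` and `min_periods` behave, using made-up columns `x`, `y`, and `z` (hypothetical names) in which the relationships are exactly linear so the expected coefficients are obvious:

```python
import pandas as pd

# Hypothetical data: y rises linearly with x, z falls linearly with x.
df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],
    "z": [4, 3, 2, 1],
})

pearson = df.corr(method="pearson")
spearman = df.corr(method="spearman")

# Each pair has only 4 observations, so requiring 5 leaves no valid result.
too_strict = df.corr(method="pearson", min_periods=5)

print(pearson)     # x/y close to 1.0, x/z close to -1.0
print(spearman)
print(too_strict)  # NaN entries
```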
Fourth, statistical functions
Commonly used statistical functions:
- min, max: minimum, maximum
- mode: mode
- var: variance
- std: standard deviation
- sum: sum
- mean: mean
- mad: mean absolute deviation
- median: median
- quantile: quantile (percentile)
- count: count of non-missing values
- cumsum: cumulative sum
- cumprod: cumulative product
- cummin, cummax: cumulative minimum, cumulative maximum
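The functions in the list above can be tried on a short Series, as in the sketch below. The data is made up, and mad is left out because that method was removed in pandas 2.0, so calling it would fail on current versions.

```python
import pandas as pd

s = pd.Series([1, 2, 2, 3, 4])

# Reductions: each returns a single value (mode returns a Series of modes).
smallest, largest = s.min(), s.max()
modes = s.mode().tolist()
total = s.sum()
average = s.mean()
middle = s.median()
variance = s.var()        # sample variance (ddof=1 by default)
std_dev = s.std()         # sample standard deviation
half = s.quantile(0.5)    # the 50% quantile equals the median
n_valid = s.count()       # number of non-missing values

# Cumulative functions: each returns a Series of the same length.
running_sum = s.cumsum().tolist()
running_prod = s.cumprod().tolist()
running_min = s.cummin().tolist()
running_max = s.cummax().tolist()

print(smallest, largest, modes, total, average, middle, n_valid)
print(running_sum, running_prod, running_min, running_max)
```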