Common Python functions for time series analysis

1. pandas.DataFrame.rank

DataFrame.rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)

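Purpose: compute the ranks (1 through n) of numerical data along an axis. With the default method, equal values receive the average of the ranks they span. The result holds each value's rank, not the positions that would sort the data (that is what argsort returns).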

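Parameters: axis : {0 or ‘index’, 1 or ‘columns’}, default 0. The axis along which to rank.

    method : {‘average’, ‘min’, ‘max’, ‘first’}, default ‘average’

        average: assign each value in a group of ties the average rank of the group

        min: use the lowest rank in the group

        max: use the highest rank in the group

        first: assign ranks in the order the values appear in the original data

    numeric_only : boolean, default None. Include only float, int, and boolean data. Valid only for DataFrame objects (the Panel type mentioned in older documentation has been removed from pandas).

    na_option : {‘keep’, ‘top’, ‘bottom’}

        keep: leave NA values in place, with NaN as their rank

        top: assign the smallest ranks to NA values (for an ascending ranking)

        bottom: assign the largest ranks to NA values (for an ascending ranking)

    ascending : boolean, default True

        True ranks in ascending order, False in descending order

    pct : boolean, default False

        Whether to return the ranks as percentiles (rank divided by the number of non-NA values)

Returns: ranks : same type as the caller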
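
The parameters above can be tried on a small example. This is a minimal sketch; the column name `score` and the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with one tie (the two 20s).
df = pd.DataFrame({"score": [10, 20, 20, 30]})

avg_rank = df["score"].rank()                    # 1.0, 2.5, 2.5, 4.0 (ties share the average)
min_rank = df["score"].rank(method="min")        # 1.0, 2.0, 2.0, 4.0
first_rank = df["score"].rank(method="first")    # 1.0, 2.0, 3.0, 4.0
desc_rank = df["score"].rank(ascending=False)    # 4.0, 2.5, 2.5, 1.0
pct_rank = df["score"].rank(pct=True)            # 0.25, 0.625, 0.625, 1.0

# NA handling: 'keep' leaves NaN, 'top' ranks NA first, 'bottom' ranks NA last.
s = pd.Series([3.0, None, 1.0])
na_keep = s.rank(na_option="keep")      # 2.0, NaN, 1.0
na_top = s.rank(na_option="top")        # 3.0, 1.0, 2.0
na_bottom = s.rank(na_option="bottom")  # 2.0, 3.0, 1.0
```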

2. `DataFrame.corr`(method='pearson', min_periods=1)

Compute pairwise correlation of columns, excluding NA/null values.

Parameters:

method : {‘pearson’, ‘kendall’, ‘spearman’}

pearson : standard correlation coefficient

kendall : Kendall Tau correlation coefficient

spearman : Spearman rank correlation

min_periods : int, optional

Minimum number of observations required per pair of columns to have a valid result. Currently only available for pearson and spearman correlation

Returns:

y : DataFrame
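
As a rough illustration of the `method` parameter, the sketch below builds a small frame (the column names `a`, `b`, `c` and the values are invented for this example) and compares Pearson and Spearman correlations.

```python
import pandas as pd

# Hypothetical data: b is an exact linear function of a, c is loosely decreasing.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [5.0, 3.0, 4.0, 1.0, 2.0],
})

pearson = df.corr()                     # default method='pearson'
spearman = df.corr(method="spearman")   # correlation of the ranks

print(pearson.loc["a", "b"])    # approximately 1.0
print(spearman.loc["a", "c"])   # approximately -0.8
```

The result is a square DataFrame indexed by column name on both axes, so a single pair is read with `.loc[row, col]`.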

3. `DataFrame.corrwith`(other, axis=0, drop=False) computes the correlation between the columns (axis=0, the default) or rows (axis=1) of a DataFrame and another Series or DataFrame:

Compute pairwise correlation between rows or columns of two DataFrame objects.

Parameters:

other : DataFrame

axis : {0 or ‘index’, 1 or ‘columns’}, default 0

0 or ‘index’ to compute column-wise, 1 or ‘columns’ for row-wise

drop : boolean, default False

Drop missing indices from result, default returns union of all

Returns:

correls : Series
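
A small sketch of both uses, against a Series and against another DataFrame. All names and values here (`df`, `target`, `other`, columns `x`, `y`, `z`) are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0],
                   "y": [4.0, 3.0, 2.0, 1.0]})

# Against a Series: each column of df is correlated with the Series.
target = pd.Series([10.0, 20.0, 30.0, 40.0])
by_column = df.corrwith(target)          # x close to 1.0, y close to -1.0

# Against a DataFrame: columns are matched by name.
other = pd.DataFrame({"x": [2.0, 4.0, 6.0, 8.0],
                      "z": [1.0, 0.0, 1.0, 0.0]})
union = df.corrwith(other)               # unmatched labels (y, z) come back as NaN
matched = df.corrwith(other, drop=True)  # only the shared label x remains
```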

4. maximum, fmax, and max in numpy

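np.max: (a, axis=None, out=None, keepdims=False)
- Returns the maximum of an array
- Takes at least one argument
- axis: defaults to None, which takes the maximum over the whole flattened array; axis=0 takes the maximum down each column, and axis=1 takes the maximum along each row
np.maximum: (X, Y, out=None)
- Compares X and Y element by element and keeps the larger value
- Takes at least two arguments

maximum and fmax both return an array made of the element-wise larger values of two arrays. maximum propagates NaN, whereas fmax ignores NaN when the other element is a number.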
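
The differences are easiest to see side by side. The arrays below are made-up sample data.

```python
import numpy as np

a = np.array([[1, 5, 3],
              [4, 2, 6]])

overall = np.max(a)           # 6, maximum of the flattened array (axis=None)
per_column = np.max(a, axis=0)  # [4 5 6]
per_row = np.max(a, axis=1)     # [5 6]

x = np.array([1.0, np.nan, 3.0])
y = np.array([2.0, 2.0, np.nan])

with_nan = np.maximum(x, y)   # [ 2. nan nan], NaN propagates
without_nan = np.fmax(x, y)   # [2. 2. 3.], NaN is ignored when the other value is a number
```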

5. DataFrame.shift(periods=1, freq=None, axis=0)

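periods: an int giving the number of positions to shift; it may be positive or negative and defaults to 1, meaning one step. Note that only the data moves while the index stays in place, and positions left without a value are filled with NaN (which turns an integer column into float). For example: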

index	value1
A	0
B	1
C	2
D	3

Running the following code:

df.shift()

gives:

index	value1
A	NaN
B	0
C	1
D	2

Running:

df.shift(-1)

gives:

index	value1
A	1
B	2
C	3
D	NaN
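
The two tables above can be reproduced with a short script. This is a sketch that rebuilds the example frame by hand; the variable names `down` and `up` are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({"value1": [0, 1, 2, 3]}, index=list("ABCD"))

down = df.shift()     # data moves down one row; A becomes NaN
up = df.shift(-1)     # data moves up one row; D becomes NaN

print(down)
print(up)
```

Because NaN cannot be stored in an integer column, the shifted column is returned as float (0.0, 1.0, ...).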

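freq: DateOffset, timedelta, or time rule string; optional, default None. It applies only to time series. When this parameter is given, the time index is shifted by the given frequency and the data values are left unchanged. For example, suppose df1 is: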

index	value1
2016-06-01	0
2016-06-02	1
2016-06-03	2
2016-06-04	3

Running:

df1.shift(periods=1,freq=datetime.timedelta(1))

gives:

index	value1
2016-06-02	0
2016-06-03	1
2016-06-04	2
2016-06-05	3
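
A runnable version of the freq example, as a sketch: `df1` is rebuilt here with `pd.date_range`, which is an assumption about how the original frame was created.

```python
import datetime
import pandas as pd

df1 = pd.DataFrame(
    {"value1": [0, 1, 2, 3]},
    index=pd.date_range("2016-06-01", periods=4, freq="D"),
)

# With freq given, the index moves forward one day and the values stay put.
shifted = df1.shift(periods=1, freq=datetime.timedelta(days=1))

print(shifted)
```

No NaN is introduced in this case, so the column keeps its integer dtype.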

- axis: {0, 1, ‘index’, ‘columns’}, the direction of the shift. 0 or ‘index’ shifts the data up and down; 1 or ‘columns’ shifts it left and right.

6. python | pandas | Moving window functions (rolling)

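The module-level pandas.rolling_* functions listed below are the legacy interface. They were deprecated in pandas 0.18 and later removed in favor of the .rolling() method, for example df.rolling(window).mean().

pandas.rolling_count(arg, window, freq=None, center=False, how=None)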

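arg : a DataFrame or a numpy ndarray
window : the size of the moving window, an integer
freq : string or DateOffset, optional; the frequency to conform the data to before computing the statistic
center : boolean, default False; whether to set the labels at the center of the window
how : string, default ‘mean’; the method used for down- or re-sampling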

pandas.rolling_sum(arg, window, min_periods=None, freq=None, center=False, how=None, **kwargs)

pandas.rolling_mean(arg, window, min_periods=None, freq=None, center=False, how=None, **kwargs)

pandas.rolling_median(arg, window, min_periods=None, freq=None, center=False, how='median', **kwargs)

pandas.rolling_var(arg, window, min_periods=None, freq=None, center=False, how=None, **kwargs)

pandas.rolling_std(arg, window, min_periods=None, freq=None, center=False, how=None, **kwargs)

pandas.rolling_min(arg, window, min_periods=None, freq=None, center=False, how='min', **kwargs)

pandas.rolling_corr(arg1, arg2=None, window=None, min_periods=None, freq=None, center=False, pairwise=None, how=None)

pandas.rolling_cov(arg1, arg2=None, window=None, min_periods=None, freq=None, center=False, pairwise=None, how=None, ddof=1)
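
On current pandas versions the same statistics are reached through the `.rolling()` method. The sketch below uses made-up Series `s` and `other`, and shows how `window`, `min_periods`, and `center` affect the output.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
other = pd.Series([2.0, 4.0, 6.0, 8.0, 10.0])

# Equivalent of rolling_mean(s, 3): NaN until the window is full.
mean3 = s.rolling(window=3).mean()                   # NaN, NaN, 2.0, 3.0, 4.0

# min_periods=1 allows partial windows at the start.
sum3 = s.rolling(window=3, min_periods=1).sum()      # 1.0, 3.0, 6.0, 9.0, 12.0

# center=True labels each window by its middle element.
centered = s.rolling(window=3, center=True).mean()   # NaN, 2.0, 3.0, 4.0, NaN

# Equivalent of rolling_corr(s, other, 3).
corr3 = s.rolling(window=3).corr(other)              # NaN, NaN, then values close to 1.0
```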