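Approach: input the price series of the pair -> OLS linear regression -> take the residuals -> ADF test (is the residual series stationary?) -> No: the original pair is not cointegrated
|
V
Yes: arbitrage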
Fetch the ZTE A-share data from tushare and the ZTE H-share data from Yahoo:
import pandas_datareader.data as web
import datetime
import fix_yahoo_finance as yf  # this package was later renamed yfinance
import pandas as pd
import tushare as ts
import statsmodels.api as sm#for ols
yf.pdr_override()
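# Open the existing file and use the day after its latest recorded date as the download start; note that omitting engine='python' raises an error
#data=pd.read_csv(r'C:\Users\yi\Desktop\Study\量化交易\ztehk.csv',sep=',',engine ='python')
# The dates read back from the CSV are str at this point and must be converted to datetime
#start=pd.to_datetime(data['Date']).max()+datetime.timedelta(days=1)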
start='2015-06-01'
end=str(datetime.date.today())
ztehk=web.get_data_yahoo('0763.HK',start,end)
# ztehk comes back as a DataFrame, so it can be written straight to CSV
#ztehk.to_csv(r'C:\Users\yi\Desktop\Study\量化交易\ztehk.csv',index=True,sep=',',mode='a', header=False)
# Fetch the ztesz data
# A single client is enough; substitute your own tushare token here
api=ts.pro_api('416626394789e89bcb3ad6f62ca0c8cb332b49fa2ac9156cb7aa6fc6')
ztesz = ts.pro_bar(pro_api=api, ts_code='000063.SZ', adj='qfq', start_date=start.replace('-',''), end_date=end.replace('-',''))
Sort by date and keep only the closing-price column:
# Tidy the data: take the EOD close of each series
ztesz_eod=ztesz.sort_index().rename(columns={'close':'ztesz_eod'}).loc[:,'ztesz_eod']
ztehk_eod=ztehk.sort_index().rename(columns={'Adj Close':'ztehk_eod'}).loc[:,'ztehk_eod']
The index of the Yahoo data is datetime, while the index of the tushare data is str, so the format has to be converted:
# Loop over ztehk_eod.index and convert each entry to a 'YYYYMMDD' str, matching the tushare index
import numpy as np
reind=[]
for i in range(0,len(ztehk_eod.index)):
    reind.append(str(ztehk_eod.index[i]).split()[0].replace('-',''))
Insert the new date list into a df, set it as the index, and drop the original index:
ztehk_eod_new=pd.DataFrame({'date_old':ztehk_eod.index,'ztehk_close':ztehk_eod.values,'date':reind}).set_index('date').drop('date_old',axis=1)
To match the ztehk format, convert ztesz from a Series to a df:
ztesz_eod_new=pd.DataFrame({'ztesz_close':ztesz_eod})
Merge the two data sets into a new df, keeping only the dates present in both:
rows=[]
for t in ztehk_eod_new.index:
    if t in ztesz_eod_new.index:
        zte_sz_eod=ztesz_eod_new.loc[t,'ztesz_close']
        zte_hk_eod=ztehk_eod_new.loc[t,'ztehk_close']
        # Collect rows in a list; DataFrame.append was removed in pandas 2.0
        rows.append({'ztesz':zte_sz_eod,'ztehk':zte_hk_eod,'date':t})
zte=pd.DataFrame(rows,columns=['ztesz','ztehk','date'])
#print(ztehk_eod,ztesz_eod)
zte1=zte.set_index('date')
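The same date alignment can also be written as an inner join on the index. The sketch below is only an illustration: the prices are made up, the names `hk`, `sz` and `pair` are hypothetical, and it assumes (as described above) that the Yahoo series carries a datetime index while the tushare series is labelled with 'YYYYMMDD' strings.

```python
import pandas as pd

# Toy stand-ins for the two downloads (made-up prices, not real quotes).
hk = pd.Series(
    [20.0, 20.5, 21.0],
    index=pd.to_datetime(['2015-06-01', '2015-06-02', '2015-06-03']),
    name='ztehk',
)
sz = pd.Series(
    [18.0, 18.4, 18.9],
    index=['20150601', '20150603', '20150604'],
    name='ztesz',
)

# Convert the DatetimeIndex to 'YYYYMMDD' strings so both indexes use the same labels.
hk.index = hk.index.strftime('%Y%m%d')

# An inner join keeps only the dates on which both series have a value.
pair = pd.concat([sz, hk], axis=1, join='inner')
pair.index.name = 'date'
print(pair)
```

Only the dates present in both series survive the join, which is the same filter the explicit loop applies.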
Run an OLS linear regression on the two series, then take the residuals of the fitted equation and plot them:
ols_result = sm.OLS(zte1.iloc[:,0], zte1.iloc[:,1]).fit()
ols_result.resid.plot()
Use the ADF test to check whether the residuals are stationary:
from statsmodels.tsa.stattools import adfuller
def testStationarity(data):
    # adfuller returns (statistic, p-value, lags used, nobs, critical values, ...)
    adftest = adfuller(data)
    result = pd.Series(adftest[0:4], index=['Test Statistic','p-value','Lags Used','Number of Observations Used'])
    for key,value in adftest[4].items():
        result['Critical Value (%s)'%key] = value
    return result
result=testStationarity(ols_result.resid)
result
import pprint
pprint.pprint(result)
Test Statistic -4.005821
p-value 0.001380
Lags Used 7.000000
Number of Observations Used 756.000000
Critical Value (1%) -3.439029
Critical Value (5%) -2.865371
Critical Value (10%) -2.568810
dtype: float64
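The test statistic (-4.006) is below the 5% critical value (-2.865), and also below the 1% critical value (-3.439), so the unit-root null is rejected and the residuals are treated as a stationary series.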
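Compute the difference between the two series and standardize it (z-score):
# Define the function that standardizes the price spread
import matplotlib.pyplot as plt
def zscore(series):
    trans=(series - series.mean()) / np.std(series)
    return trans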
zscore(zte1.iloc[:,0]-zte1.iloc[:,1]).plot(figsize=(20,7))
plt.axhline(zscore(zte1.iloc[:,0]-zte1.iloc[:,1]).mean(), color='black')
plt.axhline(1.0, color='red', linestyle='--')
plt.axhline(-1.0, color='green', linestyle='--')
plt.title('ztesz-ztehk')
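The plot shows the standardized EOD spread between the two stocks oscillating mostly within a ±1 band. A possible strategy: when the series breaks above the upper band, short ztesz and go long ztehk; when it breaks below the lower band, go long ztesz and short ztehk; when the series returns toward the middle band, close the position and exit.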
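The band rule can be sketched as a small state machine over the z-score series. This is a minimal illustration rather than a tested strategy: the function name `band_positions` and the parameters `entry` and `exit_level` are hypothetical, the input values are made up, and costs, short-selling constraints and currency differences between the two markets are ignored.

```python
import pandas as pd

def band_positions(z, entry=1.0, exit_level=0.0):
    """Return the position in the spread per bar: -1 short, +1 long, 0 flat."""
    position = 0
    out = []
    for value in z:
        if position == 0:
            if value > entry:
                position = -1   # above the upper band: short ztesz, long ztehk
            elif value < -entry:
                position = 1    # below the lower band: long ztesz, short ztehk
        elif position == -1 and value <= exit_level:
            position = 0        # spread is back at the middle band: exit
        elif position == 1 and value >= -exit_level:
            position = 0        # spread is back at the middle band: exit
        out.append(position)
    return pd.Series(out, index=z.index)

# Made-up z-scores that cross both bands once.
z = pd.Series([0.2, 1.3, 0.8, -0.1, -1.4, -0.6, 0.1])
print(band_positions(z).tolist())  # [0, -1, -1, 0, 1, 1, 0]
```

The position is held between the entry and exit crossings, so a value that drifts back inside the band without reaching the middle does not close the trade.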