Case Study: Predicting Stock Price Direction

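Objective and Background
Use logistic regression with the returns of the past 5 days, X, to predict whether a stock rises or falls on the next day, Y, and build a long-short portfolio from the predicted probabilities of a rise.

Solution and Procedure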
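The setup above can be written as a standard logistic regression. The notation here is an assumption introduced for illustration: $r_{i,t-k}$ is the return of stock $i$ on day $t-k$, and $\beta_0,\dots,\beta_5$ are the coefficients to be estimated.

```latex
Y_{i,t} = \mathbf{1}\{ r_{i,t} > 0 \}, \qquad
P\left(Y_{i,t} = 1 \mid r_{i,t-1},\dots,r_{i,t-5}\right)
  = \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \sum_{k=1}^{5} \beta_k\, r_{i,t-k}\right)\right)}
```

Stocks are then ranked by this fitted probability to choose the long and short legs.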

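Fit the model: of the most recent 500 days of data, use the first 450 days as the training sample and fit a logistic regression model to obtain the parameter estimates. Use the last 50 days as the test sample to check how well the model performs.
Compute the information coefficient: in the test sample, the correlation between the predicted and the actual up/down moves is about 0.077. This correlation is the information coefficient (IC), which measures the quality of a factor or a combination of factors.
Build the long-short portfolio: hold, in equal weights, the 10 stocks with the highest predicted probability of rising, short the 10 stocks with the lowest probability, and plot the portfolio's return.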
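To make the feature layout concrete, here is a minimal sketch on made-up data. The helper name `make_lagged` and the toy array are assumptions introduced for illustration and do not appear in the original code. The sketch aims to reproduce the same slicing pattern as the reference code (features ordered from the oldest lag to the newest, samples stacked day by day).

```python
import numpy as np

def make_lagged(returns, p):
    """Build (X, y) from a (days, stocks) return matrix using p lagged returns.

    Hypothetical helper: column 0 of X is the return p days before the
    target day, and the last column is the return 1 day before it.
    """
    T = returns.shape[0]
    # Slice k covers days k .. T-p+k-1, i.e. the lag that is p-k days old
    X = np.column_stack([returns[k:T - p + k, :].ravel() for k in range(p)])
    # Label is 1 when the target-day return is strictly positive
    y = (returns[p:, :].ravel() > 0).astype(int)
    return X, y

# Toy data (made up): 8 days, 2 stocks
rng = np.random.default_rng(0)
toy = rng.normal(0.0, 0.01, size=(8, 2))
X, y = make_lagged(toy, p=5)
print(X.shape, y.shape)
```

With 8 days and 5 lags there are 3 target days per stock, so the two toy stocks yield 6 samples with 5 features each.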
Reference code:
import numpy as np
import numpy.linalg as la
import pandas as pd
import os
import matplotlib.pyplot as plt
index_path = r'data\SZ399300.TXT'

# Load the index file, drop its last row, and name the columns
index300 = pd.read_table(index_path,
                         encoding = 'cp936',header = None)
idx = index300[:-1]
idx.columns = ['date','o','h','l','c','v','to']
idx.index = idx['date']

stock_path = r'data\hs300'
names = os.listdir(stock_path)
close = []
for name in names:
    spath = os.path.join(stock_path, name)
    df0 = pd.read_table(spath,
                        encoding = 'cp936',header = None)
    df1 = df0[:-1]
    df1.columns = ['date','o','h','l','c','v','to']
    df1.index = df1['date']
    # Align each stock to the index dates: forward-fill gaps, then back-fill any leading gaps
    df2 = df1.reindex(idx.index,method = 'ffill')
    df3 = df2.bfill()
    close.append(df3['c'].values)

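# Stack the closing prices into a (days, stocks) array and compute simple daily returns
data = np.asarray(close).T

retx = (data[1:,:]-data[:-1,:])/data[:-1,:]

n = 500   # total window: the most recent 500 days of returns
n1 = 50   # the last 50 days are held out as the test sample
p = 5     # number of lagged returns used as features
train = retx[-n:-n1,:]
ret = train[p:,:].ravel()   # target-day return to be classified
# X1 is the return 1 day before the target day, ..., X5 is the return 5 days before
X1 = train[4:-1,:].ravel()[:,np.newaxis]
X2 = train[3:-2,:].ravel()[:,np.newaxis]
X3 = train[2:-3,:].ravel()[:,np.newaxis]
X4 = train[1:-4,:].ravel()[:,np.newaxis]
X5 = train[:-5,:].ravel()[:,np.newaxis]
y_train = (ret>0).astype(int)   # 1 = up, 0 = flat or down
X_train = np.hstack((X5,X4,X3,X2,X1))

# Same construction on the test sample, which leaves n1 - p = 45 target days
test = retx[-n1:,:]
ret2 = test[p:,:].ravel()
X1 = test[4:-1,:].ravel()[:,np.newaxis]
X2 = test[3:-2,:].ravel()[:,np.newaxis]
X3 = test[2:-3,:].ravel()[:,np.newaxis]
X4 = test[1:-4,:].ravel()[:,np.newaxis]
X5 = test[:-5,:].ravel()[:,np.newaxis]
y_test = (ret2>0).astype(int)
X_test = np.hstack((X5,X4,X3,X2,X1))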

from sklearn import linear_model
from sklearn.metrics import classification_report
# C is the inverse regularization strength, so C=1e2 applies only weak regularization
clf = linear_model.LogisticRegression(C=1e2,fit_intercept=True)
clf.fit(X_train,y_train)
y_pred0 = clf.predict(X_train)
print(classification_report(y_train, y_pred0))
print(np.corrcoef(y_train, y_pred0)[0, 1])  # in-sample correlation

y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))
print(np.corrcoef(y_test, y_pred)[0, 1])  # Information Coefficient, IC

# Each day: +5% in each of the 10 highest-probability stocks, -5% in each of the 10 lowest
holding_matrix = np.zeros((n1-p, test.shape[1]))
for j in range(n1-p):
    # Rows j..j+p-1 are the p returns before target day j+p, ordered oldest to newest as in X_train
    prob = clf.predict_proba(test[j:j+p,:].T)[:,1]
    order = prob.argsort()
    long_position = order[-10:]
    short_position = order[:10]
    holding_matrix[j,long_position] = 0.05
    holding_matrix[j,short_position] = -0.05

tmp_ret = np.sum(holding_matrix*test[p:],axis = 1)
portfolio_ret = np.append(0,tmp_ret)
plt.plot(np.cumprod(1+portfolio_ret))
plt.legend(['Performance of LR'],loc='upper left')
plt.savefig(r'fig\stockret-lr')

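# Comparison plot. Only the logistic regression returns are computed in this section,
# so the curve labelled 'neural network' replots the same series.
plt.figure()
plt.plot(np.cumprod(1+portfolio_ret))
plt.plot(np.cumprod(1+portfolio_ret),'--')
plt.legend(['Logistic Regression','neural network'])
plt.savefig(r'fig\stockret-lrnn')
plt.show()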
Output:

[Figure: cumulative return of the long-short portfolio]
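The information coefficient and the portfolio weights can also be checked by hand on a small example. The sketch below uses made-up labels and probabilities, and the function names `information_coefficient` and `long_short_weights` are hypothetical helpers, not part of the original code. It assumes the IC is the Pearson correlation between actual and predicted labels, as in the reference code, and that each leg of the portfolio carries half of the total weight (which gives 0.05 per stock when 10 stocks are held on each side).

```python
import numpy as np

def information_coefficient(actual, predicted):
    """Pearson correlation between actual and predicted up/down labels."""
    return np.corrcoef(actual, predicted)[0, 1]

def long_short_weights(prob, k):
    """Equal weights: +1/(2k) on the k highest probabilities, -1/(2k) on the k lowest."""
    order = np.argsort(prob)
    w = np.zeros(len(prob))
    w[order[-k:]] = 1.0 / (2 * k)
    w[order[:k]] = -1.0 / (2 * k)
    return w

# Made-up numbers, for illustration only
y_true = np.array([1, 0, 1, 1, 0, 0])
y_hat = np.array([1, 0, 0, 1, 0, 1])
ic = information_coefficient(y_true, y_hat)

prob = np.array([0.62, 0.48, 0.55, 0.41, 0.51, 0.58])
w = long_short_weights(prob, k=2)
print(ic)
print(w)
```

Because the long and short weights cancel, the portfolio is dollar-neutral: its daily return is the weighted sum of the individual stock returns on that day.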
