The previous article presented a correlation analysis of US stock data, with demonstration code. In this article we use historical price data to predict US stock prices.
Predicting Stock Prices
Project features
This project uses three machine learning models to predict stock prices:
Simple linear analysis
Quadratic discriminant analysis (QDA), implemented in the code as polynomial regression
K-nearest neighbors (KNN)
First, let's add a few features: the high-low percentage and the percentage change.
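# Copy the two base columns so that adding features does not modify a view of df
dfreg = df.loc[:, ['Adj Close', 'Volume']].copy()
# High-low range as a percentage of the close
dfreg['HL_PCT'] = (df['High'] - df['Low']) / df['Close'] * 100.0
# Open-to-close move as a percentage of the open
dfreg['PCT_change'] = (df['Close'] - df['Open']) / df['Open'] * 100.0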
The resulting DataFrame
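To make the two formulas concrete, here is a minimal worked sketch on a two-row table. The prices and volumes are made up for illustration and do not come from the article's data set.

```python
import pandas as pd

# Hypothetical two-day price table; every value is made up for illustration.
df = pd.DataFrame(
    {
        "Open": [100.0, 50.0],
        "High": [110.0, 52.0],
        "Low": [95.0, 48.0],
        "Close": [105.0, 49.0],
        "Adj Close": [105.0, 49.0],
        "Volume": [1000, 2000],
    }
)

dfreg = df.loc[:, ["Adj Close", "Volume"]].copy()
# High-low range as a percentage of the close
dfreg["HL_PCT"] = (df["High"] - df["Low"]) / df["Close"] * 100.0
# Open-to-close move as a percentage of the open
dfreg["PCT_change"] = (df["Close"] - df["Open"]) / df["Open"] * 100.0

print(dfreg)
```

On the first row the stock opens at 100 and closes at 105, so PCT_change is 5.0; on the second it opens at 50 and closes at 49, so PCT_change is -2.0.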
Preprocessing and cross-validation
Before feeding the data into the prediction models, we clean and process it with the following steps:
1. Handle missing values.
2. Separate the label: the column we want to predict is Adj Close.
3. Scale X so that all features have a comparable distribution for linear regression.
4. Split X into the most recent rows (used for forecasting) and the earlier rows (used for model generation and evaluation).
5. Separate the label and identify it as y.
6. Split the data into training and test sets with a train/test split for cross-validation.
With the steps laid out, the code is as follows:
import math
import numpy as np
from sklearn import preprocessing
# Replace missing values with a sentinel (-99999) rather than dropping rows
dfreg.fillna(value=-99999, inplace=True)
# Hold out 1 percent of the data to forecast
forecast_out = int(math.ceil(0.01 * len(dfreg)))
# Separate the label: we want to predict Adj Close
forecast_col = 'Adj Close'
dfreg['label'] = dfreg[forecast_col].shift(-forecast_out)
X = np.array(dfreg.drop(columns=['label']))
# Scale X so that all features have a comparable distribution for linear regression
X = preprocessing.scale(X)
# Split X into the most recent rows (X_lately, used to forecast) and the earlier rows (used for model generation and evaluation)
X_lately = X[-forecast_out:]
X = X[:-forecast_out]
# Separate the label and identify it as y
y = np.array(dfreg['label'])
y = y[:-forecast_out]
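The models that follow are fitted on X_train and y_train, but the code above stops before step 6 and never defines them. Below is a minimal sketch of the label shift and the missing split on toy data. It assumes scikit-learn's train_test_split with test_size=0.2. The source does not state the split ratio, and the toy prices, volumes and random_state are also assumptions made for illustration.

```python
import math

import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

# Toy stand-in for dfreg: 200 rows of made-up prices and volumes.
rng = np.random.default_rng(0)
dfreg = pd.DataFrame(
    {
        "Adj Close": np.linspace(100.0, 120.0, 200),
        "Volume": rng.integers(1_000, 5_000, size=200),
    }
)

forecast_out = int(math.ceil(0.01 * len(dfreg)))  # 1% of 200 rows -> 2
# Each row's label is the price forecast_out rows later.
dfreg["label"] = dfreg["Adj Close"].shift(-forecast_out)

X = preprocessing.scale(np.array(dfreg.drop(columns=["label"])))
X_lately = X[-forecast_out:]  # rows with no label yet, kept for forecasting
X = X[:-forecast_out]
y = np.array(dfreg["label"])[:-forecast_out]

# Assumption: 20% of the rows are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape, X_lately.shape)
```

Because the label is shifted by forecast_out rows, the last forecast_out rows have no label. That is why they are cut from X and y and kept aside as X_lately.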
Model generation: where the fun of prediction begins
First, import the required modules from scikit-learn:
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
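Simple linear analysis and quadratic discriminant analysis
Simple linear analysis shows a linear relationship between two or more variables. When we plot this relationship for two variables, we get a straight line. What this article calls quadratic discriminant analysis is similar, except that the model allows polynomial terms (for example, x squared) and produces a curve. In the code it is implemented as polynomial regression (PolynomialFeatures followed by Ridge), not as the QDA classifier.
Linear regression predicts the dependent variable (y) as output, given the independent variables (x) as input. When plotted, this gives a straight line, as shown below:
Simple linear regression
Next we train the models using the scikit-learn library. The code is as follows.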
# Linear regression
clfreg = LinearRegression(n_jobs=-1)
clfreg.fit(X_train, y_train)
# Polynomial regression, degree 2 (quadratic)
clfpoly2 = make_pipeline(PolynomialFeatures(2), Ridge())
clfpoly2.fit(X_train, y_train)
# Polynomial regression, degree 3
clfpoly3 = make_pipeline(PolynomialFeatures(3), Ridge())
clfpoly3.fit(X_train, y_train)
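To illustrate the difference between the straight-line model and the polynomial pipeline, here is a small sketch that fits both to a pure quadratic curve. The data is made up (y = x squared) and is not related to the stock data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Made-up data following a pure quadratic curve, y = x^2.
x = np.linspace(-3.0, 3.0, 50).reshape(-1, 1)
y = (x ** 2).ravel()

linear = LinearRegression().fit(x, y)
quadratic = make_pipeline(PolynomialFeatures(2), Ridge()).fit(x, y)

# score() returns R^2: the straight line cannot follow the curve,
# while the degree-2 pipeline can.
linear_r2 = linear.score(x, y)
quadratic_r2 = quadratic.score(x, y)
print(f"linear R^2: {linear_r2:.3f}, quadratic R^2: {quadratic_r2:.3f}")
```

On the stock features the gap between the two is much smaller, which is consistent with the close scores reported in the evaluation section.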
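K-nearest neighbors (KNN)
KNN uses feature similarity to predict the value of a data point. This ensures that a new point is assigned a value close to those of similar points in the data set. To measure similarity, the points at the smallest distance (for example, Euclidean distance) are selected.
KNN model visualization: the queried element is grouped with its k nearest elements
# KNN regression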
clfknn = KNeighborsRegressor(n_neighbors=2)
clfknn.fit(X_train, y_train)
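To show what KNN regression does under the hood, here is a minimal sketch that computes a prediction by hand and compares it with KNeighborsRegressor. The training points and the query are made up, and k=2 mirrors the n_neighbors=2 used above.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Made-up one-feature training set.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 10.0, 20.0, 30.0])
query = np.array([[1.2]])

# Manual version: Euclidean distance to every training point, then
# average the targets of the k closest ones.
k = 2
distances = np.linalg.norm(X_train - query, axis=1)
nearest = np.argsort(distances)[:k]
manual_prediction = y_train[nearest].mean()

# Library version with the same k.
knn = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
library_prediction = knn.predict(query)[0]

print(manual_prediction, library_prediction)
```

The two nearest neighbors of 1.2 are the points at 1.0 and 2.0, so both versions average their targets, 10 and 20.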
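Evaluation
A simple, quick, and effective way to evaluate the models is to use the score method of each trained model. For these regressors, score returns the coefficient of determination (R²) of self.predict(X) with respect to the y of the test data set, not a classification accuracy.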
confidencereg = clfreg.score(X_test, y_test)
confidencepoly2 = clfpoly2.score(X_test, y_test)
confidencepoly3 = clfpoly3.score(X_test, y_test)
confidenceknn = clfknn.score(X_test, y_test)
Results:
('The linear regression confidence is ', 0.96399641826551985)
('The quadratic regression 2 confidence is ', 0.96492624557970319)
('The quadratic regression 3 confidence is ', 0.9652082834532858)
('The knn regression confidence is ', 0.92844658034790639)
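This shows very high scores (> 0.95) for most of the models. However, that does not mean we can apply them blindly and start trading stocks. There are still many issues to consider, especially for companies whose prices follow different trajectories over time.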
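For scikit-learn regressors, the value returned by score is the coefficient of determination, R² = 1 - SS_res / SS_tot. The sketch below computes it by hand and compares it with score. The data is synthetic and nearly linear, and the 80/20 split is arbitrary. Both are assumptions made only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up, nearly linear data for illustration.
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 10.0, size=(100, 1))
y = 3.0 * X.ravel() + 2.0 + rng.normal(0.0, 1.0, size=100)

model = LinearRegression().fit(X[:80], y[:80])
X_test, y_test = X[80:], y[80:]

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
y_pred = model.predict(X_test)
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
manual_r2 = 1.0 - ss_res / ss_tot

library_r2 = model.score(X_test, y_test)
print(manual_r2, library_r2)
```

An R² close to 1 means the predictions explain most of the variance in the test labels. It does not say how large the errors are in price terms.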
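To get an intuitive feel, let's print some of the forecast results.
# clf must be one of the models fitted above; using clfreg is an assumption, since the source does not say which
clf = clfreg
forecast_set = clf.predict(X_lately)
dfreg['Forecast'] = np.nan
Result: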
(array([ 115.44941187, 115.20206522, 116.78688393, 116.70244946,
116.58503739, 115.98769407, 116.54315699, 117.40012338,
117.21473053, 116.57244657, 116.048717 , 116.26444966,
115.78374093, 116.50647805, 117.92064806, 118.75581186,
118.82688731, 119.51873699]), 0.96234891774075604, 18)
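Visualizing the forecast
We use the existing historical data to forecast future prices. Plotting the forecast alongside that history helps us understand how the model predicts future stock prices.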
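import datetime
import matplotlib.pyplot as plt
# Start forecasting from the day after the last date in the index
last_date = dfreg.iloc[-1].name
last_unix = last_date
next_unix = last_unix + datetime.timedelta(days=1)
# Append one row per forecast value: NaN in every column except the last one, Forecast
for i in forecast_set:
    next_date = next_unix
    next_unix += datetime.timedelta(days=1)
    dfreg.loc[next_date] = [np.nan for _ in range(len(dfreg.columns) - 1)] + [i]
# Plot the last 500 rows of actual prices together with the forecast
dfreg['Adj Close'].tail(500).plot()
dfreg['Forecast'].tail(500).plot()
plt.legend(loc=4)
plt.xlabel('Date')
plt.ylabel('Price')
plt.show()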
Stock forecast visualization
The blue line shows the stock price forecast based on regression. The forecast predicts a brief downturn followed by a recovery. Therefore, we could buy the stock during the downturn and sell it during the upturn.
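The loop in the plotting code appends one dated row per forecast value. As a sketch of the same idea without a row-by-row loop, the dates can be generated in one call with pandas date_range. The toy frame, its prices and the forecast values below are hypothetical.

```python
import datetime

import numpy as np
import pandas as pd

# Toy date-indexed frame standing in for dfreg; prices are made up.
dates = pd.date_range("2020-01-01", periods=5, freq="D")
dfreg = pd.DataFrame({"Adj Close": [10.0, 11.0, 12.0, 13.0, 14.0]}, index=dates)
dfreg["Forecast"] = np.nan

forecast_set = np.array([15.0, 16.0, 17.0])  # hypothetical model output

# Build all forecast dates at once, starting the day after the last known row.
first_day = dfreg.index[-1] + datetime.timedelta(days=1)
forecast_dates = pd.date_range(first_day, periods=len(forecast_set), freq="D")

# Extend the index (new rows are all NaN), then fill in the Forecast column.
dfreg = dfreg.reindex(dfreg.index.append(forecast_dates))
dfreg.loc[forecast_dates, "Forecast"] = forecast_set
print(dfreg.tail(4))
```

Like the original loop, this steps through calendar days, weekends included. Passing freq="B" to date_range would step through business days instead, if that fits the data better.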
Reproduced from: https://www.jianshu.com/p/0fd209702365