Don't miss out on hundreds of millions! Learn to forecast US stock prices with Python in 5 minutes

The previous article presented and demonstrated the code for a correlation analysis of US stock data. Next, we will use historical price data to forecast future US stock prices.

Stock Price Forecasting


Project features

We will use the following three machine learning models to predict the stock price:

Simple linear analysis,

Quadratic discriminant analysis (QDA)

K Nearest Neighbor (KNN).

First, let's build a couple of features: the high-low percentage and the percentage change.

dfreg = df.loc[:, ['Adj Close', 'Volume']]
dfreg['HL_PCT'] = (df['High'] - df['Low']) / df['Close'] * 100.0
dfreg['PCT_change'] = (df['Close'] - df['Open']) / df['Open'] * 100.0
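The snippet above assumes that df already holds historical OHLCV price data; the article builds it in the previous installment and does not repeat the loading step here. As a stand-in, one way to obtain a comparable frame is the yfinance package (the ticker, date range, and the use of yfinance itself are assumptions, not part of the original article):

import yfinance as yf

# Assumption: the original df came from the previous correlation-analysis post.
# auto_adjust=False keeps the 'Adj Close' column that the code below relies on.
df = yf.download('AAPL', start='2015-01-01', end='2017-01-01', auto_adjust=False)
# Depending on the yfinance version you may need to flatten multi-level columns:
# df.columns = df.columns.get_level_values(0)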

[Figure: The final data block generated]

Preprocessing and cross-validation

[Figure: Cross-validation]

Before feeding the data into the predictive model, the following steps will be used to clean and process it:

1. Handle missing values

2. Separate the label column; the column to predict is Adj Close

3. Scale X so that every feature has a comparable distribution for linear regression

4. Split off the late X (the rows to forecast) and the early X used for model generation and evaluation

5. Separate the values to be predicted and identify the label as y

6. Separate the training and test sets with a train/test split for cross-validation

With the steps laid out, the code is as follows:

import math
import numpy as np
from sklearn import preprocessing

# Drop missing values
dfreg.fillna(value=-99999, inplace=True)

# We want to separate 1 percent of the data to forecast
forecast_out = int(math.ceil(0.01 * len(dfreg)))

# Separating the label here: we want to predict the Adj Close
forecast_col = 'Adj Close'
dfreg['label'] = dfreg[forecast_col].shift(-forecast_out)
X = np.array(dfreg.drop(['label'], axis=1))

# Scale X so that every feature has a comparable distribution for linear regression
X = preprocessing.scale(X)

# Finally, split off the late X (to forecast) and the early X (to train) for model generation and evaluation
X_lately = X[-forecast_out:]
X = X[:-forecast_out]

# Separate the label and identify it as y
y = np.array(dfreg['label'])
y = y[:-forecast_out]
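One item from the list above, the train/test separation, does not appear in the code, even though X_train, X_test, y_train and y_test are used in the training and evaluation sections below. A minimal sketch of that missing step (the 80/20 split ratio is an assumption; the original does not show it):

from sklearn.model_selection import train_test_split

# Assumed step: split the scaled features and labels into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)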

Model generation - now the fun of forecasting begins

First, import Scikit-Learn:

from sklearn.linear_model import LinearRegression

from sklearn.neighbors import KNeighborsRegressor

from sklearn.linear_model import Ridge

from sklearn.preprocessing import PolynomialFeatures

from sklearn.pipeline import make_pipeline

Simple linear analysis and quadratic discriminant analysis

Simple linear analysis shows a linear relationship between two or more variables; when we plot the relationship between two variables, we get a straight line. Quadratic discriminant analysis is similar to simple linear analysis, except that the model allows polynomial terms (for example, x squared) and therefore produces a curve.
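To make the polynomial part concrete, here is a tiny illustration (an addition for this write-up, not from the original article) of what PolynomialFeatures(2), used in the pipelines below, does to a single feature:

from sklearn.preprocessing import PolynomialFeatures
import numpy as np

# PolynomialFeatures(2) expands each value x into [1, x, x^2],
# which is what lets a linear model fit a curve instead of a straight line.
x = np.array([[2.0], [3.0]])
print(PolynomialFeatures(2).fit_transform(x))
# [[1. 2. 4.]
#  [1. 3. 9.]]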

Linear regression predicts the dependent variable (y) as output given the independent variable (x) as input. Plotted, this gives us a straight line, as shown below:

[Figure: Simple linear regression]

We now train the models using the Scikit-Learn library. The code is as follows.

# Linear regression
clfreg = LinearRegression(n_jobs=-1)
clfreg.fit(X_train, y_train)

# Quadratic Regression 2
clfpoly2 = make_pipeline(PolynomialFeatures(2), Ridge())
clfpoly2.fit(X_train, y_train)

# Quadratic Regression 3
clfpoly3 = make_pipeline(PolynomialFeatures(3), Ridge())
clfpoly3.fit(X_train, y_train)

K Nearest Neighbors (KNN)

KNN uses feature similarity to predict the value of a data point, which ensures that the new point is assigned a value similar to the points it resembles in the dataset. To measure similarity, we pick the points at the minimum distance (for example, Euclidean distance).
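As a toy illustration of that idea (not from the article), predicting a value with k = 2 simply means averaging the targets of the two training points at the smallest Euclidean distance from the query:

import numpy as np

X_toy = np.array([[1.0], [2.0], [4.0], [7.0]])
y_toy = np.array([10.0, 12.0, 20.0, 35.0])
query = np.array([3.0])

dist = np.linalg.norm(X_toy - query, axis=1)   # Euclidean distance to each training point
nearest = np.argsort(dist)[:2]                 # the 2 closest points
print(y_toy[nearest].mean())                   # 16.0, the same estimate KNN with k=2 would give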

[Figure: KNN model visualization - the queried point is grouped with its k nearest elements]

# KNN Regression
clfknn = KNeighborsRegressor(n_neighbors=2)
clfknn.fit(X_train, y_train)

Evaluation


A simple, quick, and effective evaluation method is to call the score method on each trained model. For these regressors, score returns the coefficient of determination (R²) of self.predict(X) against the y of the test dataset.
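To make that concrete (this equivalence is an added note, not part of the article): for a scikit-learn regressor, score is the same as computing r2_score on the model's predictions for the test set:

from sklearn.metrics import r2_score

# Equivalent to clfreg.score(X_test, y_test)
confidencereg_alt = r2_score(y_test, clfreg.predict(X_test))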

confidencereg = clfreg.score(X_test, y_test)
confidencepoly2 = clfpoly2.score(X_test, y_test)
confidencepoly3 = clfpoly3.score(X_test, y_test)
confidenceknn = clfknn.score(X_test, y_test)

Results:

('The linear regression confidence is ', 0.96399641826551985)

('The quadratic regression 2 confidence is ', 0.96492624557970319)

('The quadratic regression 3 confidence is ', 0.9652082834532858)

('The knn regression confidence is ', 0.92844658034790639)

Most of these models achieve very high accuracy scores (> 0.95). However, that does not mean we can apply them blindly and start trading stocks. There are still many issues to watch for, especially for companies whose price trajectories behave differently.

To get an intuitive feel, let's print some of the forecast results.
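The snippet below uses a model named clf that the article never defines; one plausible reading (an assumption, not stated in the original) is that clf is simply the best-scoring model from the evaluation step:

# Assumption: pick whichever trained model scored highest on the test set
clf = max((clfreg, clfpoly2, clfpoly3, clfknn), key=lambda m: m.score(X_test, y_test))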

forecast_set = clf.predict(X_lately)

dfreg['Forecast'] = np.nan

Result:

(array([ 115.44941187, 115.20206522, 116.78688393, 116.70244946,

116.58503739, 115.98769407, 116.54315699, 117.40012338,

117.21473053, 116.57244657, 116.048717 , 116.26444966,

115.78374093, 116.50647805, 117.92064806, 118.75581186,

118.82688731, 119.51873699]), 0.96234891774075604, 18)

Forecast visualization

We use the existing historical data to forecast future prices. Visualizing this helps us understand and verify how the model projects future stock pricing.

import datetime
import matplotlib.pyplot as plt

# Start from the last date in the frame and append one row per forecast value,
# stepping the index forward one day at a time
last_date = dfreg.iloc[-1].name
last_unix = last_date
next_unix = last_unix + datetime.timedelta(days=1)

for i in forecast_set:
    next_date = next_unix
    next_unix += datetime.timedelta(days=1)
    dfreg.loc[next_date] = [np.nan for _ in range(len(dfreg.columns) - 1)] + [i]

# Plot the last 500 actual prices together with the forecast
dfreg['Adj Close'].tail(500).plot()
dfreg['Forecast'].tail(500).plot()
plt.legend(loc=4)
plt.xlabel('Date')
plt.ylabel('Price')
plt.show()

[Figure: Stock forecast visualization]

You can see that the blue line shows the forecast prices produced by the regression. The forecast is that the price will dip only briefly and then recover, so you could buy the stock during the slump and sell during the recovery.

Reproduced from: https://www.jianshu.com/p/0fd209702365
