时间序列--样本量的大小对模型预测好坏的影响

# fit an ARIMA model
from pandas import Series
from matplotlib import pyplot
from statsmodels.tsa.arima_model import ARIMA
from sklearn.metrics import mean_squared_error
from math import sqrt
# load dataset
series = Series.from_csv('daily-minimum-temperatures.csv', header=0)
# seasonal difference
differenced = series.diff(365)
# trim off the first year of empty data
differenced = differenced[365:]
# split
train, test = differenced[differenced.index < '1990'], differenced['1990']
years = ['1989', '1988', '1987', '1986', '1985', '1984', '1983', '1982']
for year in years:
	# select data from 'year' cumulative to 1989
	dataset = train[train.index >= year]
	# walk forward over time steps in test
	values = dataset.values
	history = [values[i] for i in range(len(values))]
	predictions = list()
	test_values = test.values
	for t in range(len(test_values)):
		# fit model
		model = ARIMA(history, order=(7,0,0))
		model_fit = model.fit(trend='nc', disp=0)
		# make prediction
		yhat = model_fit.forecast()[0]
		predictions.append(yhat)
		history.append(test_values[t])
	rmse = sqrt(mean_squared_error(test_values, predictions))
	print('%s-%s (%d values) RMSE: %.3f' % (years[0], year, len(values), rmse))

1.differenced = differenced[365:]
代表我因为要差分上一年,所以第一年的数据365天我要去掉

2.测试集为90年

for year in years:体现了训练的多少

显示89,之后88-89,之后87-89.。。。每次都训练一个模型。

预测的时候采取的是滚动预测

3.结果:

1989-1989 (365 values) RMSE: 3.120
1989-1988 (730 values) RMSE: 3.109
1989-1987 (1095 values) RMSE: 3.104
1989-1986 (1460 values) RMSE: 3.108
1989-1985 (1825 values) RMSE: 3.107
1989-1984 (2190 values) RMSE: 3.103
1989-1983 (2555 values) RMSE: 3.099
1989-1982 (2920 values) RMSE: 3.096

https://machinelearningmastery.com/sensitivity-analysis-history-size-forecast-skill-arima-python/

猜你喜欢

转载自blog.csdn.net/kylin_learn/article/details/85369237
今日推荐