Decision Tree Regression with AdaBoost (adaptive boosting decision tree regression)

Example URL:
https://scikit-learn.org/stable/auto_examples/ensemble/plot_adaboost_regression.html#sphx-glr-auto-examples-ensemble-plot-adaboost-regression-py

When y = list(np.sin(3*X).ravel() + rng.normal(0, 0.1, X.shape[0])) (a sketch of this variant follows the code block below):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import AdaBoostRegressor

# Create the dataset: two superimposed sine waves plus Gaussian noise
rng = np.random.RandomState(1)
X = np.linspace(0, 6, 100)[:, np.newaxis]
y = list(np.sin(X).ravel() + np.sin(6 * X).ravel() + rng.normal(0, 0.1, X.shape[0]))

# Fit the regression models: a single decision tree vs. an AdaBoost ensemble of 300 trees
DTR_1 = DecisionTreeRegressor(max_depth=4)
DTR_2 = AdaBoostRegressor(DecisionTreeRegressor(max_depth=4), n_estimators=300, random_state=rng)
DTR_1.fit(X, y)
DTR_2.fit(X, y)

# Predict
y_1 = DTR_1.predict(X)
y_2 = DTR_2.predict(X)

# Plot the results
plt.figure()
plt.scatter(X, y, c="k", label="training samples")
plt.plot(X, y_1, c="g", label="n_estimators=1", linewidth=2)
plt.plot(X, y_2, c="r", label="n_estimators=300", linewidth=2)
plt.xlabel("data")
plt.ylabel("target")
plt.title("Boosted Decision Tree Regression")
plt.legend()
plt.show()
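
The heading above mentions a variant target, y = list(np.sin(3*X).ravel() + rng.normal(0, 0.1, X.shape[0])). Assuming only the target changes, a minimal sketch of the swapped dataset lines (everything else in the script stays the same):

# Variant dataset from the heading above: one sine wave of frequency 3 plus Gaussian noise
rng = np.random.RandomState(1)
X = np.linspace(0, 6, 100)[:, np.newaxis]
y = list(np.sin(3 * X).ravel() + rng.normal(0, 0.1, X.shape[0]))
# Fitting, prediction, and plotting proceed exactly as in the script above.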

Reference URLs:

np.newaxis inserts a new dimension:
https://blog.csdn.net/mameng1/article/details/54599306

[np.newaxis, :] and [:, np.newaxis] add a dimension along the rows or the columns respectively: an array that was originally of shape (6,) becomes a (1, 6) 2-D array when the row axis is added, and a (6, 1) 2-D array when the column axis is added.
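
A minimal sketch of those shape changes:

import numpy as np

a = np.arange(6)            # shape (6,)
row = a[np.newaxis, :]      # shape (1, 6): new axis inserted along the rows
col = a[:, np.newaxis]      # shape (6, 1): new axis inserted along the columns
print(a.shape, row.shape, col.shape)    # (6,) (1, 6) (6, 1)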

rand vs normal in Numpy.random:
https://www.geeksforgeeks.org/rand-vs-normal-numpy-random-python/

numpy.random.rand(d0, d1, …, dn):
creates an array of the specified shape and fills it with random values drawn uniformly from [0, 1).

numpy.random.normal(loc=0.0, scale=1.0, size=None):
creates an array of the specified shape and fills it with random values drawn from a normal (Gaussian) distribution.
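
A short sketch comparing the two (the shape (3, 2) is just an example):

import numpy as np

u = np.random.rand(3, 2)                               # uniform values in [0, 1), shape (3, 2)
g = np.random.normal(loc=0.0, scale=1.0, size=(3, 2))  # Gaussian values with mean 0 and std 1
print(u)
print(g)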

Reading and writing CSV data:
1. Reading CSV data with csv.reader:
https://python3-cookbook.readthedocs.io/zh_CN/latest/c06/p01_read_write_csv_data.html

import csv
with open('stocks.csv') as f:
    f_csv = csv.reader(f)
    headers = next(f_csv)    # the first row holds the column headers
    for row in f_csv:
        # Process row
        ...
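
The snippet above only reads; a minimal sketch of the writing side with csv.writer from the same module (the file name and rows here are made-up examples):

import csv

headers = ['Symbol', 'Price', 'Volume']
rows = [('AA', 39.48, 181800), ('AIG', 71.38, 195500)]

with open('stocks_out.csv', 'w', newline='') as f:
    f_csv = csv.writer(f)
    f_csv.writerow(headers)   # write the header row
    f_csv.writerows(rows)     # write all data rows at once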

2. Reading CSV data with pd.read_csv:
https://blog.csdn.net/atnanyang/article/details/70832257

import pandas as pd
df = pd.read_csv('filename', header=None, sep=' ')
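
A slightly fuller sketch (the file name, separator, and column names here are assumptions for illustration):

import pandas as pd

# Read a space-separated file with no header row, naming the columns ourselves
df = pd.read_csv('data.txt', header=None, sep=' ', names=['x', 'y', 'label'])
print(df.head())

# Writing back out is the mirror operation
df.to_csv('data_out.csv', index=False)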

Notes:

n_estimators: the maximum number of boosting iterations, i.e., the maximum number of weak learners. If n_estimators is too small, the model tends to underfit; if it is too large, training becomes expensive, and beyond a certain point adding more estimators yields only a marginal improvement, so a moderate value is usually chosen. The default for AdaBoostRegressor is 50.
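
One way to see the diminishing returns described above is AdaBoostRegressor.staged_predict, which yields the ensemble prediction after each boosting iteration; a rough sketch reusing X, y and the fitted DTR_2 from the script above:

from sklearn.metrics import mean_squared_error

# Training error after each boosting iteration; the improvement flattens out
# well before the last estimator.
errors = [mean_squared_error(y, y_pred) for y_pred in DTR_2.staged_predict(X)]
for n in (1, 10, 50, 100, len(errors)):
    if n <= len(errors):
        print(f"n_estimators={n:3d}: training MSE = {errors[n - 1]:.4f}")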

Reprinted from blog.csdn.net/weixin_34275246/article/details/85005800