Decision Tree Regression with sklearn

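Decision trees are not limited to classification. With the DecisionTreeRegressor class from the sklearn machine learning library, they can also be used to solve regression problems.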

# -*- coding: utf-8 -*-
import numpy as np
from sklearn.tree import DecisionTreeRegressor
import matplotlib.pyplot as plt

# Create a random dataset
rng = np.random.RandomState(2)
print(rng)
print('****************')
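X = np.sort(5 * rng.rand(80, 1), axis=0)  # rng.rand(80, 1) draws an 80x1 array of uniform values in [0, 1); sort along axis 0 (down the column)
print(X)
y = np.sin(X).ravel()  # flatten the (80, 1) array to shape (80,)
y[::5] += 3 * (0.5 - rng.rand(16))  # add noise to every fifth sample (16 of the 80)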
# Fit regression model
regr_1 = DecisionTreeRegressor(max_depth=2)
regr_2 = DecisionTreeRegressor(max_depth=5)
regr_3 = DecisionTreeRegressor(max_depth=8)
regr_1.fit(X, y)
regr_2.fit(X, y)
regr_3.fit(X, y)
# Predict
X_test = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]
y_1 = regr_1.predict(X_test)
y_2 = regr_2.predict(X_test)
y_3 = regr_3.predict(X_test)
# Plot the results
plt.figure()
plt.scatter(X, y, c="darkorange", label="data")
plt.plot(X_test, y_1, color="cornflowerblue", label="max_depth=2", linewidth=2)
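# The curve above uses max_depth=2. Fit quality changes with depth: deeper trees generally follow the training data more closely, but beware of overfitting.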
plt.plot(X_test, y_2, color="yellowgreen", label="max_depth=5", linewidth=2)
plt.plot(X_test, y_3, color="lightblue", label="max_depth=8", linewidth=2)

plt.xlabel("data")
plt.ylabel("target")
plt.title("Decision Tree Regression")
plt.legend()
plt.show()

The test above shows that as the maximum depth of the decision tree increases, its capacity to fit the training data keeps rising.
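One way to check this numerically is to compare the training-set R^2 of the three trees. The sketch below rebuilds the same synthetic dataset; the `random_state=0` argument and the `train_r2` dictionary are additions for illustration and are not part of the original script.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Same synthetic data as the script above: noisy samples of sin(x) on [0, 5)
rng = np.random.RandomState(2)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel()
y[::5] += 3 * (0.5 - rng.rand(16))

# Training-set R^2 per depth (train_r2 is a name introduced for this sketch)
train_r2 = {}
for depth in (2, 5, 8):
    regr = DecisionTreeRegressor(max_depth=depth, random_state=0)
    regr.fit(X, y)
    train_r2[depth] = regr.score(X, y)  # score() returns R^2 for regressors
    print(f"max_depth={depth}: training R^2 = {train_r2[depth]:.3f}")
```

A high training score only shows that the tree follows the training points, noise included. It says nothing about how the tree behaves on new data.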
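This example has 80 samples in total. With a maximum depth of 8 (greater than log2(80) ≈ 6.3, so the tree can have more leaves than there are samples), the decision tree fits not only the underlying sine signal but also the noise we added, which reduces its ability to generalize.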
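A rough way to see the effect on generalization is to compare each tree's predictions with the noise-free curve sin(x) on a dense grid, and to count the leaves each tree ends up with. This is a sketch under stated assumptions: treating sin(x) as ground truth is possible only because the data is synthetic, and `random_state=0`, `y_clean`, and `results` are names added here rather than part of the original script.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

# Same synthetic data as the script above
rng = np.random.RandomState(2)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel()
y[::5] += 3 * (0.5 - rng.rand(16))

# Dense grid and the noise-free target on that grid
X_test = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]
y_clean = np.sin(X_test).ravel()

# For each depth, record (number of leaves, MSE against the clean curve)
results = {}
for depth in (2, 5, 8):
    regr = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X, y)
    mse = mean_squared_error(y_clean, regr.predict(X_test))
    results[depth] = (regr.get_n_leaves(), mse)
    print(f"max_depth={depth}: leaves={results[depth][0]}, MSE vs sin(x)={mse:.4f}")
```

If the depth-8 tree reports a leaf count close to the number of samples, many leaves hold only one or two training points, which is how noisy points end up being reproduced almost exactly.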

恋上萤火
