Some sample datasets

Some of the datasets are small and synthetic, designed to highlight one particular aspect of an algorithm.

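1. An example of a synthetic two-class classification dataset is the forge dataset, which has two features. In the plot, the first feature X[:,0] is on the x-axis and the second feature X[:,1] is on the y-axis. y holds the sample labels, which are either 0 or 1, and each label is drawn with a different marker: 0 as a circle ('O') and 1 as a triangle ('△').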

import mglearn
import matplotlib.pyplot as plt

X,y=mglearn.datasets.make_forge()
mglearn.discrete_scatter(X[:,0],X[:,1],y)
plt.legend(["Class 0","Class 1"],loc=4)
plt.xlabel("First feature")
plt.ylabel("Second feature")
print("X.shape:{}".format(X.shape))



X.shape:(26, 2)

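plt.legend(["Class 0","Class 1"],loc=4) adds a legend to the plot. ["Class 0","Class 1"] gives the labels of the legend entries, and loc sets the position of the legend (here loc=4, which is equivalent to loc="lower right"). plt.legend() also accepts other optional parameters, including: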

edgecolor: color of the legend's border

facecolor: fill color of the legend box

title: title of the legend

plt.legend(["Class 0","Class 1"],loc=4,title="forge",edgecolor="blue",facecolor="red")
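
For reference, each integer accepted by loc corresponds to a named position. The lookup table below reproduces matplotlib's documented location codes by hand, using only the standard library. The helper name loc_name is hypothetical and is not part of matplotlib.

```python
# Matplotlib's documented legend location codes, copied by hand for reference.
LEGEND_LOC_CODES = {
    0: "best",
    1: "upper right",
    2: "upper left",
    3: "lower left",
    4: "lower right",
    5: "right",
    6: "center left",
    7: "center right",
    8: "lower center",
    9: "upper center",
    10: "center",
}


def loc_name(code):
    """Return the string form of an integer location code (hypothetical helper)."""
    try:
        return LEGEND_LOC_CODES[code]
    except KeyError:
        raise ValueError("unknown legend location code: {!r}".format(code))


print(loc_name(4))  # the code used in the forge example
```

Passing the string form, for example loc="lower right", usually reads more clearly than the bare integer.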

plt.xlabel() sets the label of the x-axis, and plt.ylabel() does the same for the y-axis.

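2. Next, the synthetic wave dataset is used to illustrate regression algorithms. The wave dataset has a single input feature and a continuous target variable, and the target is what the model is meant to predict. In the plot produced by the code below, the single feature is on the x-axis and the regression target (the output) is on the y-axis.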

import mglearn
import matplotlib.pyplot as plt

X,y=mglearn.datasets.make_wave(n_samples=40)
plt.plot(X,y,'o')
plt.ylim(-3,3)
plt.xlabel("Feature")
plt.ylabel("Target")
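
To make the structure of such a dataset concrete, here is a minimal sketch that builds a wave-like dataset with the standard library only. It is not mglearn's implementation: the function name make_wave_like, the sine-plus-trend formula, the noise level and the seed are all assumptions chosen for illustration, and plain lists stand in for NumPy arrays.

```python
import math
import random


def make_wave_like(n_samples=40, seed=0):
    """Hypothetical stand-in for a wave-style regression dataset."""
    rng = random.Random(seed)  # fixed seed so the data is reproducible
    X, y = [], []
    for _ in range(n_samples):
        x = rng.uniform(-3, 3)       # the single input feature
        noise = rng.gauss(0, 1)      # random noise added to the signal
        target = (math.sin(4 * x) + x + noise) / 2  # continuous target
        X.append([x])                # one row per sample, one column per feature
        y.append(target)
    return X, y


X, y = make_wave_like(n_samples=40)
print(len(X), len(X[0]), len(y))  # number of samples, features per sample, targets
```

Each row of X holds exactly one feature value and each entry of y is a real number rather than a class label, which is what makes this a regression problem rather than a classification problem.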