Methods for splitting data into training and cross-validation sets

(1) K-fold cross-validation

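import numpy as np
from sklearn.model_selection import KFold  ## K-fold cross-validation

X = np.arange(36).reshape(18, 2)
kfold = KFold(n_splits=9)                  ## kfold is an instance of the KFold class

## .split(X) returns a generator; each iteration yields two arrays:
## 1. the indices of the training set; 2. the indices of the cross-validation set.
for a, b in kfold.split(X):
    print('Train_index: ', a, 'Validation_index:', b)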

Output:
Train_index:  [ 2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17] Validation_index: [0 1]
Train_index:  [ 0  1  4  5  6  7  8  9 10 11 12 13 14 15 16 17] Validation_index: [2 3]
Train_index:  [ 0  1  2  3  6  7  8  9 10 11 12 13 14 15 16 17] Validation_index: [4 5]
Train_index:  [ 0  1  2  3  4  5  8  9 10 11 12 13 14 15 16 17] Validation_index: [6 7]
Train_index:  [ 0  1  2  3  4  5  6  7 10 11 12 13 14 15 16 17] Validation_index: [8 9]
Train_index:  [ 0  1  2  3  4  5  6  7  8  9 12 13 14 15 16 17] Validation_index: [10 11]
Train_index:  [ 0  1  2  3  4  5  6  7  8  9 10 11 14 15 16 17] Validation_index: [12 13]
Train_index:  [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 16 17] Validation_index: [14 15]
Train_index:  [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15] Validation_index: [16 17]
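
The index arrays above are normally used to slice the data itself. The following is a minimal sketch, not part of the original example, that reuses the same X and KFold settings; the names train_idx, val_idx and seen are chosen here for illustration. It also checks that every sample ends up in a validation fold exactly once.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(36).reshape(18, 2)
kfold = KFold(n_splits=9)

seen = []  # validation indices collected over all folds
for train_idx, val_idx in kfold.split(X):
    # Index arrays select rows of X directly.
    X_train, X_val = X[train_idx], X[val_idx]
    # 18 samples / 9 folds: 16 training rows and 2 validation rows per fold.
    assert X_train.shape == (16, 2)
    assert X_val.shape == (2, 2)
    seen.extend(val_idx.tolist())

# Each of the 18 sample indices appears in exactly one validation fold.
print(sorted(seen) == list(range(18)))
```

Because KFold does not shuffle by default, the folds are consecutive blocks of indices, which is why the output above is the same on every run.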

(2) Random-permutation cross-validator (ShuffleSplit)

import numpy as np
from sklearn.model_selection import ShuffleSplit

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([1, 2, 1, 2])
## Create a random-permutation cross-validator; n_splits is the number of
## re-shuffling and splitting iterations.
rs = ShuffleSplit(n_splits=3, train_size=0.5, test_size=0.25, random_state=None)
print(rs.get_n_splits(X))   ## number of splitting iterations
## .split(X) returns a generator; each iteration yields two arrays:
## 1. the indices of the training set; 2. the indices of the cross-validation set.
for train_index, test_index in rs.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)

Output (random_state=None, so the indices differ from run to run):

3

TRAIN: [0 3] TEST: [1]
TRAIN: [3 2] TEST: [0]
TRAIN: [2 0] TEST: [3]
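
To get the same splits on every run, random_state can be set to an integer. The sketch below is an illustration added here, with random_state=0 chosen arbitrarily and the helper name collect_splits being hypothetical. It also shows that with 4 samples, train_size=0.5 and test_size=0.25, each split uses 2 training samples and 1 test sample, so one sample is left out of each split.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])

# Fixed seed (arbitrary value chosen for this sketch) makes the splits repeatable.
rs = ShuffleSplit(n_splits=3, train_size=0.5, test_size=0.25, random_state=0)


def collect_splits(splitter, data):
    """Return the splits as plain lists so they can be compared."""
    return [(tr.tolist(), te.tolist()) for tr, te in splitter.split(data)]


first = collect_splits(rs, X)
second = collect_splits(rs, X)

print(first == second)  # the same splits are produced on both passes

for train_index, test_index in first:
    # 2 training samples, 1 test sample, no overlap; the fourth sample is unused.
    assert len(train_index) == 2 and len(test_index) == 1
    assert set(train_index).isdisjoint(test_index)
```

Unlike KFold, the test sets of different ShuffleSplit iterations are drawn independently, so the same sample may appear in more than one test set.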

(3) Train one model per train/validation split and return each model's score on its validation set

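from sklearn.model_selection import cross_val_score
cv_scores = cross_val_score(model, X, y, cv=cv)
## model is an unfitted estimator; cv can be the kfold or rs object defined above.
## Pass cv by keyword: the fourth positional parameter of cross_val_score is not cv.
## cv_scores holds one score per split: for each split, a copy of model is fitted on
## the training set and scored on the corresponding cross-validation set.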
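
The snippet above leaves model, X, y and cv undefined. The following is a runnable sketch under assumptions that are not in the original: the iris dataset, a LogisticRegression classifier, and a shuffled 5-fold KFold are all picked here purely for illustration. It also repeats the computation with an explicit loop to show what cross_val_score does for each split.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Illustrative choices (assumptions): dataset, estimator and splitter.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)  # unfitted estimator
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# One score per split; for a classifier the default score is accuracy.
cv_scores = cross_val_score(model, X, y, cv=cv)
print(cv_scores)
print(cv_scores.mean())

# The same procedure written out by hand.
manual_scores = []
for train_index, val_index in cv.split(X):
    fold_model = clone(model)  # fresh, unfitted copy for every split
    fold_model.fit(X[train_index], y[train_index])
    manual_scores.append(fold_model.score(X[val_index], y[val_index]))

print(np.allclose(cv_scores, manual_scores))
```

The mean of cv_scores is the figure usually reported, and the spread across splits gives a rough sense of how sensitive the model is to the particular split.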
