Stratified k-fold & TimeSeriesSplit

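Tutorial link: https://scikit-learn.org/stable/modules/cross_validation.html#cross-validation
1. StratifiedKFold: when the classes in the data are imbalanced, use StratifiedKFold to choose the training and test sets. Each test fold keeps approximately the same class proportions as the full dataset. For example: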

import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.ones(10)
y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
len(y)

10

skf = StratifiedKFold(n_splits=3)
for train, test in skf.split(X, y):
    print('%s %s' % (train, test))

[2 3 6 7 8 9] [0 1 4 5]
[0 1 3 4 5 8 9] [2 6 7]
[0 1 2 4 5 6 7] [3 8 9]
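To see what stratification buys you, the sketch below reuses the same X and y (with y turned into a NumPy array so it can be indexed) and counts each class in every test fold with np.bincount. The helper name fold_class_counts is made up for this illustration. StratifiedKFold puts both classes into every test fold, while a plain KFold without shuffling puts only class 0 into the first fold.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

X = np.ones(10)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

def fold_class_counts(cv, X, y):
    """Hypothetical helper: per split, count how many samples of each class are in the test fold."""
    return [np.bincount(y[test], minlength=2).tolist() for _, test in cv.split(X, y)]

stratified = fold_class_counts(StratifiedKFold(n_splits=3), X, y)
plain = fold_class_counts(KFold(n_splits=3), X, y)  # KFold ignores y when splitting

print("StratifiedKFold:", stratified)  # [[2, 2], [1, 2], [1, 2]]
print("KFold:          ", plain)       # [[4, 0], [0, 3], [0, 3]]
```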

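2. GroupKFold: for data where, besides the class labels, the samples also belong to groups (for example, several samples from the same subject). All samples of a group are kept on the same side of each split.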
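A minimal GroupKFold sketch with made-up data: the groups array (hypothetical subject IDs) is passed to split(), and no group appears in both the training and the test indices of the same split. Which groups land in which fold can differ between scikit-learn versions, so the code only checks that the groups are disjoint.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical data: 8 samples collected from 4 subjects (the groups).
X = np.arange(16).reshape(8, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])
groups = np.array([1, 1, 2, 2, 3, 3, 4, 4])

gkf = GroupKFold(n_splits=2)
splits = []
for train, test in gkf.split(X, y, groups=groups):
    train_groups = set(groups[train].tolist())
    test_groups = set(groups[test].tolist())
    # A group is never divided between the training and test sets.
    assert train_groups.isdisjoint(test_groups)
    splits.append((sorted(train_groups), sorted(test_groups)))
    print("train groups:", sorted(train_groups), "test groups:", sorted(test_groups))
```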
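3. TimeSeriesSplit: a way of splitting time-series data into training and test sets. Each test set comes after its training set in time, so the model is never trained on future observations.
For example: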

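import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array(range(6))
y

array([0, 1, 2, 3, 4, 5])

y = np.array(range(1, 7))
y

array([1, 2, 3, 4, 5, 6])

tscv = TimeSeriesSplit(n_splits=3)
print(tscv)

TimeSeriesSplit(max_train_size=None, n_splits=3)

(Newer scikit-learn versions also list gap=0 and test_size=None in this output.)

# Only the number of rows in X matters; y is ignored by TimeSeriesSplit.split
for train, test in tscv.split(X):
    print('%s %s' % (train, test))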

[0 1 2] [3]
[0 1 2 3] [4]
[0 1 2 3 4] [5]
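The indices above follow an expanding-window pattern. The sketch below rebuilds them by hand, assuming TimeSeriesSplit's default settings (gap=0, max_train_size=None, test_size=None), where each test block has n_samples // (n_splits + 1) samples and the training set is every index before it. The function expanding_window_splits is a hypothetical helper, and the result is compared with the library's own splits.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

def expanding_window_splits(n_samples, n_splits):
    """Hypothetical hand-rolled version of TimeSeriesSplit's default behaviour."""
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    splits = []
    for test_start in range(first_test_start, n_samples, test_size):
        train = np.arange(0, test_start)  # every sample strictly before the test block
        test = np.arange(test_start, test_start + test_size)
        splits.append((train, test))
    return splits

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]])
manual = expanding_window_splits(len(X), n_splits=3)
library = list(TimeSeriesSplit(n_splits=3).split(X))

for (tr_m, te_m), (tr_l, te_l) in zip(manual, library):
    same = np.array_equal(tr_m, tr_l) and np.array_equal(te_m, te_l)
    print(tr_m, te_m, "| matches TimeSeriesSplit:", same)
```

Passing max_train_size to TimeSeriesSplit caps the training set at the most recent max_train_size samples, which turns this expanding window into a sliding one.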


Reposted from blog.csdn.net/weixin_43055882/article/details/87202463