sklearn.model_selection.StratifiedShuffleSplit

Stratified sampling

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedShuffleSplit.html#sklearn.model_selection.StratifiedShuffleSplit.split

Example: stratified sampling by income_cat. Prerequisite: income_cat has already been divided into 5 categories.

from sklearn.model_selection import StratifiedShuffleSplit
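# Stratified sampling by income category with StratifiedShuffleSplit.
# n_splits: number of re-shuffled train/test pairs to generate; set as needed (default 10).
# test_size / train_size: proportion of samples that goes into the test / train set of each pair.
# random_state: seeds the random shuffling so that the split is reproducible.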
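split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
for train_index, test_index in split.split(housing, housing["income_cat"]):
    # split() yields positional indices, so iloc is used here;
    # loc gives the same rows only when housing has a default 0..n-1 index.
    strat_train_set = housing.iloc[train_index]
    strat_test_set = housing.iloc[test_index]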
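
To illustrate what n_splits does, here is a small sketch on made-up labels (the arrays X and y are hypothetical and not part of the housing data). Each iteration yields one shuffled train/test pair, and every test set keeps the class ratio of y.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical toy data: 20 samples, 75% class 0 and 25% class 1.
X = np.arange(20).reshape(-1, 1)
y = np.array([0] * 15 + [1] * 5)

splitter = StratifiedShuffleSplit(n_splits=3, test_size=0.2, random_state=42)

test_sets = []
for i, (train_index, test_index) in enumerate(splitter.split(X, y)):
    # The yielded indices are positions into X and y.
    test_sets.append(test_index)
    print(f"split {i}: train size={len(train_index)}, "
          f"test size={len(test_index)}, "
          f"class 1 in test={int(y[test_index].sum())}")
```

Unlike K-fold cross-validation, the test sets of different splits are drawn independently, so they may overlap.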

Check the proportion of each category
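
A minimal sketch of such a check, assuming a small synthetic housing DataFrame. The median_income values and the bin edges below are made up for illustration and are not taken from the original dataset. It compares the share of each income_cat value in the full data with its share in the stratified test set.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Synthetic stand-in for the housing data (assumption: illustrative values only).
rng = np.random.default_rng(42)
housing = pd.DataFrame({"median_income": rng.uniform(0.5, 10.0, size=1000)})

# Assumed bin edges that produce 5 income categories labelled 1..5.
housing["income_cat"] = pd.cut(
    housing["median_income"],
    bins=[0.0, 1.5, 3.0, 4.5, 6.0, np.inf],
    labels=[1, 2, 3, 4, 5],
)

split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_index, test_index = next(split.split(housing, housing["income_cat"]))
strat_train_set = housing.iloc[train_index]
strat_test_set = housing.iloc[test_index]

# Share of each category in the full data and in the stratified test set.
overall = housing["income_cat"].value_counts(normalize=True).sort_index()
in_test = strat_test_set["income_cat"].value_counts(normalize=True).sort_index()
comparison = pd.DataFrame({"overall": overall, "stratified_test": in_test})
print(comparison)
```

If the split is stratified correctly, the two columns should be nearly identical for every category.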

Reposted from blog.csdn.net/qq_40949544/article/details/88035047