cross_val_score cross-validation and its use in parameter selection, model selection, and feature selection

K-fold cross-validation: sklearn.model_selection.KFold(n_splits=3, shuffle=False, random_state=None)

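Idea: split the dataset into n_splits mutually exclusive subsets (folds). In each round, one fold serves as the validation set and the remaining n_splits-1 folds serve as the training set. Training and testing are repeated n_splits times, which yields n_splits results.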
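The procedure described above can be sketched as a manual loop. This is only an illustration: the iris dataset, the LogisticRegression estimator, and the variable names are assumptions chosen for the example and are not part of the original text.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Assumed example data: the iris dataset (150 samples, 3 classes).
X, y = load_iris(return_X_y=True)

# Shuffle because the iris samples are ordered by class.
kf = KFold(n_splits=5, shuffle=True, random_state=0)

scores = []
for train_index, test_index in kf.split(X):
    # Fit a fresh model on the n_splits-1 training folds...
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_index], y[train_index])
    # ...and evaluate it on the single held-out fold.
    scores.append(model.score(X[test_index], y[test_index]))

print(len(scores))       # one accuracy value per fold
print(np.mean(scores))   # average accuracy across the folds
```

Each pass trains on four folds and scores on the fifth, so the list ends up with exactly n_splits entries.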

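Note: when the dataset cannot be divided evenly, the first n_samples % n_splits folds contain n_samples // n_splits + 1 samples each, and the remaining folds contain n_samples // n_splits samples each.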
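The fold-size rule can be checked directly. The sketch below uses 12 samples and 5 folds; the variable names are made up for the illustration.

```python
import numpy as np
from sklearn.model_selection import KFold

n_samples, n_splits = 12, 5
X = np.arange(2 * n_samples).reshape(n_samples, 2)

# Sizes predicted by the rule: the first (n_samples % n_splits) folds
# get one extra sample each.
remainder = n_samples % n_splits
base = n_samples // n_splits
expected = [base + 1] * remainder + [base] * (n_splits - remainder)

# Sizes actually produced by KFold (size of each test fold).
actual = [len(test_index) for _, test_index in KFold(n_splits=n_splits).split(X)]

print(expected)  # [3, 3, 2, 2, 2]
print(actual)    # [3, 3, 2, 2, 2]
```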

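Parameters:

n_splits: the number of folds to split the data into (the signature above shows the older default of 3; since scikit-learn 0.22 the default is 5)

shuffle: whether to shuffle the samples before splitting them into folds

① If False, the samples are not shuffled and the folds are consecutive blocks of indices, so every run produces the same split; random_state has no effect in this case

② If True and random_state is left as None, the samples are shuffled first, so each run generally produces a different split

random_state: the random seed; it only matters when shuffle=True, and fixing it to an integer makes the shuffled split reproducible

Methods:

① get_n_splits(X=None, y=None, groups=None): returns the number of splitting iterations, i.e. the value of n_splits

② split(X, y=None, groups=None): splits the dataset into training and test sets and returns a generator that yields (train_index, test_index) index arrays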

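The example below uses a dataset that cannot be split evenly (12 samples, 5 folds) and varies the parameters to observe the results.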

① With shuffle=False, run the code twice: both runs give the same result

    In [1]: from sklearn.model_selection import KFold
       ...: import numpy as np
       ...: X = np.arange(24).reshape(12, 2)
       ...: y = np.random.choice([1, 2], 12, p=[0.4, 0.6])
       ...: kf = KFold(n_splits=5, shuffle=False)
       ...: for train_index, test_index in kf.split(X):
       ...:     print('train_index:%s , test_index: %s ' % (train_index, test_index))
       ...:
    train_index:[ 3  4  5  6  7  8  9 10 11] , test_index: [0 1 2]
    train_index:[ 0  1  2  6  7  8  9 10 11] , test_index: [3 4 5]
    train_index:[ 0  1  2  3  4  5  8  9 10 11] , test_index: [6 7]
    train_index:[ 0  1  2  3  4  5  6  7 10 11] , test_index: [8 9]
    train_index:[0 1 2 3 4 5 6 7 8 9] , test_index: [10 11]

    In [2]: from sklearn.model_selection import KFold
       ...: import numpy as np
       ...: X = np.arange(24).reshape(12, 2)
       ...: y = np.random.choice([1, 2], 12, p=[0.4, 0.6])
       ...: kf = KFold(n_splits=5, shuffle=False)
       ...: for train_index, test_index in kf.split(X):
       ...:     print('train_index:%s , test_index: %s ' % (train_index, test_index))
       ...:
    train_index:[ 3  4  5  6  7  8  9 10 11] , test_index: [0 1 2]
    train_index:[ 0  1  2  6  7  8  9 10 11] , test_index: [3 4 5]
    train_index:[ 0  1  2  3  4  5  8  9 10 11] , test_index: [6 7]
    train_index:[ 0  1  2  3  4  5  6  7 10 11] , test_index: [8 9]
    train_index:[0 1 2 3 4 5 6 7 8 9] , test_index: [10 11]

② With shuffle=True (and random_state left unset), run the code twice: the two runs give different results

    In [3]: from sklearn.model_selection import KFold
       ...: import numpy as np
       ...: X = np.arange(24).reshape(12, 2)
       ...: y = np.random.choice([1, 2], 12, p=[0.4, 0.6])
       ...: kf = KFold(n_splits=5, shuffle=True)
       ...: for train_index, test_index in kf.split(X):
       ...:     print('train_index:%s , test_index: %s ' % (train_index, test_index))
       ...:
    train_index:[ 0  1  2  4  5  6  7  8 10] , test_index: [ 3  9 11]
    train_index:[ 0  1  2  3  4  5  9 10 11] , test_index: [6 7 8]
    train_index:[ 2  3  4  5  6  7  8  9 10 11] , test_index: [0 1]
    train_index:[ 0  1  3  4  5  6  7  8  9 11] , test_index: [ 2 10]
    train_index:[ 0  1  2  3  6  7  8  9 10 11] , test_index: [4 5]

    In [4]: from sklearn.model_selection import KFold
       ...: import numpy as np
       ...: X = np.arange(24).reshape(12, 2)
       ...: y = np.random.choice([1, 2], 12, p=[0.4, 0.6])
       ...: kf = KFold(n_splits=5, shuffle=True)
       ...: for train_index, test_index in kf.split(X):
       ...:     print('train_index:%s , test_index: %s ' % (train_index, test_index))
       ...:
    train_index:[ 0  1  2  3  4  5  7  8 11] , test_index: [ 6  9 10]
    train_index:[ 2  3  4  5  6  8  9 10 11] , test_index: [0 1 7]
    train_index:[ 0  1  3  5  6  7  8  9 10 11] , test_index: [2 4]
    train_index:[ 0  1  2  3  4  6  7  9 10 11] , test_index: [5 8]
    train_index:[ 0  1  2  4  5  6  7  8  9 10] , test_index: [ 3 11]

③ With shuffle=True and random_state set to an integer, every run gives the same result

    In [5]: from sklearn.model_selection import KFold
       ...: import numpy as np
       ...: X = np.arange(24).reshape(12, 2)
       ...: y = np.random.choice([1, 2], 12, p=[0.4, 0.6])
       ...: kf = KFold(n_splits=5, shuffle=True, random_state=0)
       ...: for train_index, test_index in kf.split(X):
       ...:     print('train_index:%s , test_index: %s ' % (train_index, test_index))
       ...:
    train_index:[ 0  1  2  3  5  7  8  9 10] , test_index: [ 4  6 11]
    train_index:[ 0  1  3  4  5  6  7  9 11] , test_index: [ 2  8 10]
    train_index:[ 0  2  3  4  5  6  8  9 10 11] , test_index: [1 7]
    train_index:[ 0  1  2  4  5  6  7  8 10 11] , test_index: [3 9]
    train_index:[ 1  2  3  4  6  7  8  9 10 11] , test_index: [0 5]

    In [6]: from sklearn.model_selection import KFold
       ...: import numpy as np
       ...: X = np.arange(24).reshape(12, 2)
       ...: y = np.random.choice([1, 2], 12, p=[0.4, 0.6])
       ...: kf = KFold(n_splits=5, shuffle=True, random_state=0)
       ...: for train_index, test_index in kf.split(X):
       ...:     print('train_index:%s , test_index: %s ' % (train_index, test_index))
       ...:
    train_index:[ 0  1  2  3  5  7  8  9 10] , test_index: [ 4  6 11]
    train_index:[ 0  1  3  4  5  6  7  9 11] , test_index: [ 2  8 10]
    train_index:[ 0  2  3  4  5  6  8  9 10 11] , test_index: [1 7]
    train_index:[ 0  1  2  4  5  6  7  8 10 11] , test_index: [3 9]
    train_index:[ 1  2  3  4  6  7  8  9 10 11] , test_index: [0 5]
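Instead of comparing printed output by eye, the reproducibility of cases ① and ③ can also be checked in code. The following is a minimal sketch; the helper name fold_test_indices is hypothetical and exists only for this illustration.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(24).reshape(12, 2)

def fold_test_indices(**kwargs):
    """Return the test indices of each of the 5 folds as a list of lists."""
    kf = KFold(n_splits=5, **kwargs)
    return [test_index.tolist() for _, test_index in kf.split(X)]

# Case 1: no shuffling, so two independent splitters agree.
same_unshuffled = fold_test_indices(shuffle=False) == fold_test_indices(shuffle=False)

# Case 3: shuffling with a fixed seed, so two independent splitters agree.
seeded_a = fold_test_indices(shuffle=True, random_state=0)
seeded_b = fold_test_indices(shuffle=True, random_state=0)
same_seeded = seeded_a == seeded_b

# Whatever the order, every sample lands in exactly one test fold.
covered = sorted(sum(seeded_a, []))

print(same_unshuffled, same_seeded)
print(covered == list(range(12)))
```

Case ② is deliberately not asserted here: two unseeded shuffles almost always differ, but nothing guarantees it.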

④ The return value of split, and two ways to get the value of n_splits

    In [8]: kf.split(X)
    Out[8]: <generator object _BaseKFold.split at 0x00000000047FF990>

    In [9]: kf.get_n_splits()
    Out[9]: 5

    In [10]: kf.n_splits
    Out[10]: 5
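A KFold object can be passed to cross_val_score through its cv argument, which ties the splitter to the parameter selection mentioned in the title. The sketch below is an assumption-laden illustration, not the author's method: the iris dataset, the KNeighborsClassifier estimator, and the candidate n_neighbors values are all chosen only for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumed example data: the iris dataset.
X, y = load_iris(return_X_y=True)

# Fixed seed so every candidate is scored on the same folds.
kf = KFold(n_splits=5, shuffle=True, random_state=0)

mean_scores = {}
for k in (1, 5, 15):  # hypothetical candidate values for n_neighbors
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=kf)
    mean_scores[k] = scores.mean()  # average accuracy over the 5 folds

# Pick the candidate with the highest mean cross-validated accuracy.
best_k = max(mean_scores, key=mean_scores.get)
print(mean_scores)
print(best_k)
```

Reusing one seeded KFold for all candidates keeps the comparison fair, because each candidate sees identical train/test partitions.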
