【模型评估与选择】sklearn.model_selection.train_test_split

1. 描述
Split arrays or matrices into random train and test subsets

2. 语法
train_test_split(*arrays, **options)

3. 参数
1.*arrays:(sequence of indexables with same length / shape[0])
Allowed inputs are lists, numpy arrays, scipy-sparse matrices or pandas dataframes.
2.test_size :(float, int, or None, default None)
If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split.
If int, represents the absolute number of train samples.
If None, the value is automatically set to the complement of the test size.
3.random_state :(int, RandomState instance or None, optional (default=None))
If int, random_state is the seed used by the random number generator;
If None, the random number generator is the RandomState instance used by np.random.
4.shuffle :(boolean, optional (default=True))
Whether or not to shuffle the data before splitting.
If shuffle=False then stratify must be None.
5.stratify 分层:(array-like or None (default is None))
If not None, data is split in a stratified fashion, using this as the class labels.

4. 返回
splitting : list, length=2 * len(arrays)
List containing train-test split of inputs.
New in version 0.16: If the input is sparse, the output will be a scipy.sparse.csr_matrix. Else, output type is the same as the input type.

5. 实例

import numpy as np
from sklearn.model_selection import train_test_split
X = np.arange(10).reshape((5,2))
y = range(5)
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.33,random_state=1)#返回的是含有四个元素的列表
print('X_train:\n',X_train,'\nX_test:\n',X_test,'\ny_train:\n',y_train,'\ny_test:\n',y_test)

X_train:
[[8 9]
[0 1]
[6 7]]
X_test:
[[4 5]
[2 3]]
y_train:
[4, 0, 3]
y_test:
[2, 1]

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets
from sklearn import svm
iris = datasets.load_iris()
X = iris.data
y = iris.target
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = .3,random_state = 0)
clf = svm.SVC(kernel='linear',C=1)
clf.fit(X_train, y_train)
#y_predict = clf.predict(X_test)
#accuracy_score(y_test,y_predict)
clf.score(X_test,y_test)

0.97777777777777775

猜你喜欢

转载自blog.csdn.net/datawhale/article/details/80886953