Using sklearn.model_selection.train_test_split

When doing machine learning in Python, the train_test_split function in sklearn.model_selection is commonly used to split a dataset into training samples and testing samples.
How to call train_test_split:
sklearn.model_selection.train_test_split(*arrays, **options)
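As a minimal sketch of the call shape (the arrays X and y are made-up illustration data, not from the article): every array passed in is split along its first axis using the same row indices, and the pieces come back as train/test pairs in the order the arrays were given.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with 2 features each, plus 10 labels.
# Row i of X is [2*i, 2*i + 1], so rows and labels are easy to match up.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# With no size arguments, 25% of the samples go to the test set
# (2.5 is rounded up to 3 test samples, leaving 7 for training).
X_train, X_test, y_train, y_test = train_test_split(X, y)

print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)
print(y_train.shape, y_test.shape)  # (7,) (3,)
```

Because no random_state is given here, which rows end up in each subset changes from run to run; only the sizes are fixed.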
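Commonly used parameters (arguments) of train_test_split:
arrays: the objects to split. These are lists, NumPy arrays, matrices, or pandas DataFrames, all of the same length.
test_size: can be given in two ways. 1: a float between 0.0 and 1.0, which is the proportion of the data assigned to the test set. 2: an integer, which is the absolute number of test samples and must be smaller than the number of samples in the dataset. If test_size is not specified, it is derived from train_size (the two are complementary). If train_size is not specified either, test_size defaults to 0.25.
train_size: same as test_size, but for the training set.
random_state: the seed for the random shuffling applied to the data before it is split. Passing the same integer gives the same split every time. If it is not specified, the global numpy.random state is used, so the split can also be fixed with numpy.random.seed.
shuffle and stratify are used less often. shuffle (default True) controls whether the data is shuffled before splitting. stratify splits the data in a stratified fashion, so that the class proportions of the array passed to it are preserved in both subsets.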
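The parameters described above can be tried out with a short sketch. The data here (20 samples with an imbalanced 15/5 label) and all variable names are assumptions made for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 samples, labels are 15 zeros and 5 ones.
X = np.arange(20).reshape(20, 1)
y = np.array([0] * 15 + [1] * 5)

# test_size as a fraction: 20% of 20 samples -> 4 test samples.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
print(len(X_tr), len(X_te))  # 16 4

# test_size as an integer: exactly 5 test samples.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=5, random_state=0)
print(len(X_tr), len(X_te))  # 15 5

# The same random_state gives the same split on every call.
a = train_test_split(X, y, test_size=0.2, random_state=42)
b = train_test_split(X, y, test_size=0.2, random_state=42)
print(all(np.array_equal(p, q) for p, q in zip(a, b)))  # True

# shuffle=False keeps the original order: the last rows become the test set.
X_first, X_last = train_test_split(X, test_size=0.2, shuffle=False)
print(X_last.ravel().tolist())  # [16, 17, 18, 19]

# stratify=y keeps the 75% / 25% class ratio in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
print(np.bincount(y_tr).tolist(), np.bincount(y_te).tolist())  # [12, 4] [3, 1]
```

Without stratify, a small test set drawn from imbalanced labels can easily contain no samples of the rare class, which is the usual reason to reach for it.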
train_test_split usage example:
This dataset has 4 columns and 12 rows (records).
>>> import pandas as pd
>>> from sklearn.model_selection import train_test_split
>>>
>>> namelist = pd.DataFrame({
... "name" : ["Suzuki", "Tanaka", "Yamada", "Watanabe", "Yamamoto",
... "Okada", "Ueda", "Inoue", "Hayashi", "Sato",
... "Hirayama", "Shimada"],
... "age": [30, 40, 55, 29, 41, 28, 42, 24, 33, 39, 49, 53],
... "department": ["HR", "Legal", "IT", "HR", "HR", "IT",
... "Legal", "Legal", "IT", "HR", "Legal", "Legal"],
... "attendance": [1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1]
... })
>>> print(namelist)

    age  attendance  department      name
0    30           1          HR    Suzuki
1    40           1       Legal    Tanaka
2    55           1          IT    Yamada
3    29           0          HR  Watanabe
4    41           1          HR  Yamamoto
5    28           1          IT     Okada
6    42           1       Legal      Ueda
7    24           0       Legal     Inoue
8    33           0          IT   Hayashi
9    39           1          HR      Sato
10   49           1       Legal  Hirayama
11   53           1       Legal   Shimada
Setting the test proportion to 0.3 (test_size=0.3) splits the data into a testing set and a training set.
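A sketch of what that split might look like follows. The variable names namelist_train and namelist_test, and the choice of random_state=0, are assumptions added for illustration; the DataFrame is rebuilt so that the snippet runs on its own.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Same 12-row dataset as in the example above.
namelist = pd.DataFrame({
    "name": ["Suzuki", "Tanaka", "Yamada", "Watanabe", "Yamamoto",
             "Okada", "Ueda", "Inoue", "Hayashi", "Sato",
             "Hirayama", "Shimada"],
    "age": [30, 40, 55, 29, 41, 28, 42, 24, 33, 39, 49, 53],
    "department": ["HR", "Legal", "IT", "HR", "HR", "IT",
                   "Legal", "Legal", "IT", "HR", "Legal", "Legal"],
    "attendance": [1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1],
})

# 12 rows * 0.3 = 3.6, which is rounded up to 4 test rows;
# the remaining 8 rows form the training set.
# random_state=0 is an arbitrary seed chosen to make the split repeatable.
namelist_train, namelist_test = train_test_split(
    namelist, test_size=0.3, random_state=0
)

print(len(namelist_train), len(namelist_test))  # 8 4
print(namelist_test)
```

Both results are DataFrames that keep the original row index, so each row can be traced back to its position in namelist.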
————————————————
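Copyright notice: this is an original article by CSDN blogger 「大鱼霸吃小鱼儿」, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/datascientist_chen/article/details/79024020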
