The first application: Iris classification

In this example we use the iris dataset, a classic dataset in machine learning and statistics.

Getting to know the data: what does the data look like?

from sklearn.datasets import load_iris

data = load_iris()
print('key of load_iris:\n{}'.format(data.keys()))
Result:
key of load_iris:
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])

data: the measurement data; each row holds the sepal length, sepal width, petal length, and petal width of one flower

from sklearn.datasets import load_iris

data = load_iris()
# print('key of load_iris:\n{}'.format(data.keys()))
print('data of load_iris:\n{}'.format(data.data[:5]))


Result:
data of load_iris:
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]
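
Each of the four columns in data corresponds to one feature. A minimal added sketch (an illustration, not part of the original walk-through) that checks the shape of the array and maps the columns to their names:

from sklearn.datasets import load_iris

data = load_iris()
# 150 rows (one per flower) and 4 columns (one per measurement)
print('shape of data:\n{}'.format(data.data.shape))
print('feature_names of load_iris:\n{}'.format(data.feature_names))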

target: the classification results (three classes in total, labeled 0, 1, and 2)

from sklearn.datasets import load_iris

data = load_iris()
# print('key of load_iris:\n{}'.format(data.keys()))
# print('data of load_iris:\n{}'.format(data.data[:5]))
print('target of load_iris:\n{}'.format(data.target))


Result:

target of load_iris:
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2]
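
Notice that the labels are stored sorted by class: all the 0s first, then the 1s, then the 2s. A small added sketch using numpy to count the samples per class (this dataset has 50 samples of each class):

import numpy as np
from sklearn.datasets import load_iris

data = load_iris()
# bincount tallies how many samples carry each label 0, 1 and 2
print('samples per class:\n{}'.format(np.bincount(data.target)))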

target_names: the names of the classes (3 classes)

from sklearn.datasets import load_iris

data = load_iris()
# print('key of load_iris:\n{}'.format(data.keys()))
# print('data of load_iris:\n{}'.format(data.data[:5]))
# print('target of load_iris:\n{}'.format(data.target))
print('target_names of load_iris:\n{}'.format(data.target_names))


Result:
target_names of load_iris:
['setosa' 'versicolor' 'virginica']
DESCR: a plain-text description of the dataset
filename: the path of the file the data was loaded from
feature_names: the names of the features
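
A minimal added sketch that inspects these remaining keys (DESCR is a long text, so only its first characters are printed):

from sklearn.datasets import load_iris

data = load_iris()
# Print just the start of the long plain-text description
print(data.DESCR[:200])
print('filename of load_iris:\n{}'.format(data.filename))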


Training data and test data

In supervised learning, the data is divided into two parts: the training data and the test data.

The training data is what the program learns from; it contains two parts: the measurements and the corresponding results.

The test data is used to measure the accuracy of our algorithm. Data used to assess model performance is called the test data, the test set, or the hold-out set.

scikit-learn's train_test_split function shuffles the dataset and splits it: 75% of the rows and their corresponding labels become the training set, and the remaining 25% of the data and labels become the test set. The ratio between training set and test set can be chosen freely, but using 25% of the data as the test set is a good rule of thumb.
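
The ratio can also be set explicitly through the test_size parameter of train_test_split; a minimal sketch (test_size=0.25 simply reproduces the default split):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

data = load_iris()
# test_size=0.25 reserves 25% of the rows for the test set
x_train, x_test, y_train, y_test = train_test_split(
    data['data'], data['target'], test_size=0.25, random_state=0)
print(len(x_train), len(x_test))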

For details on how to use train_test_split, see https://blog.csdn.net/mrxjh/article/details/78481578

train_test_split uses a pseudo-random number generator to shuffle the dataset before splitting, so x_train contains 75% of the rows and x_test contains the remaining 25%, as follows:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split


data = load_iris()
x_train, x_test, y_train, y_test = train_test_split(data['data'], data['target'], random_state=0)

print('x_train length is:', len(x_train))
print('x_test length is:', len(x_test))
print('y_train length is:', len(y_train))
print('y_test length is:', len(y_test))



Result:
x_train length is: 112
x_test length is: 38
y_train length is: 112
y_test length is: 38
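
Because the labels in this dataset are stored sorted by class, splitting without shuffling would put only class 2 into the test set. A small added sketch that confirms the shuffle mixed all three classes into the test labels, and that fixing random_state keeps the split reproducible across runs:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

data = load_iris()
x_train, x_test, y_train, y_test = train_test_split(data['data'], data['target'], random_state=0)
# All three classes should appear among the test labels
print('classes in y_test:\n{}'.format(np.bincount(y_test)))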

Analyzing the data

(To be continued.)

 
