[Python Machine Learning] Iris classification: a machine learning application

1. Description of the problem

　　Suppose an amateur botanist is interested in telling apart the species of the iris flowers she finds. She has collected some measurements for each iris:

The length and width of the petals and the length and width of the sepals, all measured in centimeters.

　　She also has measurements of some irises that an expert botanist has already identified as belonging to one of three species: setosa, versicolor, or virginica. For these measurements, she can be certain which species each iris belongs to.

　　We assume that these are the only three species our botanist will encounter in the wild. Our goal is to build a machine learning model that learns from the measurements of irises whose species is known, so that it can predict the species of a new iris. Because we have measurements for which the correct species is known, this is a supervised learning problem. In this problem, we want to predict one of several options (the species of iris). This is an example of a classification problem. The possible outputs (the different species of iris) are called classes. Every iris in the data set belongs to one of three classes, so this is a three-class classification problem.

2. The test code

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @File  : Iris.py
# @Author: 赵路仓
# @Date  : 2020/2/26
# @Desc  :
# @Contact : [email protected]

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import mglearn
from sklearn.datasets import load_iris  # the Iris data set, a classic in machine learning and statistics
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()  # load_iris returns a Bunch object, which behaves much like a dictionary of keys and values
print("Keys of iris_dataset: \n{}".format(iris_dataset.keys()))  # print the available keys
print(iris_dataset['DESCR'][:193] + "\n...")  # DESCR holds a short description of the data set
print("Target names: {}".format(iris_dataset['target_names']))  # names of the three species we want to predict
print("Feature names: {}".format(iris_dataset['feature_names']))  # the four features: sepal length and width, petal length and width
print("Type of data: {}".format(type(iris_dataset['data'])))  # each row is one flower, each column one of the four measurements
print("Shape of data: {}".format(iris_dataset['data'].shape))  # the array holds measurements for 150 flowers
print("First five rows of data: \n{}".format(iris_dataset['data'][:5]))  # measurements of the first five flowers
print("Type of target: {}".format(type(iris_dataset['target'])))  # a one-dimensional array with one entry per flower
print("Shape of target: {}".format(iris_dataset['target'].shape))
print("Target: \n{}".format(iris_dataset['target']))  # species encoded as the integers 0, 1, 2

X_train, X_test, Y_train, Y_test = train_test_split(iris_dataset['data'], iris_dataset['target'], random_state=0)
print("X_train shape: {}".format(X_train.shape))
print("Y_train shape: {}".format(Y_train.shape))
print("X_test shape: {}".format(X_test.shape))
print("Y_test shape: {}".format(Y_test.shape))

# create a DataFrame from the data in X_train
# label the columns using the strings in iris_dataset.feature_names
iris_dataframe = pd.DataFrame(X_train, columns=iris_dataset.feature_names)  # feature names become the column labels
# create a scatter plot matrix from the DataFrame, colored by Y_train
grr = pd.plotting.scatter_matrix(iris_dataframe, c=Y_train, figsize=(15, 15), marker='o', hist_kwds={'bins': 20}, s=60, alpha=.8, cmap=mglearn.cm3)
plt.show()

　　NOTE: Each row of the data array represents one flower, and the four columns hold the four measurements taken for each flower, for 150 flowers in total. The target is a one-dimensional array with one entry per flower; the integers 0, 1, and 2 stand for the three species.
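　　As a quick check of the layout described in the note, the following sketch counts how many flowers fall into each class and maps the first few integer labels back to species names. The variable names are illustrative and are not part of the original script.

```python
import numpy as np
from sklearn.datasets import load_iris

iris_dataset = load_iris()
data = iris_dataset['data']
target = iris_dataset['target']

# One row per flower, one column per measurement
print("Data shape: {}".format(data.shape))

# Count how many flowers carry each integer label
labels, counts = np.unique(target, return_counts=True)
for label, count in zip(labels, counts):
    print("{} ({}): {} flowers".format(label, iris_dataset['target_names'][label], count))

# Index target_names with the integer labels to recover the species names
first_species = iris_dataset['target_names'][target[:5]]
print("Species of the first five flowers: {}".format(first_species))
```

　　Indexing target_names with the label array is a convenient way to turn any prediction expressed as 0, 1, or 2 back into a readable species name.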

3. Measuring success: training data and test data

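　　First, we cannot evaluate the model on the same data we used to build it. The model was fitted to that data and can simply remember it, so a model that memorizes the training set would match it 100% without telling us anything about how well it generalizes. Therefore, we need to test the model on new data that it has not seen before.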
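　　To make this point concrete, here is a minimal sketch. It assumes a one-nearest-neighbor classifier (KNeighborsClassifier with n_neighbors=1), which is a choice made for this illustration and is not taken from the script above. Such a model stores the training set, so its accuracy on the training data says little about how it handles new flowers.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, Y_train, Y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)

# Assumption for illustration: a 1-nearest-neighbor model, which memorizes the training set
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, Y_train)

# Accuracy on the data the model was built from
train_accuracy = knn.score(X_train, Y_train)
# Accuracy on data the model has never seen
test_accuracy = knn.score(X_test, Y_test)

print("Training set accuracy: {:.2f}".format(train_accuracy))
print("Test set accuracy: {:.2f}".format(test_accuracy))
```

　　Only the second number is a fair estimate of how the model would perform on irises found in the wild.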

　　One part of the data is used to build the machine learning model and is called the training data or training set. The rest is used to evaluate how well the model performs and is called the test data, test set, or hold-out set. The train_test_split function in scikit-learn shuffles the data set and splits it. By default, 75% of the rows, together with their labels, become the training set, and the remaining 25%, together with their labels, become the test set; these proportions can be changed as needed.
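　　The split proportions can be inspected and changed through the test_size parameter of train_test_split. The sketch below first uses the default split and then, as an arbitrary example, holds out 20% of the rows instead. The variable names with the _80 and _20 suffixes are illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()
X, Y = iris_dataset['data'], iris_dataset['target']

# Default split: 25% of the rows are held out as the test set
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)
print("Default split: {} training rows, {} test rows".format(len(X_train), len(X_test)))

# A different proportion: hold out 20% (value chosen only for illustration)
X_train_80, X_test_20, Y_train_80, Y_test_20 = train_test_split(
    X, Y, test_size=0.2, random_state=0)
print("test_size=0.2: {} training rows, {} test rows".format(len(X_train_80), len(X_test_20)))
```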

　　In short, the training data is used to build the model, and the test data is used to check whether the model succeeds. The input, the four measurements of each flower, is denoted by X, and the output, the species label, is denoted by Y.

　　The train_test_split function uses a pseudo-random number generator to shuffle the data set. The random_state parameter sets the seed of that generator, which makes the result deterministic: this line always produces the same split.

　　The relevant part of the code is as follows:

X_train, X_test, Y_train, Y_test = train_test_split(iris_dataset['data'], iris_dataset['target'], random_state=0)
print("X_train shape:{}".format(X_train.shape))
print("Y_train shape:{}".format(Y_train.shape))
print("X_test shape:{}".format(X_test.shape))
print("Y_test shape:{}".format(Y_test.shape))
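　　The effect of random_state can be checked directly. In this sketch, two calls with the same seed return identical splits, while a call with a different seed (the value 1 is arbitrary) will typically shuffle the rows into a different order.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()
X, Y = iris_dataset['data'], iris_dataset['target']

# Two calls with the same seed
first = train_test_split(X, Y, random_state=0)
second = train_test_split(X, Y, random_state=0)
same_split = all(np.array_equal(a, b) for a, b in zip(first, second))
print("Same seed gives the same split: {}".format(same_split))

# A call with a different seed (arbitrary value chosen for illustration)
other = train_test_split(X, Y, random_state=1)
differs = not np.array_equal(first[0], other[0])
print("Different seed gives a different training set: {}".format(differs))
```

　　Fixing the seed is useful for tutorials and debugging, because every run then works with exactly the same training and test rows.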

4. Observing the data

　　One way to visualize data is a scatter plot. A scatter plot puts one feature on the x-axis and another on the y-axis and draws one point for each data point. Unfortunately, a computer screen has only two dimensions, so we can plot only two (or perhaps three) features at a time. Data sets with more than three features are hard to plot this way. One way around this problem is to draw a scatter plot matrix (pair plot), which shows every possible pair of features.
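　　A single scatter plot of two features can be sketched as follows. The choice of the first two feature columns and the helper name scatter_two_features are illustrative assumptions and are not part of the original script.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris


def scatter_two_features(dataset, x_index, y_index):
    """Plot one feature against another, one color per species (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(6, 5))
    for label, name in enumerate(dataset['target_names']):
        # Select the rows that belong to the current species
        rows = dataset['target'] == label
        ax.scatter(dataset['data'][rows, x_index], dataset['data'][rows, y_index],
                   s=40, alpha=.8, label=name)
    ax.set_xlabel(dataset['feature_names'][x_index])
    ax.set_ylabel(dataset['feature_names'][y_index])
    ax.legend()
    return fig


iris_dataset = load_iris()
# Columns 0 and 1 are picked only as an example pair of features
fig = scatter_two_features(iris_dataset, 0, 1)
# Call plt.show() in an interactive session to display the figure
```

　　With four features there are six such pairs, which is why a pair plot is more convenient than drawing each plot by hand.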

　　The part of the code that draws the pair plot is as follows:

# create a DataFrame from the data in X_train
# label the columns using the strings in iris_dataset.feature_names
iris_dataframe = pd.DataFrame(X_train, columns=iris_dataset.feature_names)  # feature names become the column labels
# create a scatter plot matrix from the DataFrame, colored by Y_train
grr = pd.plotting.scatter_matrix(iris_dataframe, c=Y_train, figsize=(15, 15), marker='o', hist_kwds={'bins': 20}, s=60, alpha=.8, cmap=mglearn.cm3)
plt.show()

　　The resulting plot: