The road to machine learning: iris classification with a k-nearest neighbor classifier in Python

 

Learning the k-nearest neighbor classifier API with Python

The full source code is available in my GitHub repository: https://github.com/linyi0604/kaggle

 

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

'''
The k-nearest neighbor classifier
makes a prediction for a new sample from the distribution of the training data.
It is a non-parametric method,
with high computational complexity and memory consumption.
'''
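The idea described in the comment above — predict by majority vote among the closest training points — can be sketched from scratch. This is only a minimal illustration (the function name, the toy data, and the default k=3 are my own), not what scikit-learn actually does internally:

```python
import numpy as np

def knn_predict(x_train, y_train, x_new, k=3):
    # Non-parametric: every prediction scans the full training set,
    # which is why memory use and per-query cost are high.
    distances = np.linalg.norm(x_train - x_new, axis=1)  # Euclidean distance to each training point
    nearest = np.argsort(distances)[:k]                  # indices of the k closest neighbors
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                     # majority vote

# Toy example: two clusters, around (0, 0) and (5, 5)
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([4.8, 5.2]), k=3))  # -> 1
```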
'''
1 Prepare the data
'''
# Read the iris dataset
iris = load_iris()
# Check the data size
# print(iris.data.shape)  # (150, 4)
# View the dataset description
# print(iris.DESCR)
'''
Iris Plants Database
====================

Notes
-----
Data Set Characteristics:
    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
    :Summary Statistics:

    ============== ==== ==== ======= ===== ====================
                    Min  Max   Mean    SD   Class Correlation
    ============== ==== ==== ======= ===== ====================
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)
    ============== ==== ==== ======= ===== ====================

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%[email protected])
    :Date: July, 1988

This is a copy of the UCI ML iris dataset.
http://archive.ics.uci.edu/ml/datasets/Iris

The famous Iris database, first used by Sir R.A. Fisher.

This is perhaps the best known database to be found in the
pattern recognition literature.  Fisher's paper is a classic in the field and
is referenced frequently to this day.  (See Duda & Hart, for example.)  The
data set contains 3 classes of 50 instances each, where each class refers to a
type of iris plant.  One class is linearly separable from the other 2; the
latter are NOT linearly separable from each other.

References
----------
   - Fisher, R.A. "The use of multiple measurements in taxonomic problems"
     Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to
     Mathematical Statistics" (John Wiley, NY, 1950).
   - Duda, R.O., & Hart, P.E. (1973) Pattern Classification and Scene Analysis.
     (Q327.D83) John Wiley & Sons.  ISBN 0-471-22361-1.  See page 218.
   - Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
     Structure and Classification Rule for Recognition in Partially Exposed
     Environments". IEEE Transactions on Pattern Analysis and Machine
     Intelligence, Vol. PAMI-2, No. 1, 67-71.
   - Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule". IEEE Transactions
     on Information Theory, May 1972, 431-433.
   - See also: 1988 MLC Proceedings, 54-64. Cheeseman et al's AUTOCLASS II
     conceptual clustering system finds 3 classes in the data.
   - Many, many more ...

    In summary: 150 samples in total,
    evenly distributed over 3 subspecies,
    each described by 4 features of petal and sepal shape.
'''

'''
2 Split the data into training and test sets
'''
x_train, x_test, y_train, y_test = train_test_split(iris.data,
                                                    iris.target,
                                                    test_size=0.25,
                                                    random_state=33)
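The split above is purely random, and with only 38 test samples the class balance can drift. A hedged variant of the same call (the `stratify` argument of `train_test_split` is standard, though not used in the original code) keeps the three classes evenly represented in both splits:

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target,
    test_size=0.25,
    random_state=33,
    stratify=iris.target)  # preserve the 1/3-1/3-1/3 class distribution

# Each class contributes a nearly equal number of test samples
print(Counter(y_test))
```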

'''
3 Train the k-nearest neighbor classifier and predict
'''
# Standardize the training and test data
ss = StandardScaler()
x_train = ss.fit_transform(x_train)
x_test = ss.transform(x_test)

# Create a k-nearest neighbor model object
knc = KNeighborsClassifier()
# Fit the model on the training data
knc.fit(x_train, y_train)
# Predict on the test data
y_predict = knc.predict(x_test)
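Note that `fit_transform` learns the mean and standard deviation from the training data only, while `transform` reuses those training statistics on the test set — fitting a second scaler on the test data would leak information into the evaluation. A small sketch of that behavior (the toy arrays are my own, not the iris data):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.array([[0.0], [2.0], [4.0]])  # mean 2.0, std sqrt(8/3)
test = np.array([[2.0], [6.0]])

ss = StandardScaler()
train_scaled = ss.fit_transform(train)   # statistics are learned here
test_scaled = ss.transform(test)         # training statistics are reused

print(ss.mean_)        # [2.] -- the training mean, unaffected by the test set
print(test_scaled[0])  # [0.] -- the test value 2.0 equals the training mean
```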

'''
4 Model evaluation
'''
print("Accuracy:", knc.score(x_test, y_test))
print("Other metrics:\n", classification_report(y_test, y_predict, target_names=iris.target_names))

'''
Accuracy: 0.8947368421052632
Other metrics:
               precision    recall  f1-score   support

       setosa       1.00      1.00      1.00         8
   versicolor       0.73      1.00      0.84        11
    virginica       1.00      0.79      0.88        19

  avg / total       0.92      0.89      0.90        38
'''
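The model above uses `KNeighborsClassifier`'s default `n_neighbors=5`. A natural follow-up, sketched here as my own extension rather than part of the original walkthrough, is to choose k by cross-validation on the training set only, keeping the test set untouched:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=33)

ss = StandardScaler()
x_train = ss.fit_transform(x_train)

# Mean 5-fold cross-validated accuracy on the training set for several k
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             x_train, y_train, cv=5).mean()
          for k in (1, 3, 5, 7, 9)}
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```

The winning k would then be used for one final fit and a single evaluation on the held-out test set.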
  

 
