Learning the k-nearest neighbor classifier API in Python
The source code is available in my GitHub repository: https://github.com/linyi0604/kaggle
from sklearn.datasets import load_iris
# train_test_split lives in sklearn.model_selection;
# the old sklearn.cross_validation module was removed in scikit-learn 0.20
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

'''
k-nearest neighbor classifier
    Predicts the class of a sample from the labels of the training samples closest to it,
    i.e. from how the training data is distributed
    It is a non-parametric method: no model parameters are estimated during training
    Computational complexity and memory consumption are very high,
    because the training set has to be stored and searched at prediction time
'''

'''
1 Prepare data
'''
# Read the iris dataset
iris = load_iris()
# Check the data size
# print(iris.data.shape)  # (150, 4)
# View the dataset description
# print(iris.DESCR)
'''
Iris Plants Database
====================

Notes
-----
Data Set Characteristics:
    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
    :Summary Statistics:

    ============== ==== ==== ======= ===== ====================
                    Min  Max   Mean    SD   Class Correlation
    ============== ==== ==== ======= ===== ====================
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)
    ============== ==== ==== ======= ===== ====================

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall
    :Date: July, 1988

This is a copy of UCI ML iris datasets.
http://archive.ics.uci.edu/ml/datasets/Iris

The famous Iris database, first used by Sir R.A Fisher

This is perhaps the best known database to be found in the
pattern recognition literature.  Fisher's paper is a classic in the field and
is referenced frequently to this day.  (See Duda & Hart, for example.)  The
data set contains 3 classes of 50 instances each, where each class refers to a
type of iris plant.  One class is linearly separable from the other 2; the
latter are NOT linearly separable from each other.

References
----------
   - Fisher,R.A. "The use of multiple measurements in taxonomic problems"
     Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to
     Mathematical Statistics" (John Wiley, NY, 1950).
   - Duda,R.O., & Hart,P.E. (1973) Pattern Classification and Scene Analysis.
     (Q327.D83) John Wiley & Sons.  ISBN 0-471-22361-1.  See page 218.
   - Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
     Structure and Classification Rule for Recognition in Partially Exposed
     Environments".  IEEE Transactions on Pattern Analysis and Machine
     Intelligence, Vol. PAMI-2, No. 1, 67-71.
   - Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule".  IEEE Transactions
     on Information Theory, May 1972, 431-433.
   - See also: 1988 MLC Proceedings, 54-64.  Cheeseman et al's AUTOCLASS II
     conceptual clustering system finds 3 classes in the data.
   - Many, many more ...

150 samples in total,
evenly distributed over 3 iris classes.
Each sample has 4 features describing the shape of the sepals and petals.
'''

'''
2 Split training set and test set
'''
# Hold out 25% of the samples for testing; random_state makes the split reproducible
x_train, x_test, y_train, y_test = train_test_split(iris.data,
                                                    iris.target,
                                                    test_size=0.25,
                                                    random_state=33)

'''
3 Train the k-nearest neighbor classifier and predict
'''
# Standardize the features: fit the scaler on the training data only,
# then apply the same transformation to the test data
ss = StandardScaler()
x_train = ss.fit_transform(x_train)
x_test = ss.transform(x_test)

# Create a k-nearest neighbor model object (default n_neighbors=5)
knc = KNeighborsClassifier()
# Fit the model on the training data
knc.fit(x_train, y_train)
# Predict the test data
y_predict = knc.predict(x_test)

'''
4 Model evaluation
'''
print("Accuracy:", knc.score(x_test, y_test))
print("Other metrics:\n", classification_report(y_test, y_predict, target_names=iris.target_names))

'''
Output recorded from the original run (newer scikit-learn versions print
accuracy / macro avg / weighted avg rows instead of avg / total):

Accuracy: 0.8947368421052632
Other metrics:
              precision    recall  f1-score   support

     setosa       1.00      1.00      1.00         8
 versicolor       0.73      1.00      0.85        11
  virginica       1.00      0.79      0.88        19

avg / total       0.92      0.89      0.90        38
'''
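To see why the notes at the top of the script call k-nearest neighbors non-parametric and expensive, here is a minimal sketch of the brute-force idea in plain Python. It is only an illustration, not how scikit-learn implements KNeighborsClassifier. The function name knn_predict and the toy data are hypothetical and are not part of the original script.

```python
from collections import Counter
import math


def knn_predict(train_x, train_y, query, k=5):
    """Classify `query` by majority vote among its k nearest training samples."""
    # There is no training step: the "model" is the stored training set itself.
    # Every prediction measures the distance from the query to every stored
    # sample, which is where the high computation and memory cost comes from.
    distances = []
    for sample, label in zip(train_x, train_y):
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(sample, query)))
        distances.append((dist, label))

    # Keep the k closest samples and count the labels among them.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data (not the iris measurements): two features, two classes.
toy_x = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (3.0, 3.2), (3.1, 2.9), (2.9, 3.0)]
toy_y = ["a", "a", "a", "b", "b", "b"]

prediction_near_a = knn_predict(toy_x, toy_y, (1.1, 1.0), k=3)
prediction_near_b = knn_predict(toy_x, toy_y, (3.0, 3.0), k=3)
print(prediction_near_a, prediction_near_b)  # a b
```

Because the distance is computed on the raw feature values, a feature with a larger numeric range would dominate the vote. That is the reason the script standardizes the data with StandardScaler before fitting.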