Learning Python and KNN, line by line

These study notes draw on the following references:

https://www.runoob.com/python/python-functions.html

https://blog.csdn.net/scott198510/article/details/98198312

https://blog.csdn.net/HCYHanson/article/details/89332207

# TODO:

  A TODO comment marks something that still needs to be done, so that it can be found again later without extra help and picked up in later iterations of the project. In PyCharm, the Alt + 6 shortcut quickly brings up every TODO comment in the project. Think of it as a personal label: it marks something that has not been done yet and that you intend to come back to when there is time.

 

iris data:

  The Iris data set is a commonly used classification benchmark, dating back to Fisher, 1936. Also called the iris flower data set, it is a multivariate data set. It contains 150 samples divided into three classes of 50 samples each, and each sample has four attributes. In short, this is the data object we are working with. The target attribute gives the species of each iris flower, which is the value we ultimately want to predict from the four attributes; 0, 1 and 2 stand for the three different species, and target_names gives their real names. The iris data has 150 rows, each with four feature fields.
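
As a quick check of the description above, here is a minimal sketch that loads the data set with scikit-learn's load_iris and inspects it. It assumes scikit-learn is installed; the variable names labels and names are only illustrative.

```python
from sklearn import datasets

iris = datasets.load_iris()

# 150 samples, each with four numeric attributes
print(iris.data.shape)

# the target holds one integer label (0, 1 or 2) per sample
labels = sorted(set(int(t) for t in iris.target))
print(labels)

# target_names maps each integer label to the real species name
names = [str(n) for n in iris.target_names]
print(names)

# feature_names lists the four attributes
print(iris.feature_names)
```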

 

train_test_split

  Format: train_test_split(*arrays, test_size=None, train_size=None, random_state=None, shuffle=True, stratify=None). After the arrays to split, the parameters are: the size of the test set, the size of the training set, the random seed, whether to shuffle the data, and finally stratify, which, if not None, splits the data in a stratified manner using the given array as the class labels. The function returns the pieces in the order x_train, x_test, y_train, y_test. In the code, x_train is the training set data, y_train the training set labels, x_test the test set data, and y_test the test set labels. The model is built by supervised learning on x_train and y_train, and x_test and y_test are then used to check the final predictions.
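
The call described above might look like the following sketch. It assumes scikit-learn is installed; the choice of test_size=30 and the use of stratify are illustrative and are not taken from the code in these notes.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

# Return order: train data, test data, train labels, test labels
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target,
    test_size=30,          # an int means an absolute number of test samples
    random_state=2019,     # fixed seed so the split is reproducible
    shuffle=True,          # shuffle before splitting (the default)
    stratify=iris.target,  # keep the class proportions similar in both parts
)

print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
```

Passing a float between 0 and 1 as test_size would instead be read as a fraction of the data set.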

 

Defining a distance function: def euc_dis(instance1, instance2):

  How is a Python function written? The format is def function_name(parameter_list):. Unlike C and Java, in Python the type belongs to the object and variables are untyped, so the parameter list carries no types. A parameter is simply a name for the object passed in, and we then operate on that object. Immutable objects such as integers, strings and tuples behave like pass-by-value in C++: with fun(a), changing a inside fun creates a new object and does not affect the caller's a. Mutable objects such as lists and dictionaries behave like pass-by-reference in C++: with fun(la), la itself is passed in, so modifications made inside fun also affect la outside.
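
The difference between the two cases can be seen in a small sketch. The function names rebind_number and append_item are made up for this illustration.

```python
def rebind_number(a):
    # Rebinding the local name creates a new int object;
    # the object the caller passed in is untouched.
    a = a + 10
    return a


def append_item(la):
    # append changes the list object itself, so the caller sees it too.
    la.append(4)


n = 1
rebind_number(n)
print(n)      # 1

nums = [1, 2, 3]
append_item(nums)
print(nums)   # [1, 2, 3, 4]
```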

 

The ** operator and the sum function

  ** denotes exponentiation, e.g. 2 ** 3 = 8.

  The sum function has the format sum(iterable, start). It returns the sum of a sequence of numbers (not strings) plus the value of the start parameter, which defaults to 0 when omitted.
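
A short sketch of both, ending with the Euclidean distance between two points written with plain lists. The points p and q are made-up values chosen so the result is easy to check by hand.

```python
print(2 ** 3)      # 8
print(9 ** 0.5)    # 3.0, a power of 0.5 is a square root

print(sum([1, 2, 3]))        # 6
print(sum([1, 2, 3], 10))    # 16, the start value is added on top

# Euclidean distance between (0, 0) and (3, 4)
p = [0, 0]
q = [3, 4]
squared = [(a - b) ** 2 for a, b in zip(p, q)]
distance = sum(squared) ** 0.5
print(distance)    # 5.0
```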

 

Why is there no main function

  Python is an interpreted scripting language. Unlike a C/C++ program, which starts executing at the main function, a Python program executes in order from the first line to the last. Because execution goes line by line, a function must be defined first and only then used!
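
A minimal sketch of what "define first, then use" means in practice. The function greet is hypothetical; calling it before its def line has run raises a NameError.

```python
try:
    greet()              # greet has not been defined yet at this point
except NameError:
    early_call = "NameError"

def greet():
    return "hello"

late_call = greet()      # works, because the def line has already run
print(early_call, late_call)
```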

 

Arrays (lists)

  An array, which Python calls a list, is defined with name = []. The append method adds an element to the end. The element can itself be another list, which is then added as a single element.
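
For example, in this small sketch with made-up names:

```python
scores = []
scores.append(3)
scores.append(5)
print(scores)        # [3, 5]

nested = []
nested.append(scores)   # the whole list is added as one element
print(nested)        # [[3, 5]]
print(len(nested))   # 1
```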

 

for loop

  Format: for item_name in list_name:
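
A short sketch of the format above, looping over a list of strings and collecting the length of each one:

```python
names = ["setosa", "versicolor", "virginica"]
lengths = []
for name in names:
    # name takes each element of the list in turn
    lengths.append(len(name))
print(lengths)   # [6, 10, 9]
```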

 

nlargest() and nsmallest()

  These live in the heapq library and return the k largest or k smallest elements. Format: heapq.nlargest(k, iterable)  # returns the k largest elements
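
A small sketch with made-up distances, including the map and index idea that the full code uses to turn the smallest values back into positions:

```python
import heapq

distances = [4.2, 0.5, 3.1, 0.9, 2.7]

smallest = heapq.nsmallest(2, distances)
largest = heapq.nlargest(2, distances)
print(smallest)   # [0.5, 0.9]
print(largest)    # [4.2, 3.1]

# Positions of the two smallest values in the original list
indices = list(map(distances.index, smallest))
print(indices)    # [1, 3]
```

Note that list.index returns the first matching position, so two equal distances would map to the same index.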

Full code with comments:

import heapq
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Load the data
iris = datasets.load_iris()
x = iris.data
y = iris.target
# train_test_split returns: train data, test data, train labels, test labels
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=2019)

def euc_dis(instance1, instance2):  # the arguments are vectors; in a 2D coordinate system each holds two values, (x, y)
    diff = instance1 - instance2    # vector subtraction gives (x1-x2, y1-y2)
    diff = diff ** 2
    distnum = sum(diff) ** 0.5
    return distnum

def knn_classify(x, y, testinstance, k):  # return the majority class among the k nearest neighbors of testinstance
    dis = []
    for i in x:
        dis.append(euc_dis(i, testinstance))  # the n-th entry of dis is the distance from the n-th vector to testinstance

    # map applies dis.index to each of the k smallest distances, giving their indices
    # (index returns the first match, so equal distances map to the same index)
    minindex = map(dis.index, heapq.nsmallest(k, dis))
    nearest_y = []
    for i in minindex:
        nearest_y.append(y[i])  # y holds the iris.target labels; y[i] is the species label of the i-th flower

    return max(nearest_y, key=nearest_y.count)  # the value that occurs most often is the predicted label

predictions = [knn_classify(x_train, y_train, data, 3) for data in x_test]
correct = np.count_nonzero(np.array(predictions) == y_test)
print("Accuracy is: %.3f" % (correct / len(x_test)))
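
The voting step at the end of knn_classify can be tried on its own. This is a minimal sketch with made-up neighbor labels; the names neighbor_labels and tied_labels are only for illustration.

```python
# Labels of the three nearest neighbors (made-up values)
neighbor_labels = [2, 1, 2]

# key=neighbor_labels.count ranks each label by how often it occurs
winner = max(neighbor_labels, key=neighbor_labels.count)
print(winner)        # 2

# On a tie, max returns the first of the tied labels in list order
tied_labels = [1, 0]
tie_winner = max(tied_labels, key=tied_labels.count)
print(tie_winner)    # 1
```

Because ties are broken by list order, an odd k such as 3 makes ties less likely, although with three classes they can still occur.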

 

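Reflections:

  I ran into some difficulties while learning this. I understood the idea, but I did not know how to write it in Python, so I looked things up on Baidu line by line: what is this, and what is that? After countless searches I finally got a rough grasp of how functions are written, but only the format; I have not studied Python systematically. So this week's goal is to learn Python systematically, starting from hello world. (smirk)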


Origin www.cnblogs.com/qq2210446939/p/11865408.html