1. Mathematical principles of the perceptron
The perceptron is a binary linear classification model optimized with stochastic gradient descent: its input is an instance's feature vector, its output is the instance's class, and it belongs to the family of discriminative models. The perceptron aims to find a hyperplane that linearly separates the data (note that this only works if the data set is linearly separable, in which case the number of misclassifications k has an upper bound, i.e. the perceptron algorithm converges; if the data set is not linearly separable, the algorithm does not converge).
GitHub code address (continuously updated)
Basic steps of the perceptron algorithm
Input: training data set T = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}, where x_i is a feature vector and y_i ∈ {+1, -1} is its class; a learning rate η with 0 < η ≤ 1
Output: parameters w, b; perceptron model f(x) = sign(w·x + b)
1. Choose initial values w_0, b_0
2. Randomly select a data point (x_i, y_i) from the training set
3. If y_i·(w·x_i + b) ≤ 0 (the point is misclassified), update the parameters:
w ← w + η·y_i·x_i
b ← b + η·y_i
4. Go back to step 2, until no point in the data set is misclassified
(1) Perceptron hyperplane function
f(x) = sign(w·x + b)
where sign(x) is +1 when x ≥ 0 and -1 when x < 0.
(2) Perceptron loss function
The loss function is the total distance from the misclassified points to the hyperplane; dropping the constant factor 1/||w||, this gives L(w, b) = -∑ y_i·(w·x_i + b), summed over the set M of misclassified points. It is optimized with stochastic gradient descent: taking the derivatives of the loss with respect to w and b gives the gradients ∂L/∂w = -∑ y_i·x_i and ∂L/∂b = -∑ y_i over M, and the parameters are updated until the (non-negative) loss reaches its minimum.
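A minimal from-scratch sketch of these steps in NumPy (the function name train_perceptron and the toy data are illustrative, not from the original post); it sweeps the data in order rather than sampling randomly:

import numpy as np

def train_perceptron(x, y, eta=0.1, max_epochs=100):
    # SGD training loop for the steps above; labels y must be in {-1, +1}
    w = np.zeros(x.shape[1])  # step 1: initial values w0 = 0, b0 = 0
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(x, y):               # step 2: pick a data point
            if yi * (np.dot(w, xi) + b) <= 0:  # step 3: misclassified?
                w += eta * yi * xi             # w <- w + eta*y_i*x_i
                b += eta * yi                  # b <- b + eta*y_i
                mistakes += 1
        if mistakes == 0:                      # step 4: stop when nothing is misclassified
            break
    return w, b

# toy linearly separable data
x = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])
print(train_perceptron(x, y, eta=1.0))  # (array([1., 1.]), -3.0) for this sweep order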
2. sklearn implementation (binary classification of a linearly separable data set)
sklearn's make_classification function generates binary classification samples by default. Main parameters:
# n_samples: number of samples to generate
# n_features: number of features per generated sample
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
x, y = make_classification(n_samples=1200, n_features=2, n_redundant=0, n_informative=1, n_clusters_per_class=1)
x_data_train = x[:800,:]  # x holds the feature vectors
x_data_test = x[800:,:]
y_data_train = y[:800]  # y holds the class labels
y_data_test = y[800:]
# define the perceptron (max_iter was called n_iter in older sklearn versions)
clf = Perceptron(fit_intercept=False, max_iter=50, shuffle=False, eta0=0.1, random_state=0)
# train on the training data
clf.fit(x_data_train, y_data_train)
print(clf.coef_)  # the hyperplane weights w (b is fixed at 0 since fit_intercept=False)
y_pred = clf.predict(x_data_test)
# evaluate the model with score, accuracy_score, and classification_report
acc = clf.score(x_data_test, y_data_test)
print(acc)
print(accuracy_score(y_data_test, y_pred))
classify_report = classification_report(y_data_test, y_pred)
print('classify_report : \n', classify_report)
Main parameters of the Perceptron model
max_iter: maximum number of passes over the training data; can be understood as the number of gradient-descent iterations (called n_iter in older sklearn versions)
tol: float or None; if a float, iteration stops when loss > previous_loss - tol
eta0: float, the learning rate, in (0, 1)
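As an illustration of these parameters (a hypothetical configuration, not from the original post), the number of epochs actually run can be read back from the fitted attribute n_iter_:

from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron

x, y = make_classification(n_samples=1200, n_features=2, n_redundant=0, n_informative=1, n_clusters_per_class=1)
# run at most 100 epochs, stopping early once the loss improves by less than tol
clf2 = Perceptron(max_iter=100, tol=1e-3, eta0=0.1, random_state=0)
clf2.fit(x, y)
print(clf2.n_iter_)  # number of epochs actually run before stopping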
Fitted attributes returned by the perceptron
coef_: the weights [w1, w2, ...]
intercept_: the constant term b
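As a quick check on these attributes (continuing from the fitted clf and test split above), the sign of w·x + b reproduces predict(); note that make_classification labels the classes 0 and 1, and with fit_intercept=False the intercept is fixed at 0:

w = clf.coef_[0]         # weight vector [w1, w2, ...]
b = clf.intercept_[0]    # constant term b (0.0 here, since fit_intercept=False)
manual = (x_data_test @ w + b > 0).astype(int)      # class 1 if w·x + b > 0, else class 0
print((manual == clf.predict(x_data_test)).all())   # True: matches predict()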
Remarks
1. The hyperplane the perceptron finds depends on the chosen initial values and on the order in which the data points are selected.
2. The perceptron also has a dual form (see the sketch after this list).
3. When the data set is not linearly separable, further constraints must be placed on the hyperplane; a support vector machine (SVM) is then used to find the dividing hyperplane.
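A minimal sketch of the dual form mentioned in remark 2 (illustrative function name, standard dual perceptron): each sample carries a coefficient alpha_i counting how often it triggered an update, the weights are recovered as w = ∑ alpha_i·y_i·x_i, and training only needs the Gram matrix of inner products:

import numpy as np

def train_perceptron_dual(x, y, eta=1.0, max_epochs=100):
    n = len(x)
    gram = x @ x.T           # Gram matrix of pairwise inner products x_i . x_j
    alpha = np.zeros(n)
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            # w . x_i expands to sum_j alpha_j * y_j * (x_j . x_i)
            if y[i] * (np.sum(alpha * y * gram[:, i]) + b) <= 0:
                alpha[i] += eta      # equivalent to the primal update w <- w + eta*y_i*x_i
                b += eta * y[i]
                mistakes += 1
        if mistakes == 0:
            break
    return (alpha * y) @ x, b        # recover the primal weights w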