[Perceptron] The original form of the perceptron learning algorithm

The perceptron is a linear classification model for binary classification. Its input is the feature vector of an instance, and its output is the class of the instance, taking the binary values +1 and -1. The perceptron corresponds to a separating hyperplane that divides instances into positive and negative classes in the input space (feature space), and it is a discriminative model. The perceptron is the foundation of neural networks and support vector machines.

Perceptron learning aims to find a separating hyperplane that linearly separates the training data.

Perceptron learning ideas:

1. Define a loss function based on misclassification.

2. Minimize the loss function by gradient descent.

3. Substitute the learned parameters to obtain the perceptron model.

Forms of the perceptron learning algorithm:

The original (primal) form and the dual form.

Algorithm: The original form of the perceptron learning algorithm

Input: training data set $T=\{(x_1,y_1),(x_2,y_2),\cdots,(x_N,y_N)\}$, where $x_i\in\mathcal{X}=\mathbb{R}^n$, $y_i\in\mathcal{Y}=\{-1,+1\}$, $i=1,2,\cdots,N$; learning rate $\eta$ $(0<\eta\leqslant 1)$;

Output: $w, b$; perceptron model $f(x)=\mathrm{sign}(w\cdot x+b)$.

1) Select initial values $w_0, b_0$;

2) Select a data point $(x_i,y_i)$ from the training set;

3) If $y_i(w\cdot x_i+b)\leqslant 0$, update

$$\begin{aligned}w&\leftarrow w+\eta y_ix_i\\b&\leftarrow b+\eta y_i\end{aligned}$$

4) Go to 2) until there are no misclassified points in the training set.
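The four steps translate directly into code. Below is a minimal sketch in Python with NumPy; the function name `perceptron_train_primal`, the zero initialization, the cyclic sweep order, and the toy data are illustrative assumptions, not prescribed by the algorithm.

```python
import numpy as np

def perceptron_train_primal(X, y, eta=1.0, max_epochs=1000):
    """Original (primal) form of the perceptron learning algorithm.
    X: (N, n) float array of feature vectors; y: (N,) labels in {-1, +1}.
    Assumes the data are linearly separable; otherwise returns after
    max_epochs without converging."""
    w, b = np.zeros(X.shape[1]), 0.0                 # 1) initial values w_0, b_0
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(len(X)):                      # 2) select (x_i, y_i)
            if y[i] * (np.dot(w, X[i]) + b) <= 0:    # 3) misclassified?
                w += eta * y[i] * X[i]               #    w <- w + eta * y_i * x_i
                b += eta * y[i]                      #    b <- b + eta * y_i
                mistakes += 1
        if mistakes == 0:                            # 4) stop: no misclassified points
            break
    return w, b

# Illustrative, linearly separable toy data (hypothetical):
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])
w, b = perceptron_train_primal(X, y)
print(w, b)                                          # -> [1. 1.] -3.0 with this sweep order
f = lambda x: np.sign(np.dot(w, x) + b)              # perceptron model f(x) = sign(w·x + b)
```

Note that this sketch sweeps the data cyclically; the stochastic flavor described next picks misclassified points at random.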

The algorithm uses stochastic gradient descent:

First, an initial hyperplane is chosen arbitrarily, and the loss function is then minimized by gradient descent. The minimization does not take a gradient step over all misclassified points at once; instead, one misclassified point is selected at random each time and a gradient step is taken for that point alone.
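A hedged sketch of this random-selection variant, which picks one currently misclassified point uniformly at random per step (the name `perceptron_train_sgd` and the seeding are illustrative):

```python
import numpy as np

def perceptron_train_sgd(X, y, eta=1.0, max_updates=10000, seed=0):
    """Primal perceptron via stochastic gradient descent: at each step,
    pick one currently misclassified point at random and take a gradient
    step on its loss term alone. Assumes linearly separable data;
    otherwise stops after max_updates."""
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(max_updates):
        mis = np.flatnonzero(y * (X @ w + b) <= 0)  # current misclassified set M
        if mis.size == 0:                           # converged: no mistakes left
            break
        i = rng.choice(mis)                         # random misclassified point
        w += eta * y[i] * X[i]                      # w <- w + eta * y_i * x_i
        b += eta * y[i]                             # b <- b + eta * y_i
    return w, b
```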

Assuming that the set of misclassified points $M$ is fixed, the loss function is

$$L(w,b)=-\sum_{x_i\in M}y_i(w\cdot x_i+b),$$

and its gradients are:

$$\nabla_wL(w,b)=-\sum_{x_i\in M}y_ix_i$$

$$\nabla_bL(w,b)=-\sum_{x_i\in M}y_i$$
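As a numerical companion to these formulas, the following sketch computes $L(w,b)$ and both gradients for the misclassified set $M$ determined by the current $w, b$ (NumPy assumed; `loss_and_gradients` is an illustrative name):

```python
import numpy as np

def loss_and_gradients(w, b, X, y):
    """L(w,b) = -sum_{x_i in M} y_i (w·x_i + b) and its gradients,
    where M is the set of points misclassified by the current (w, b)."""
    margins = y * (X @ w + b)                      # functional margins y_i (w·x_i + b)
    M = margins <= 0                               # boolean mask of misclassified points
    L = -margins[M].sum()
    grad_w = -(y[M][:, None] * X[M]).sum(axis=0)   # -sum of y_i x_i over M
    grad_b = -y[M].sum()                           # -sum of y_i over M
    return L, grad_w, grad_b
```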

For a single misclassified point $(x_i,y_i)$, the gradient of the loss with respect to $w$ is $-y_ix_i$, which points in the direction of increase; stepping against it, $w\leftarrow w+\eta y_ix_i$, therefore decreases the loss.

Algorithm interpretation: when an instance point is misclassified, the values of $w$ and $b$ are adjusted to move the separating hyperplane toward the side of the misclassified point, reducing the distance between that point and the hyperplane, until the hyperplane crosses the point and it is classified correctly.
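A one-line calculation (a sketch, using only $y_i^2=1$) makes this concrete: after a single update on a misclassified point $(x_i,y_i)$, its functional margin changes by a fixed positive amount,

$$y_i\big((w+\eta y_ix_i)\cdot x_i+(b+\eta y_i)\big)=y_i(w\cdot x_i+b)+\eta\left(\|x_i\|^2+1\right),$$

so each update increases $y_i(w\cdot x_i+b)$ by $\eta(\|x_i\|^2+1)>0$, pushing the hyperplane toward, and eventually past, the misclassified point.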


Origin: blog.csdn.net/weixin_73404807/article/details/135365074