【DL】Perceptron

1. Perceptron definition

A perceptron is a linear classification model whose output is binary, taking one of two values such as +1 and −1 (or 1 and 0).

f(x) = sign(w · x + b), where w is the weight vector, b is the bias, and sign(·) outputs +1 if its argument is positive and −1 otherwise.
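
A minimal sketch of this decision rule in NumPy (the function name `predict` and the specific w, b values are illustrative, not from the original post):

```python
import numpy as np

def predict(x, w, b):
    """Perceptron decision function: sign(w·x + b), mapped to +1 / -1."""
    return 1 if np.dot(w, x) + b > 0 else -1

# Example: a 2D weight vector and bias defining a separating line
w = np.array([2.0, -1.0])
b = 0.5
print(predict(np.array([1.0, 0.0]), w, b))   # w·x + b = 2.5 > 0   -> +1
print(predict(np.array([-1.0, 1.0]), w, b))  # w·x + b = -2.5 <= 0 -> -1
```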

2. Perceptron learning algorithm

Here we explore the original (primal) form of the perceptron learning algorithm; the other form is the dual form.

The perceptron actually learns a hyperplane that divides the instances into two classes. Since both the inputs and the outputs are known, learning can be converted into minimizing a loss function, i.e. finding the optimal w and b. The mathematical model is built as follows:

Given a training dataset

T = {(x₁, y₁), (x₂, y₂), …, (x_N, y_N)}, where xᵢ ∈ ℝⁿ and yᵢ ∈ {+1, −1}

Feasible domain D is

D = { f | f(x) = sign(w · x + b), w ∈ ℝⁿ, b ∈ ℝ }

Build the objective function:

L(w, b) = − Σ_{xᵢ ∈ M} yᵢ (w · xᵢ + b)

where M is the set of misclassified points, i.e. the points with yᵢ (w · xᵢ + b) ≤ 0.
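
A small sketch of how this objective could be computed under the +1/−1 label convention; the function name `perceptron_loss` and the toy data are illustrative assumptions:

```python
import numpy as np

def perceptron_loss(X, y, w, b):
    """L(w, b) = -sum over misclassified points of y_i * (w·x_i + b)."""
    margins = y * (X @ w + b)        # y_i * (w·x_i + b) for every point
    misclassified = margins <= 0     # the set M of misclassified points
    return -np.sum(margins[misclassified])

# Toy example: two points, one on the wrong side of the hyperplane
X = np.array([[1.0, 1.0], [2.0, -1.0]])
y = np.array([1, -1])
w = np.array([0.5, 0.5])
b = 0.0
print(perceptron_loss(X, y, w, b))   # only the misclassified point contributes
```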

Then we optimize the objective function. First initialize w₀ and b₀; stochastic gradient descent is chosen as the descent algorithm, and the gradients of the loss function are:

∇_w L(w, b) = − Σ_{xᵢ ∈ M} yᵢ xᵢ
∇_b L(w, b) = − Σ_{xᵢ ∈ M} yᵢ

According to the gradient formula, each time a misclassified point (xᵢ, yᵢ) is sampled, the model parameters are updated:

w ← w + η yᵢ xᵢ
b ← b + η yᵢ

where η (0 < η ≤ 1) is the learning rate.

The intuitive explanation of the above update is: when an instance point is misclassified, i.e. it lies on the wrong side of the separating hyperplane, w and b are adjusted so that the hyperplane moves toward the misclassified point. This reduces the distance between the misclassified point and the hyperplane, until the hyperplane passes the point and classifies it correctly.
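
A sketch of this single-point update; the helper name `sgd_update` and the default learning rate η = 1 are my own choices:

```python
import numpy as np

def sgd_update(x_i, y_i, w, b, eta=1.0):
    """Perceptron SGD step on one misclassified point:
    w <- w + eta * y_i * x_i,  b <- b + eta * y_i."""
    w = w + eta * y_i * x_i
    b = b + eta * y_i
    return w, b

# A point with label +1 currently on the wrong side (y_i * (w·x + b) <= 0)
w, b = np.array([0.0, -1.0]), 0.0
x_i, y_i = np.array([1.0, 1.0]), 1
print(y_i * (np.dot(w, x_i) + b))   # -1.0: misclassified
w, b = sgd_update(x_i, y_i, w, b)
print(y_i * (np.dot(w, x_i) + b))   # 2.0: now on the correct side
```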

3. Training process

Below is the pseudocode of the perceptron training process.

First initialize the weight w and the bias b to 0, and then iteratively refine them. Here yᵢ is the true label, and the inner product of the weights and the input vector plus the bias is the prediction. Since the output is binary (+1 or −1), if the product of the prediction and the true label is less than or equal to 0, the prediction and the true label are not of the same class, and w and b are updated. If the product is greater than 0, the point is classified correctly and we proceed to the next iteration.
initialize w = 0 and b = 0
repeat
    if yᵢ (w · xᵢ + b) ≤ 0 then
        w ← w + yᵢ xᵢ and b ← b + yᵢ
    end if
until all points are classified correctly

The training process of the perceptron can also be seen as gradient descent with a batch size of 1, using the per-example loss ℓ(y, x; w, b) = max(0, −y (w · x + b)). If the product of the true label and the prediction is less than or equal to 0, i.e. the point is misclassified, the loss is positive and the model parameters are updated; otherwise the loss (and hence the gradient) is zero.
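
Putting the pieces together, one possible NumPy version of this batch-size-1 training loop (a sketch, not the original post's code; the `max_epochs` cap and the learning rate default are assumptions):

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, max_epochs=100):
    """Train a perceptron with batch-size-1 updates.
    X: (n_samples, n_features) array; y: labels in {+1, -1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x_i, y_i in zip(X, y):
            if y_i * (np.dot(w, x_i) + b) <= 0:   # misclassified point
                w += eta * y_i * x_i              # w <- w + eta * y_i * x_i
                b += eta * y_i                    # b <- b + eta * y_i
                errors += 1
        if errors == 0:                           # all points classified correctly
            break
    return w, b

# Usage on a tiny linearly separable dataset (AND-like labels)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, y)
print(w, b)
print(np.sign(X @ w + b))   # matches y: [-1, -1, -1, 1]
```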

4. Convergence theorem

In the classification process there is actually a margin ρ: as long as the two classes can be separated by a hyperplane with at least this margin, the perceptron is guaranteed to converge. r denotes the radius of the data; the larger the data region, the slower the convergence. ρ describes the quality of the data: if the two classes are not well separated (small margin), convergence is very slow.

Convergence theorem: if the data lie within a radius r (‖xᵢ‖ ≤ r for all i), and there exist w, b with ‖w‖² + b² ≤ 1 such that yᵢ (w · xᵢ + b) ≥ ρ for some margin ρ > 0, then the perceptron converges after at most (r² + 1) / ρ² updates.
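
To make r, ρ, and the bound concrete, the sketch below evaluates them on a toy separable dataset; the hand-chosen reference separator (w*, b*) with ‖w‖² + b² ≤ 1 is an assumption for illustration only:

```python
import numpy as np

# Toy linearly separable data: two classes on opposite sides of the line x1 + x2 = 0
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

# Radius of the data
r = np.max(np.linalg.norm(X, axis=1))

# Margin rho measured against a hand-chosen separator with ||w||^2 + b^2 <= 1
w_star, b_star = np.array([0.7, 0.7]), 0.0
rho = np.min(y * (X @ w_star + b_star))

bound = (r ** 2 + 1) / rho ** 2
print(f"r = {r:.3f}, rho = {rho:.3f}, bound (r^2 + 1)/rho^2 = {bound:.2f}")

# Count the actual number of perceptron updates on this data
w, b, updates = np.zeros(2), 0.0, 0
for _ in range(100):                      # more epochs than the bound requires
    errors = 0
    for x_i, y_i in zip(X, y):
        if y_i * (np.dot(w, x_i) + b) <= 0:
            w, b = w + y_i * x_i, b + y_i
            updates, errors = updates + 1, errors + 1
    if errors == 0:
        break
print(f"actual updates: {updates}")       # stays below the theoretical bound
```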

5. Summary

  • The perceptron is a binary classification model and one of the earliest AI models
  • Its training algorithm is equivalent to gradient descent with a batch size of 1
  • It cannot fit the XOR function, a limitation that was not overcome until the introduction of the multi-layer perceptron (see the sketch below)
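
As a quick illustration of the last point, the sketch below runs the same batch-size-1 update loop on the XOR dataset with +1/−1 labels; the epoch cap of 1000 is arbitrary, and the error count never reaches zero because XOR is not linearly separable:

```python
import numpy as np

# XOR dataset with +1 / -1 labels: not linearly separable
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([-1, 1, 1, -1])

w, b = np.zeros(2), 0.0
for epoch in range(1000):
    errors = 0
    for x_i, y_i in zip(X, y):
        if y_i * (np.dot(w, x_i) + b) <= 0:   # misclassified point
            w += y_i * x_i
            b += y_i
            errors += 1
    if errors == 0:
        break
print(f"misclassified points in the last epoch: {errors}")  # never reaches 0 for XOR
```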

