A detailed explanation of the neural network perceptron

In this article, we mainly cover the definition and function of the perceptron, how the model's loss function is obtained, and what methods exist to minimize that loss function and determine the model parameters.

Definition of the perceptron model:

The input space consists of vectors x = (x^(1), x^(2), ..., x^(n)), the output space is {-1, +1}, and the mapping function from the input space to the output space, f(x) = sign(w·x + b), is called a perceptron. Here w is the weight vector, b is called the bias, and w·x is the inner product of w and x. sign(x) is the sign function, that is:
sign(x) = +1 if x ≥ 0, and sign(x) = -1 if x < 0.
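To make the definition concrete, here is a minimal sketch of the decision function f(x) = sign(w·x + b) in Python (numpy-based; the function names and example values are my own illustration, not from the original article):

```python
import numpy as np

def sign(x):
    # The sign function defined above: +1 when x >= 0, -1 otherwise.
    return np.where(x >= 0, 1, -1)

def perceptron_predict(x, w, b):
    # f(x) = sign(w . x + b): maps a feature vector to the class +1 or -1.
    return sign(np.dot(w, x) + b)

# Example: a point on the positive side of the hyperplane w . x + b = 0.
w = np.array([2.0, -1.0])
b = 0.5
print(perceptron_predict(np.array([1.0, 1.0]), w, b))  # 2 - 1 + 0.5 = 1.5 -> 1
```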

The function of the perceptron model

The perceptron is a binary classification model: the input is a sample's feature vector, and the output is the sample's category, +1 or -1. For the model to classify correctly, the perceptron requires the data set itself to be linearly separable.
On a two-dimensional plane, linearly separable means that the positive and negative samples can be separated by a straight line;
in three-dimensional space, it means they can be separated by a plane;
in n-dimensional space, it means they can be separated by an (n-1)-dimensional hyperplane.
To make computation easier, we often try to make linearly inseparable samples separable under some transformation. If no single straight line can divide the positive and negative samples, we can divide them with two straight lines: samples that satisfy both lines are called positive, and all others negative. There is another way of dividing: in industry, people often look for a curve to separate the classes, but how do we construct such a curve? The idea is actually very simple. We first build several linear classifiers and then superimpose them, forming a jagged boundary instead of a smooth one. In general, nonlinear segmentation is accomplished by combining multiple linear classifiers, as sketched below.
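As a hedged illustration of the "two straight lines" idea above (my own toy sketch, assuming numpy; the two lines are chosen arbitrarily), a point is labeled positive only when it lies on the positive side of both lines, which already gives a piecewise-linear, non-linear decision region:

```python
import numpy as np

def linear_side(x, w, b):
    # True when x lies on the positive side of the line w . x + b = 0.
    return np.dot(w, x) + b >= 0

def combined_classifier(x):
    # Positive only if BOTH linear classifiers agree; otherwise negative.
    # The two lines together carve out a wedge-shaped, non-linear region.
    on_line1 = linear_side(x, np.array([1.0, 0.0]), -1.0)  # side of x1 = 1
    on_line2 = linear_side(x, np.array([0.0, 1.0]), -1.0)  # side of x2 = 1
    return 1 if (on_line1 and on_line2) else -1

print(combined_classifier(np.array([2.0, 2.0])))  # satisfies both lines -> +1
print(combined_classifier(np.array([2.0, 0.0])))  # satisfies only one   -> -1
```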

The perceptron model diagram is as follows:

[Figure: perceptron model diagram]
It can be seen from the model that we need to solve for w and b; only then can we obtain the hyperplane S that correctly separates all positive and negative samples. Determining w and b requires a loss function, which we then minimize. The method we usually use to find the optimum is gradient descent. Of course, there are methods that often improve on plain gradient descent, such as Momentum, AdaGrad, and Adam. The following is a concise introduction to these methods:
https://blog.csdn.net/m0_51004308/article/details/112614340
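As a rough sketch of how these update rules differ (a toy example of my own, not taken from the linked post), plain gradient descent steps against the gradient, while Momentum accumulates a velocity term:

```python
import numpy as np

def grad(w):
    # Gradient of the toy objective f(w) = 0.5 * ||w||^2 (minimum at w = 0).
    return w

w_gd = np.array([4.0, -2.0])   # parameters for plain gradient descent
w_mom = np.array([4.0, -2.0])  # parameters for gradient descent with Momentum
v = np.zeros(2)                # Momentum's accumulated velocity
lr, gamma = 0.1, 0.9           # learning rate and Momentum coefficient

for _ in range(50):
    w_gd = w_gd - lr * grad(w_gd)       # plain update: w <- w - lr * grad
    v = gamma * v + lr * grad(w_mom)    # accumulate a velocity term
    w_mom = w_mom - v                   # Momentum update: w <- w - v

print(w_gd, w_mom)  # both sequences approach the minimum at (0, 0)
```

AdaGrad and Adam refine this further by adapting the learning rate for each parameter; see the linked post for details.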

Loss function

I think that blogger's post is very well written, so the following draws on his write-up; his link is below.

The total distance from the misclassified points to the hyperplane S is chosen as the loss function.
First, find the distance from a misclassified point to the hyperplane. The distance from any point x0 in the input space R^n to the hyperplane S is

(1/||w||) |w·x0 + b|,

where ||w|| is the L2 norm of w. For a misclassified point (xi, yi) we have -yi(w·xi + b) > 0, so its distance to the hyperplane is -yi(w·xi + b)/||w||. Summing over the set M of all misclassified points, and dropping the constant factor 1/||w||, gives the perceptron loss function:

L(w, b) = -Σ_{xi ∈ M} yi(w·xi + b).
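Putting the derivation into code, here is a minimal sketch (my own illustration, assuming numpy and illustrative names) of the loss L(w, b) and the stochastic gradient update it yields: for a misclassified point (xi, yi), the gradients are ∂L/∂w = -yi·xi and ∂L/∂b = -yi, so the update is w ← w + η·yi·xi and b ← b + η·yi:

```python
import numpy as np

def perceptron_loss(X, y, w, b):
    # L(w, b) = - sum over misclassified points of y_i * (w . x_i + b).
    margins = y * (X @ w + b)
    misclassified = margins <= 0        # y_i * (w . x_i + b) <= 0
    return np.sum(-margins[misclassified])

def perceptron_train(X, y, lr=1.0, epochs=100):
    # Stochastic gradient descent on the loss above: for each misclassified
    # point, update w <- w + lr * y_i * x_i and b <- b + lr * y_i.
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:  # point is misclassified
                w = w + lr * yi * xi
                b = b + lr * yi
    return w, b

# Toy linearly separable data: two positive and two negative samples.
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0], [0.0, 2.0]])
y = np.array([1, 1, -1, -1])
w, b = perceptron_train(X, y)
print(perceptron_loss(X, y, w, b))  # 0.0 once every point is classified correctly
```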

Original post: blog.csdn.net/m0_51004308/article/details/112611141