What is a perceptron - with pictures and texts, from shallow to deep

Introduction

Daily life is full of logical judgments. For example, when we see dark clouds in the distance and the weather forecast on our phone says there is a 40% probability of rain within the hour, we usually conclude that it is going to rain and take an umbrella when going out.

The thinking process above can be abstracted as an "AND" logic. When "seeing dark clouds" and "the forecast says 40% chance of rain" are both satisfied, we make the judgment "it will rain soon, take an umbrella when going out". If we only see dark clouds but the forecast says there is a 0% probability of rain, or we see no dark clouds even though the forecast says 40%, we judge that it will not rain.

Now let's abstract this process.

[Figure: two inputs feeding a judgment that outputs whether to bring an umbrella]

The logic has two inputs: seeing dark clouds and the weather forecast. After the judgment, we decide whether to bring an umbrella.

When judging, some people trust the dark clouds they see with their own eyes more and treat the weather forecast as only an aid; after all, forecasts are often inaccurate. Others trust the forecast more; after all, dark clouds do not always mean rain. We therefore introduce the concept of "trust degree". Suppose full trust is 1 (it could also be 2, 3, 3.4, and so on). I trust the weather forecast more, so I give "weather forecast" a trust degree of 0.7 and "seeing dark clouds" a trust degree of 0.3.

[Figure: the same judgment, with trust degree 0.3 on "see dark clouds" and 0.7 on "weather forecast"]

By now readers should have a rough idea, but these vague notions of "judgment" and "trust" cannot actually solve the problem. The main open questions are:

  • How does "trust" affect the judgment?
  • How is the "judgment" actually made?
  • What do these judgments have to do with the perceptron?

Next, we formalize the problem mathematically so that the logical relationship can be stated quantitatively. This process is usually called "mathematical modeling".

There are many judgments in life: some are as simple as whether to carry an umbrella, others as complicated as solving an advanced math problem. All of them are based on signals that the external environment feeds into our brain, and all are produced by the brain's complex neural network.

The example above is a very simple judgment. One can imagine that thousands of such simple judgments, combined, form a very large and complex "neural network" that can handle far harder tasks. The unit that makes up this neural network is called a "neuron"; roughly speaking, we can also call it a "perceptron".

Introducing the perceptron

Baby version

Let $x_1$ denote "seeing dark clouds" and $x_2$ denote "the weather forecast". If dark clouds are seen, then $x_1 = 1$, otherwise $x_1 = 0$; similarly, if the forecast gives a 40% chance of rain, $x_2 = 1$, and if it gives a 0% chance, $x_2 = 0$.

Let $y$ denote "whether to bring an umbrella": when we bring an umbrella, $y = 1$, otherwise $y = 0$.

[Figure: perceptron with inputs $x_1$, $x_2$ and output $y$]

For convenience, from here on we call the trust degrees 0.3 and 0.7 "weights". We now introduce the weighted sum $a$, the weighted combination of the input variables:

$$a = 0.3x_1 + 0.7x_2$$
The pair $(x_1, x_2)$ has only 4 possible values, so we may as well list them all; we call the result a truth table. (Here we assume we only take an umbrella when we both see dark clouds and the forecast predicts rain.)

| $x_1$ | $x_2$ | $a$ | $y$ |
| --- | --- | --- | --- |
| 0 | 0 | 0 | 0 |
| 1 | 0 | 0.3 | 0 |
| 0 | 1 | 0.7 | 0 |
| 1 | 1 | 1 | 1 |

From the table we can formulate the following decision rule: when $a$ is greater than a certain threshold $\theta$, $y$ takes the value 1, that is, we bring an umbrella.
$$y = \begin{cases} 0 & 0.3x_1 + 0.7x_2 \le \theta \\ 1 & 0.3x_1 + 0.7x_2 > \theta \end{cases}$$
By inspecting the truth table we can easily pick a value for $\theta$: 0.8 or 0.9 both work. The valid range is

$$0.7 \le \theta < 1$$

(the upper bound ensures that the case $x_1 = x_2 = 1$, where $a = 1$, still yields $y = 1$).
With this, we have successfully modeled the "judgment" mathematically, as shown below:

[Figure: perceptron with inputs $x_1$, $x_2$, weights 0.3 and 0.7, threshold $\theta$, and output $y$]

We have now built a complete, working perceptron. The circle in the middle of the figure above is the perceptron: it has two input signals and one output signal. I named it the "Baby Perceptron".
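
To make this concrete, here is a minimal Python sketch of the baby perceptron. The weights 0.3 and 0.7 follow the example above; the threshold value 0.8 and the function name are my own choices for illustration:

```python
def baby_perceptron(x1, x2, theta=0.8):
    """Umbrella decision: output 1 (take an umbrella) when the
    weighted sum of the two inputs exceeds the threshold theta."""
    a = 0.3 * x1 + 0.7 * x2  # weighted sum of the inputs
    return 1 if a > theta else 0

# Reproduce the truth table
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, baby_perceptron(x1, x2))  # only (1, 1) prints 1
```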

Youth version

The baby perceptron grows up

The judgment logic of the baby perceptron is:
$$y = \begin{cases} 0 & 0.3x_1 + 0.7x_2 \le \theta \\ 1 & 0.3x_1 + 0.7x_2 > \theta \end{cases}$$
Now we simply transform it:
$$y = \begin{cases} 0 & 0.3x_1 + 0.7x_2 + (-\theta) \le 0 \\ 1 & 0.3x_1 + 0.7x_2 + (-\theta) > 0 \end{cases}$$
Setting $b = -\theta$, the formula simplifies further:
$$y = \begin{cases} 0 & 0.3x_1 + 0.7x_2 + b \le 0 \\ 1 & 0.3x_1 + 0.7x_2 + b > 0 \end{cases}$$
The schematic diagram of the perceptron now looks like this:

[Figure: perceptron with inputs $x_1$, $x_2$, a constant input weighted by the bias $b$, and output $y$]

Recall that when we introduced the baby perceptron, we defined the weighted sum $a$:

$$a = 0.3x_1 + 0.7x_2$$
We now reuse this concept to make the structure of the perceptron clearer. Redefine $a$ to include the bias:

$$a = 0.3x_1 + 0.7x_2 + b$$

The logical judgment can then be written compactly as:

$$h(a) = y = \begin{cases} 0 & a \le 0 \\ 1 & a > 0 \end{cases}$$

Here we write the function as $h$ rather than $f$; only the symbol changes, not the meaning.

Drawing a simplified schematic diagram of a perceptron will make the concept clearer:

[Figure: simplified perceptron: the inputs are summed into $a$, then passed through $h(a)$]

Again we have built a complete, working perceptron: the summation node and $h(a)$ in the figure above together form the perceptron, which has three input signals and one output signal. I named it the "Youth Perceptron".

[Figure: the complete youth perceptron]
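
As a sketch, the same decision with the threshold folded into a bias $b = -\theta$ looks like this in Python (here $b = -0.8$, matching $\theta = 0.8$ above; the function names are illustrative):

```python
def step(a):
    """Step activation: 1 when a > 0, otherwise 0."""
    return 1 if a > 0 else 0

def youth_perceptron(x1, x2, b=-0.8):
    """Same decision as the baby perceptron, with the threshold
    moved into the weighted sum as a bias term b = -theta."""
    a = 0.3 * x1 + 0.7 * x2 + b  # weighted sum plus bias
    return step(a)

# Agrees with the baby perceptron on all four input combinations
for x1 in (0, 1):
    for x2 in (0, 1):
        assert youth_perceptron(x1, x2) == (1 if 0.3*x1 + 0.7*x2 > 0.8 else 0)
```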

Adult version

Let's start abstracting the concept.

In the field of deep learning there are standard terms for the "trust degree", the input signals, the output signal, and so on; let us introduce them now. First, write the trust degrees 0.3 and 0.7 discussed earlier as $w_1$ and $w_2$.

[Figure: perceptron with weights $w_1$, $w_2$ and bias $b$]

To summarize the discussion so far:

$$h(a) = \begin{cases} 0 & a \le 0 \\ 1 & a > 0 \end{cases}$$

$$a = w_1 x_1 + w_2 x_2 + b$$

| Symbol | Name | Role |
| --- | --- | --- |
| $w$ | weight | the larger a weight, the more its signal influences the judgment |
| $b$ | bias | shifts the weighted sum, adjusting how easily the output becomes 1 |
| $h(a)$ | activation function | several choices exist; the example here uses the step function |

At this point the complete perceptron has been presented, and the concepts and standard terminology have been clarified.

In general, we express the perceptron as follows:

[Figure: the general perceptron with weights $w_1$, $w_2$, bias $b$, and activation $h$]

Activation function

There are many choices for the activation function. The example above used the step function, whose expression is:
$$h(a) = \begin{cases} 0 & a \le 0 \\ 1 & a > 0 \end{cases}$$
[Figure: graph of the step function]

There are also other options, such as sigmoid and ReLU. Taking sigmoid as an example, its expression is:

$$h(x) = \frac{1}{1 + \exp(-x)}$$
[Figure: graph of the sigmoid function]

If you are just getting started with deep learning, you can ignore for now why different activation functions are needed, what the sigmoid function does, why it looks so strange, and what advantages it has over the step function; you will come to appreciate the role of the activation function in later study. For now, just know that there are many options to choose from.
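
As a quick sketch (assuming NumPy is installed), both activation functions mentioned above can be written in a few lines:

```python
import numpy as np

def step(a):
    """Step function: 1 where a > 0, otherwise 0 (works on arrays)."""
    return (a > 0).astype(np.int64)

def sigmoid(a):
    """Sigmoid: squashes any real number into the interval (0, 1)."""
    return 1 / (1 + np.exp(-a))

a = np.array([-1.0, 0.0, 1.0])
print(step(a))     # [0 0 1]
print(sigmoid(a))  # approximately [0.269 0.5 0.731]
```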

Perceptron Applications

AND gate

The truth table is as follows:

| $x_1$ | $x_2$ | $y$ |
| --- | --- | --- |
| 0 | 0 | 0 |
| 1 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 1 | 1 |

Referring to the perceptron diagram, we now pick concrete parameter values.

[Figure: perceptron with weights $w_1$, $w_2$ and bias $b$]

The weights and bias $w_1, w_2, b$ can be chosen in many ways, for example $(0.5, 0.5, -0.7)$:

| $x_1$ | $x_2$ | $a$ | $y$ |
| --- | --- | --- | --- |
| 0 | 0 | -0.7 | 0 |
| 1 | 0 | -0.2 | 0 |
| 0 | 1 | -0.2 | 0 |
| 1 | 1 | 0.3 | 1 |

[Figure: AND-gate perceptron with $w_1 = 0.5$, $w_2 = 0.5$, $b = -0.7$]

Another valid choice is $(1, 1, -1.3)$:

| $x_1$ | $x_2$ | $a$ | $y$ |
| --- | --- | --- | --- |
| 0 | 0 | -1.3 | 0 |
| 1 | 0 | -0.3 | 0 |
| 0 | 1 | -0.3 | 0 |
| 1 | 1 | 0.7 | 1 |
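
A minimal Python sketch verifying the first parameter choice against the truth table (the function name is mine; the values follow the example above):

```python
def AND(x1, x2, w1=0.5, w2=0.5, b=-0.7):
    """AND gate as a perceptron: step function over w1*x1 + w2*x2 + b."""
    a = w1 * x1 + w2 * x2 + b
    return 1 if a > 0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, AND(x1, x2))  # matches the AND truth table
```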

OR gate

The truth table is as follows:

| $x_1$ | $x_2$ | $y$ |
| --- | --- | --- |
| 0 | 0 | 0 |
| 1 | 0 | 1 |
| 0 | 1 | 1 |
| 1 | 1 | 1 |

The weights and bias $w_1, w_2, b$ can again be chosen in many ways, for example $(0.5, 0.5, -0.2)$:

| $x_1$ | $x_2$ | $a$ | $y$ |
| --- | --- | --- | --- |
| 0 | 0 | -0.2 | 0 |
| 1 | 0 | 0.3 | 1 |
| 0 | 1 | 0.3 | 1 |
| 1 | 1 | 0.8 | 1 |
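
The same sketch implements the OR gate; only the bias changes:

```python
def OR(x1, x2, w1=0.5, w2=0.5, b=-0.2):
    """OR gate: same structure as the AND gate, different bias."""
    a = w1 * x1 + w2 * x2 + b
    return 1 if a > 0 else 0

assert [OR(0, 0), OR(1, 0), OR(0, 1), OR(1, 1)] == [0, 1, 1, 1]
```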

Perceptron and Deep Learning

Perceptrons and Neural Networks

A single perceptron can only implement simple functions, but when tens of thousands or even hundreds of millions of perceptrons cooperate, the problems they can solve are no longer simple at all. We call such a whole built out of perceptrons a "neural network".

What is the relationship between perceptron and deep learning?

Taking the "AND gate" of the application part of the perceptron as an example, we manually determined the weight and bias for it according to the truth table of the AND gate, so that the perceptron realized the function of the AND gate. However, a network composed of tens of thousands of perceptrons will have countless weights and biases. At this time, it is no longer realistic to manually confirm their values. The deep learning algorithm can solve this problem. The deep learning will learn a possible weight and bias according to the truth table to realize the function of the AND gate, and no longer need people to confirm the parameters.
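
As a hedged illustration of that idea, the sketch below uses the classic perceptron learning rule (a simple stand-in for the full machinery of deep learning, with a learning rate and epoch count chosen arbitrarily) to find AND-gate parameters from the truth table instead of picking them by hand:

```python
# Learn AND-gate weights and bias from the truth table with the
# classic perceptron learning rule.
data = [((0, 0), 0), ((1, 0), 0), ((0, 1), 0), ((1, 1), 1)]
w1, w2, b = 0.0, 0.0, 0.0  # start from arbitrary parameters
lr = 0.1                   # learning rate

for _ in range(20):  # a few passes over the four rows suffice here
    for (x1, x2), target in data:
        y = 1 if w1 * x1 + w2 * x2 + b > 0 else 0
        err = target - y      # -1, 0, or +1
        w1 += lr * err * x1   # nudge each parameter toward the target
        w2 += lr * err * x2
        b += lr * err

print(w1, w2, b)  # one workable set of learned parameters
```

After training, the learned parameters reproduce the AND truth table, which is exactly the "learn instead of hand-tune" idea described above.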
