[Neural Network] Neural Network Foundation (2)

[Continued from the previous article: https://blog.csdn.net/Aibiabcheng/article/details/106911064 ]

5. Sometimes one classifier is not enough to solve the problem

The previous examples could all be handled by a single linear classifier. Here, we will use a simple, clear example to illustrate the limitations of linear classifiers.

[Logical operation: exclusive OR (XOR)]

Boolean logic functions usually take two inputs and produce one output.

Computers usually use the number 1 to represent true and the number 0 to represent false.

Input A    Input B    XOR
0          0          0
0          1          1
1          0          1
1          1          0

Now look at the graph of this function, where the output at each grid node has been colored.

It seems that we cannot separate the red region from the blue region with a single straight line.

In fact, for the Boolean XOR function it is impossible to separate the red and blue regions with a single straight line. In other words, if the training data is governed by the XOR function, a simple linear classifier cannot learn it.
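
To make this concrete, here is a minimal sketch (my own illustration, not code from the referenced book): it brute-forces a single linear threshold unit y = step(w1*x1 + w2*x2 + b) over a coarse grid of weights and biases and confirms that no setting reproduces the XOR truth table.

```python
# Minimal sketch (illustrative, not from the referenced book): search a coarse
# grid of weights and biases for a single linear threshold unit and check
# whether any setting reproduces the XOR truth table.
import itertools

xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def step(z):
    return 1 if z > 0 else 0

grid = [i / 10 for i in range(-20, 21)]  # candidate values from -2.0 to 2.0

found = any(
    all(step(w1 * a + w2 * b + bias) == out for (a, b), out in xor_table.items())
    for w1, w2, bias in itertools.product(grid, repeat=3)
)
print("A single linear unit can represent XOR:", found)  # prints False
```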

We have shown a major limitation of simple linear classifiers, so we need a way around it. Fortunately, the fix is easy. The figure below uses two straight lines to divide the plane into different regions, which suggests the solution:

We can use multiple classifiers working together. This is the core idea of neural networks.

As you can imagine, multiple straight lines can carve out regions of unusual shape and classify each region.
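
Here is a small sketch of that idea (again my own illustration, with hand-picked weights rather than learned ones): three linear threshold units cooperate to compute XOR, one acting as OR, one as NAND, and a third combining their outputs with AND.

```python
# Minimal sketch (illustrative, hand-picked weights): XOR built from three
# linear threshold units working together: XOR(a, b) = AND(OR(a, b), NAND(a, b)).
def step(z):
    return 1 if z > 0 else 0

def linear_unit(x1, x2, w1, w2, bias):
    return step(w1 * x1 + w2 * x2 + bias)

def xor(a, b):
    or_out = linear_unit(a, b, 1.0, 1.0, -0.5)      # fires when a or b is 1
    nand_out = linear_unit(a, b, -1.0, -1.0, 1.5)   # fires unless both are 1
    return linear_unit(or_out, nand_out, 1.0, 1.0, -1.5)  # AND of the two

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor(a, b))  # reproduces the XOR truth table
```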

Let's briefly summarize.

  • If the data itself is not dominated by a single linear process, then a simple linear classifier cannot divide the data. For example, the data governed by the logical XOR operator illustrates this point.
  • But the solution is easy: use multiple linear classifiers working together to divide data that cannot be separated by a single straight line.

 

6. Neurons - Nature's Computing Machines

Although modern computers have vast numbers of electronic computing elements and huge storage space, and operate far faster than fleshy, soft biological brains, even a brain as small as a pigeon's is far more capable. This is what puzzles scientists about animal brains.

Traditional computers process data quite precisely and concretely, in a strict serial sequence. For these cold, hard machines there is no ambiguity or uncertainty. Animal brains, on the other hand, appear to run at a much slower rhythm, yet they seem to process signals in parallel, and fuzziness is a characteristic of their computation.

Let us look at the basic unit of the biological brain: the neuron.

Although neurons come in many forms, all of them transmit an electrical signal from one end to the other: from the dendrites, along the axon, to the terminals. These signals are then passed from one neuron to the next.

Biological brains are much slower and contain comparatively fewer computing elements than modern computers. So why are biological brains so capable? The full workings of the brain (consciousness, for example) remain a mystery, but we know enough about how neurons compute in a different way, that is, how they solve problems differently, to put that knowledge to use.

So let's take a look at how a neuron works. It takes an electrical input and produces another electrical signal as output. This looks exactly like the classifying or predicting machines we observed earlier, which also take an input, do some processing, and produce an output.

So, can we represent neurons as linear functions, as before? Although this is a nice idea, it cannot be done. Biological neurons are not simple linear functions: they do not simply respond to an input by producing a proportional output. In other words, their output cannot take the form output = (constant * input) + (perhaps another constant).

Observations show that a neuron does not respond immediately; instead, it suppresses the input until the input grows strong enough to trigger an output. You can think of it this way: the input must reach a threshold before an output is produced. It's like water in a cup: it won't overflow until the cup is full.

A function that takes an input signal and produces an output signal, but that also takes a threshold into account, is called an activation function. Mathematically, many functions can achieve this effect; even a simple step function will do.

You can see that the output is zero when the input value is small. However, once the input reaches the threshold, the output jumps up.
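
As a minimal sketch in Python (the threshold value of 1.0 is arbitrary and chosen only for illustration):

```python
# Minimal sketch of a step activation with a threshold (threshold chosen arbitrarily).
def step_activation(x, threshold=1.0):
    return 1.0 if x >= threshold else 0.0

for x in (0.0, 0.5, 0.99, 1.0, 2.0):
    print(x, "->", step_activation(x))  # 0 below the threshold, 1 once it is reached
```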

We can improve on the step function. The S-shaped curve shown in the figure below is called the sigmoid function. It is smoother than the cold, hard step function, which makes it more natural and closer to reality.

The S function, sometimes called the logistic function: $y = \frac{1}{1 + e^{-x}}$
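
A minimal sketch of this function using Python's standard math module:

```python
# Minimal sketch of the sigmoid (logistic) function y = 1 / (1 + e^(-x)).
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

for x in (-4, -1, 0, 1, 4):
    print(x, "->", round(sigmoid(x), 3))  # rises smoothly from near 0 to near 1
```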

Let's go back to neurons and think about how we can model an artificial neuron.

Dendrites collect electrical signals and combine them to form stronger electrical signals. If the signal is strong enough to exceed the threshold, the neuron will emit a signal along the axon and reach the terminal, transmitting the signal to the dendrites of the next neuron. The figure below shows several neurons connected in this way.

One thing to note is that each neuron receives input from multiple neurons before it, and if the neuron is activated, it also provides signals to more neurons at the same time. One way to copy this natural form to an artificial model is to build multiple layers of neurons, each of which is connected to the neurons in the previous and subsequent layers. The figure below details this idea.

You can see three layers of neurons, each of which has three artificial neurons or nodes. You can also see that each node is connected to every other node in the previous or subsequent layers.

But, which part of this seemingly cool architecture can perform the learning function? How should we adjust and respond to training samples? Are there parameters similar to the slope in the previous linear classifier for us to adjust?

The most obvious candidate is the connection strength between nodes. Within a node we could instead adjust how the inputs are summed, or the shape of the sigmoid threshold function. However, adjusting the shape of the threshold function is more complicated than simply adjusting the connection strengths between nodes.

If the relatively simple method works, then stick to this method! The figure below shows the connected nodes again, but this time shows the associated weights on each connection . A smaller weight will weaken the signal, and a larger weight will amplify the signal.

Here, I need to explain the interesting little numbers (i.e. subscripts) next to the weight symbols. Simply put, the weight w_{2,3} is associated with the signal that node 2 in one layer transmits to node 3 in the next layer. Likewise, the weight w_{1,2} attenuates or amplifies the signal passed from node 1 to node 2 in the next layer. To illustrate this idea in detail, the figure below highlights these two connections between the first and second layers.
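
As a sketch of how this notation might look in code (the weight and input values below are made up purely for illustration, and Python lists are zero-indexed, so row 0 stands for node 1): weights[i][j] plays the role of w with subscripts (i+1, j+1), the weight on the link from one layer's node to the next layer's node, and each next-layer node sums its weighted inputs before applying the sigmoid.

```python
# Minimal sketch (illustrative values only): weights[i][j] plays the role of
# w_{i+1, j+1}, the weight on the link from node i+1 in one layer to node j+1
# in the next layer. Each next-layer node sums its weighted inputs, then
# applies the sigmoid activation.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

weights = [
    [0.9, 0.3, 0.4],  # from node 1 of the previous layer
    [0.2, 0.8, 0.2],  # from node 2
    [0.1, 0.5, 0.6],  # from node 3
]
inputs = [0.9, 0.1, 0.8]  # outputs of the three previous-layer nodes

next_layer = [
    sigmoid(sum(inputs[i] * weights[i][j] for i in range(3)))
    for j in range(3)
]
print(next_layer)  # the three signals passed on to the following layer
```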

You may have good reasons to challenge this design, asking why each neuron must be connected to every neuron in the adjacent layers, and you can even come up with various creative ways of connecting neurons. We don't use creative connection schemes, for two reasons.

  • The first is that this consistent, fully connected form is actually relatively easy to encode as computer instructions.
  • The second is that the learning process of the neural network will weaken the connections that are not actually needed (that is, their weights will approach 0), so having a few more connections than the minimum required for a specific task does no harm.

What exactly do we mean by this?

This means that as learning progresses, the neural network improves its output by adjusting and optimizing the link weights within the network. Some weights may become zero or close to zero. A weight of zero or almost zero means that the link contributes nothing to the network, because no signal is transmitted: the signal is multiplied by zero, the result is zero, and the link is effectively broken.

Finally, we need to summarize and review.

  • Although biological brains seem to have much less storage space and run slower than modern computers, biological brains can perform complex tasks such as flying, finding food, learning languages, and evading natural enemies.
  • Compared to traditional computer systems, biological brains are incredibly resilient to damaged and imperfect signals.
  • A biological brain composed of interconnected neurons is the inspiration for artificial neural networks. We have thus established a simple neural network structure.

[Reference]: Python Neural Network Programming, Tariq Rashid, translated by Lin Ci.
