Machine Learning and Neural Networks (2)


Introduction to Neural Network Architecture

Architecture refers to the way neurons are connected to one another.
The most common types of architecture are: 1. feedforward neural networks; 2. recurrent neural networks; 3. symmetrically connected networks.
Feedforward neural network: information flows in one direction, starting at the input layer and passing through the hidden layer(s) to the output layer (this is the most common architecture in practice).
If there is more than one layer of hidden units, the network is called a deep neural network.
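To make this concrete, here is a minimal sketch of a feedforward pass in NumPy. It is my own example, not from the original post; the layer sizes and the ReLU activation are arbitrary assumptions.

```python
# Minimal sketch (assumed example): one forward pass through a feedforward
# network. Information moves in one direction only, layer by layer.
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def feedforward(x, weights, biases):
    """Propagate the input through each layer in turn."""
    a = x
    for W, b in zip(weights, biases):
        a = relu(W @ a + b)   # each layer's activations feed the next layer
    return a

rng = np.random.default_rng(0)
layer_sizes = [4, 8, 8, 3]    # input -> hidden -> hidden -> output (two hidden layers = "deep")
weights = [rng.normal(size=(m, n)) for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]
biases  = [np.zeros(m) for m in layer_sizes[1:]]

print(feedforward(rng.normal(size=4), weights, biases))
```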

Recursive neural network: information can circulate around the network, which lets it memorize information for a long time and exhibit all sorts of interesting oscillations (though this can make training more difficult).
Recursive neural networks can be divided into structural recursive networks and temporal recursive networks. In the narrow sense, "recursive neural network" usually refers to the structural variant, while the temporal variant is called a recurrent neural network.
Recurrent neural network: it is more powerful than a feedforward neural network and can keep information in its hidden state for a long time. However, its hidden state follows very complicated dynamics, which makes it hard to train.
The connection graph of a recurrent neural network contains directed cycles: if you start at a node (neuron) and follow the arrows, you can sometimes get back to the node where you started.
Recurrent networks are a very natural way to model sequential (time-series) data. The connections between hidden units make them behave like a very deep network unrolled in time: at each time step, the state of the hidden units determines the state of the hidden units at the next time step.
One aspect that distinguishes them from feedforward neural networks is that recurrent networks use the same weights at every time step, as sketched below.
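A minimal sketch (my own example, with tanh hidden units assumed) of a recurrent forward pass. The key point is that the same weight matrices are reused at every time step, and the hidden state at time t feeds into the state at time t+1.

```python
# Sketch (assumed example): a recurrent forward pass. W_xh and W_hh are the
# SAME at every time step, unlike a feedforward network.
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """xs: sequence of input vectors; returns the hidden state after each step."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x in xs:                                   # the hidden state at time t...
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)     # ...determines the state at time t+1
        states.append(h)
    return states

rng = np.random.default_rng(1)
W_xh = rng.normal(scale=0.5, size=(5, 3))
W_hh = rng.normal(scale=0.5, size=(5, 5))
b_h  = np.zeros(5)
sequence = [rng.normal(size=3) for _ in range(4)]
print(rnn_forward(sequence, W_xh, W_hh, b_h)[-1])
```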

Symmetrically connected network: in this network, the weights between two units are the same in both directions. Such networks obey an energy function, which makes them easier to analyze and makes it easier to add constraints for specific tasks.
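As an illustration of what "obeying an energy function" means, here is a sketch of the energy of a symmetric network of binary units. This assumes a Hopfield-style model, which the post itself does not name, so treat it as an illustrative assumption rather than the author's definition.

```python
# Sketch (assumed Hopfield-style model): energy of a symmetrically connected
# network of +1/-1 units. Because w_ij == w_ji, every global state has a scalar
# energy, and asynchronous single-unit updates never raise it.
import numpy as np

def energy(s, W, b):
    """s: unit states (+1/-1); W: symmetric weight matrix with zero diagonal."""
    return -0.5 * s @ W @ s - b @ s

rng = np.random.default_rng(2)
A = rng.normal(size=(6, 6))
W = (A + A.T) / 2            # enforce symmetric weights
np.fill_diagonal(W, 0.0)     # no self-connections
b = np.zeros(6)
s = rng.choice([-1.0, 1.0], size=6)
print(energy(s, W, b))
```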

Perceptron

The perceptron is a classic structure in artificial neural networks. Its main features are a simple structure and, for the problems it can solve, a convergence algorithm with a rigorous mathematical proof; for these reasons it gave an important impetus to research on neural networks.
The decision unit in a perceptron is a binary threshold neuron.
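A minimal sketch (my own example) of a binary threshold decision unit and the perceptron learning procedure, trained here on the AND function with the threshold absorbed as a bias weight:

```python
# Sketch (assumed example): binary threshold neuron + perceptron learning rule.
import numpy as np

def binary_threshold(w, x):
    """Output 1 if the weighted sum reaches the (absorbed) threshold, else 0."""
    return 1 if np.dot(w, x) >= 0 else 0

def train_perceptron(cases, epochs=100):
    """cases: list of (input vector with a leading bias component, target 0/1)."""
    w = np.zeros(len(cases[0][0]))
    for _ in range(epochs):
        for x, t in cases:
            y = binary_threshold(w, x)
            if y == 0 and t == 1:
                w = w + x            # output too low: add the input vector
            elif y == 1 and t == 0:
                w = w - x            # output too high: subtract the input vector
    return w

# AND function; the first component of each input is the constant bias input 1.
cases = [(np.array([1.0, 0, 0]), 0), (np.array([1.0, 0, 1]), 0),
         (np.array([1.0, 1, 0]), 0), (np.array([1.0, 1, 1]), 1)]
w = train_perceptron(cases)
print(w, [binary_threshold(w, x) for x, _ in cases])
```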

A Geometric View of Perceptrons in Weight Space

Each training case can be represented as a hyperplane in weight space, and learning consists of finding a weight vector that lies on the correct side of every training case's hyperplane. Points in weight space correspond to weight vectors; training cases correspond to hyperplanes through the origin (assuming the threshold has been eliminated); and each training case constrains the feasible weights to one side of its hyperplane.
Weight space picture:
In this 2D picture, the training case defines a plane (shown as a black line) that passes through the origin and is perpendicular to the input vector (blue). The correct answer for this case is 1, so the weight vector needs to be on the side of the hyperplane that the input vector points toward. A weight vector like the green one makes an angle of less than 90° with the input vector, so the inner product of the input vector and the weight vector is positive; since the threshold has already been absorbed, the perceptron outputs 1 and classifies the case as positive. A weight vector on the wrong side of the plane, like the red one, makes an angle of more than 90° with the input vector, so the inner product is negative and the perceptron outputs 0, which is wrong for this case.
All weight vectors on one side of the plane give the correct answer; all weight vectors on the other side give the wrong answer.
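A quick numerical illustration (the numbers are my own, not taken from the figures): the sign of the inner product between the weight vector and the input vector tells us which side of the case's hyperplane the weights are on, and therefore whether the perceptron answers correctly.

```python
# Illustration (assumed numbers): for a case with input x and target 1, a weight
# vector at less than 90° to x gives a positive dot product (output 1, correct);
# one at more than 90° gives a negative dot product (output 0, wrong).
import numpy as np

x = np.array([2.0, 1.0])          # input vector (threshold already absorbed)
w_good = np.array([1.0, 0.5])     # on the same side of the plane as x
w_bad  = np.array([-1.0, 0.2])    # on the wrong side of the plane

for name, w in [("good", w_good), ("bad", w_bad)]:
    score = np.dot(w, x)
    print(name, score, "-> output", int(score >= 0))
```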
In this second case, the correct answer is 0. Again the training case defines a plane (the black line) through the origin and perpendicular to the input vector. Any weight vector that makes an angle of less than 90° with the input vector gives a positive inner product, so the perceptron outputs 1 and gets the answer wrong. Conversely, any weight vector on the other side of the plane makes an angle of more than 90° with the input vector, gives a negative inner product, and the perceptron correctly outputs 0. As before, all weight vectors on one side of the plane are wrong and all weight vectors on the other side are right.
Putting the two cases together: any weight vector that gets both cases right must lie inside a cone of feasible solutions. There may not be any such weight vector at all; but if a weight vector exists that works for all training cases, it lies inside this cone. Furthermore, if two weight vectors both lie inside the cone, their average also lies inside the cone: the set of feasible weight vectors is convex.
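A small check of the convexity point, with made-up numbers: because the inner product is linear in the weights, averaging two weight vectors that each get every case right produces a weight vector that also gets every case right.

```python
# Check (assumed numbers): the feasible region is convex — the average of two
# feasible weight vectors is also feasible.
import numpy as np

cases = [(np.array([1.0, 2.0]), 1), (np.array([-1.0, 0.5]), 0)]
w1 = np.array([2.0, 1.0])
w2 = np.array([1.0, 1.0])

def correct_on_all(w):
    return all((np.dot(w, x) >= 0) == bool(t) for x, t in cases)

print(correct_on_all(w1), correct_on_all(w2), correct_on_all((w1 + w2) / 2))
```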

Why the Perceptron Learning Procedure Works

The claim we would like to make is that every time the perceptron makes a mistake, the current weight vector moves closer to every feasible weight vector.
Here we define a "generously feasible" weight vector: one that not only gets every training case right, but gets it right by at least a certain margin, where the margin is as large as the length of the input vector for that case. Inside the cone of feasible solutions there is therefore a smaller cone of generously feasible solutions. With this definition the claim holds: every time the perceptron makes a mistake, the squared distance from the current weight vector to every generously feasible weight vector decreases by at least the squared length of the update vector, which is the current input vector. In other words, each mistake moves the current weights closer to every generously feasible weight vector, shrinking the squared distance by at least the squared length of the current input vector.
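The argument can be checked numerically. Below is a sketch with assumed values: after a mistake-driven update (adding the input vector for a missed positive case), the squared distance to a generously feasible weight vector shrinks by at least the squared length of that input vector.

```python
# Numerical check (assumed values) of the convergence argument.
import numpy as np

x = np.array([1.0, 2.0])        # input vector, target = 1
w = np.array([-1.0, 0.0])       # current weights: w @ x < 0, so the perceptron errs
w_gen = np.array([3.0, 4.0])    # generously feasible: w_gen @ x >= ||x||**2

assert np.dot(w, x) < 0                    # this really is a mistake
assert np.dot(w_gen, x) >= np.dot(x, x)    # margin of at least the length of x

before = np.sum((w_gen - w) ** 2)
w_new = w + x                              # update for a missed positive case
after = np.sum((w_gen - w_new) ** 2)

print(before - after, ">=", np.dot(x, x))  # decrease is at least ||x||**2
```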

Limitations of the Perceptron

The limitations of the perceptron come from the features it is given. If the wrong hand-chosen features are used, there are severe limits on what a perceptron can learn, because the learning procedure does not learn the features themselves.
Learning the right features is the hard part of the learning problem.
We need the perceptron to output 1 for the green cases and 0 for the red cases, but no single weight vector can put all the green cases on its positive side and all the red cases on its negative side.
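A standard example of such an unlearnable case (assumed here as an illustration; the post's figure is not reproduced) is a binary threshold unit trying to report whether its two binary inputs are equal. No weight vector gets all four cases right, so the learning procedure never reaches zero errors no matter how long it runs.

```python
# Sketch (standard assumed example): a binary threshold unit cannot learn to
# detect whether its two input bits are equal — the cases are not linearly
# separable in this feature space.
import numpy as np

def output(w, x):
    return 1 if np.dot(w, x) >= 0 else 0

# Leading bias input of 1, then the two bits; target is 1 iff the bits are equal.
cases = [(np.array([1.0, 0, 0]), 1), (np.array([1.0, 1, 1]), 1),
         (np.array([1.0, 0, 1]), 0), (np.array([1.0, 1, 0]), 0)]

w = np.zeros(3)
for epoch in range(1000):
    errors = 0
    for x, t in cases:
        y = output(w, x)
        if y != t:
            errors += 1
            w = w + (t - y) * x      # standard perceptron update
    if errors == 0:
        break

print("errors in final epoch:", errors)  # stays above 0 for this task
```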

Original post: blog.csdn.net/weixin_51961968/article/details/113383258