Geometric Principles of Artificial Neural Networks
This article considers only ordinary (fully connected) neural networks, not CNNs and not RNNs, and discusses only single-hidden-layer classification: the simplest classic scenario for ReLU neurons.
Basic conventions

To simplify the discussion and the figures, ReLU is used as the activation function throughout, and the input is a two-dimensional vector X.
Example 1
The figure below shows a simple neural network: the input layer has two nodes, the output layer has two nodes, and the hidden layer has three nodes. The network can be used to solve a binary classification problem: for a two-dimensional input vector, it outputs the probability of each of the two classes.
- Input layer - 2-dimensional vector X
- Hidden layer (first layer) - ReLU layer (3 neurons)
- Output layer (second layer) - Softmax layer (2 neurons, binary classification)
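The 2-3-2 network above can be sketched in a few lines of NumPy. The weight values here are made up for illustration; the article does not give the trained parameters.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

# Hypothetical parameters: hidden layer (3 neurons, 2 inputs),
# output layer (2 neurons, 3 hidden inputs).
W1 = np.array([[1.0, 0.5], [-0.5, 1.0], [0.3, -0.8]])
b1 = np.array([0.1, -0.2, 0.0])
W2 = np.array([[1.0, -1.0, 0.5], [-1.0, 1.0, -0.5]])
b2 = np.array([0.0, 0.0])

def forward(x):
    h = relu(W1 @ x + b1)        # hidden ReLU layer
    return softmax(W2 @ h + b2)  # probabilities of the two classes

p = forward(np.array([0.5, -0.3]))  # a 2-vector that sums to 1
```

Whatever the weights, the output is always a valid probability pair for the two classes.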
The following figure shows the assumed sample distribution. Each sample has two features (abscissa X0, ordinate X1) and belongs to one of two classes (colors), red or green. The true dividing line between the classes is a circle.
The figure below shows the optimal result after the network has learned (the learning process is omitted): the neural network classifies samples inside the gray region as red and samples outside it as green, with a classification accuracy of 95%. (Click TensorPlayground to experience the learning process.)
- The optimal boundary that the three ReLU neurons produce for this distribution is a hexagon
Why can such a simple neural network achieve this effect (the hexagonal boundary)? The rest of this article elaborates the internal principle from a geometric point of view, so that the artificial neural network is no longer a black box.
A single ReLU neuron

- X is the neuron's input, W is the neuron's weight parameter, b is the neuron's bias parameter
- W and X are vectors; the neuron computes Z = ReLU(W·X + b) = max(0, W·X + b)

Here, let W and X be 2-dimensional vectors, with W = [1.5, 3.5] and b = -2.5; the resulting surface looks as follows:
- (Note that in the figure the scale of the Z axis differs from that of the X0 and X1 axes)
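The single neuron from the text, with W = [1.5, 3.5] and b = -2.5, can be evaluated directly:

```python
import numpy as np

# The single ReLU neuron from the text: Z = ReLU(W·X + b).
W = np.array([1.5, 3.5])
b = -2.5

def neuron(x):
    return max(0.0, float(W @ x) + b)

neuron(np.array([1.0, 1.0]))  # W·X + b = 2.5, above the fold: Z = 2.5
neuron(np.array([1.0, 0.0]))  # W·X + b = -1.0, folded flat: Z = 0
```

Points with W·X + b ≤ 0 all map to Z = 0: that flat half-plane is the folded part of the surface.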
A single ReLU neuron takes an n-dimensional input X, generates a hyperplane in the (n+1)-dimensional space (call the newly added dimension Z), and then folds that hyperplane along Z = 0. The result is called a folded hyperplane.
Fold angle of the folded hyperplane
- Determined by the parameter W (in the high-dimensional space)
- Always an obtuse angle

Fold line of the folded hyperplane (its position on the Z = 0 hyperplane)
- Determined by the parameters W and b (in the high-dimensional space)
Constant × folded hyperplane

C · Z

Stretching, shrinking, or inverting along the Z axis changes the fold angle of the folded surface, but does not change the position of the fold line.
- C > 1 ➡️ stretch: the fold angle becomes smaller (steeper)
- 0 < C < 1 ➡️ shrink: the fold angle becomes larger (flatter)
- C < 0 ➡️ invert: the folded surface is flipped
(Figure: the folded surface scaled by C = 2, 0.6, -1, -2, -0.6)
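A small sketch, reusing the neuron from earlier (W = [1.5, 3.5], b = -2.5), shows both claims: the output on the raised half scales with C, while every point on the flat side of the fold stays at 0 for any C, so the fold line never moves.

```python
import numpy as np

# Scaling a folded surface by a constant C changes the fold angle
# (the raised half's slope scales by C) but not the fold line,
# which stays where W·X + b = 0.
W = np.array([1.5, 3.5])
b = -2.5

def folded(x, C=1.0):
    return C * max(0.0, float(W @ x) + b)

x_up = np.array([1.0, 1.0])    # W·X + b = 2.5, on the raised half
[folded(x_up, C) for C in (2.0, -1.0)]   # → [5.0, -2.5]: scales with C

x_flat = np.array([1.0, 0.0])  # W·X + b = -1.0, on the flat half
[folded(x_flat, C) for C in (2.0, 0.6, -1.0)]  # all 0: fold line unchanged
```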
Folded hyperplane + folded hyperplane

Z0 + Z1
Adding a second folded surface folds the first surface again along the second surface's fold line, and where both surfaces are folded the original fold angle becomes smaller (steeper).
The addition does not change where the fold lines project onto the Z = 0 hyperplane, but the folded portions lift parts of each fold line out of the Z = 0 hyperplane.
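A one-dimensional cross-section makes this concrete (an illustration of the same idea, not the article's figure): summing two folded surfaces whose fold lines sit at x = 0 and x = 1 gives a piecewise-linear result with a kink at each original fold position.

```python
# Sum of two folded surfaces, sliced along one dimension.
# Fold positions: x = 0 for the first surface, x = 1 for the second.
def relu(t):
    return max(0.0, t)

def z_sum(x):
    return relu(x) + relu(x - 1.0)

# Flat before both folds, slope 1 between them, slope 2 after both:
[z_sum(x) for x in (-1.0, 0.0, 0.5, 1.0, 2.0)]  # → [0.0, 0.0, 0.5, 1.0, 3.0]
```

Past x = 1 both terms are active, so the slope doubles: the angle of the combined fold is steeper than either surface alone.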
The first ReLU layer

A linear sum of multiple folded hyperplanes (as seen by the next layer)
- Hn is the output of the n-th neuron of the first layer after the ReLU operation
n neurons ➡️ n fold lines generated on the Z = 0 hyperplane, and n folds in the high-dimensional space
- The position of each fold line is determined solely by the parameters of a single first-layer neuron, independent of the later layers' parameters
- The W parameters of the next layer determine the relative fold angle of each folded surface
- The b parameter of the next layer determines the position of the whole composite folded surface along the Z axis (vertical translation)
Straight lines dividing the plane

n straight lines divide the plane into at most (n² + n + 2) / 2 parts

Hyperplanes dividing d-dimensional space

n hyperplanes divide d-dimensional space into at most f(d, n) parts
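The standard closed form for f(d, n) is the sum of binomial coefficients C(n, 0) + C(n, 1) + … + C(n, d) (a known result for hyperplane arrangements; the article does not spell it out):

```python
from math import comb

# Maximum number of regions that n hyperplanes cut d-dimensional space into.
def f(d, n):
    return sum(comb(n, i) for i in range(d + 1))

f(2, 3)  # → 7: three lines divide the plane into at most 7 parts
f(1, 4)  # → 5: four points divide a line into 5 segments
```

The case f(2, 3) = 7 is exactly the bound used later when parsing Example 1.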
Softmax under binary classification

Softmax(X) converts the value at each index position of the vector X into the probability (0-1) of the corresponding event.

For a binary classification Softmax, the network's last layer actually computes two linear combinations of the previous layer's results, and the class with the larger value is the prediction.

Here we make a transformation: instead of directly comparing the two values, compare R1 - R0 against 0. The final Softmax layer then reduces to a single linear combination compared with 0.
- Z < 0 ➡️ the prediction is class 0
- Z > 0 ➡️ the prediction is class 1
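A quick numeric check of this reduction, with made-up output values R0 and R1: the Softmax probability of class 1 is exactly the sigmoid of Z = R1 - R0, so the sign of Z alone decides the prediction.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

R0, R1 = 0.8, 2.3            # hypothetical last-layer outputs
p = softmax(np.array([R0, R1]))
Z = R1 - R0

# P(class 1) = 1 / (1 + exp(-Z)): Softmax over two values is a sigmoid of
# their difference, so comparing p[1] with p[0] == comparing Z with 0.
(p[1] > p[0]) == (Z > 0)  # → True
```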
Softmax under multiclass classification

For a multiclass Softmax, the network's last layer computes multiple linear combinations of the previous layer's results, and the class with the maximum value is the prediction.

Again we make a transformation: instead of directly comparing the values, compare Ra - Rb against 0. This is equivalent to the final Softmax layer performing several pairwise judgments, each a single linear combination compared with 0, to decide between classes a and b; repeating the comparison finds the maximum-likelihood class.

- So the binary view still applies: from the geometric perspective, each linear combination Z = 0 is a projection hyperplane
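The pairwise reduction can be sketched directly (R values made up): the class a with Ra - Rb > 0 against every other class b is exactly the argmax.

```python
import numpy as np

# Multiclass prediction as repeated binary comparisons: class a wins
# against class b iff Ra - Rb > 0; the class winning all comparisons
# is the argmax of the last layer's outputs.
R = np.array([0.4, 2.1, -0.7])  # hypothetical last-layer outputs
pred = int(np.argmax(R))

wins = [a for a in range(len(R))
        if all(R[a] - R[b] > 0 for b in range(len(R)) if b != a)]
wins == [pred]  # → True: pairwise comparisons recover the argmax
```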
A binary classification network with 1 ReLU layer + 1 Softmax layer

- The input X lies in n-dimensional space
- The ReLU layer generates m fold lines on the Z = 0 hyperplane of the (n+1)-dimensional space and performs m folds (m is the number of neurons in the ReLU layer)
- The folded surfaces are then combined by linear addition (changing the angle of each fold and the position of the whole along the Z axis), and the result is compared with the Z = 0 hyperplane
- The projection of this pattern onto the Z = 0 hyperplane of the (n+1)-dimensional space is the binary classification boundary in the original n-dimensional space
It follows that for any finite sample set obeying some distribution law, as long as there are enough ReLU neurons, folding in the high-dimensional space can generate a pattern that matches that distribution.
Parsing Example 1

From the high-dimensional perspective:

- The 3 neurons produce 3 fold lines in the high-dimensional space
- Three straight lines divide the plane into at most 7 parts; here 6 parts are produced (the middle part is so small it can be ignored)
- The boundary pattern projected onto the Z = 0 hyperplane is exactly the hexagon
(Figure: fold lines of the first ReLU layer on the Z = 0 hyperplane)
(Figure: Z in the high-dimensional space)