Geometric Principles of Artificial Neural Networks
This article considers only ordinary (fully connected) neural networks, not CNNs and not RNNs, and discusses only single-hidden-layer classification: the simplest classic scenario for ReLU neurons.
Basic conventions

To simplify the discussion and the figures, ReLU is used as the activation function throughout, and the input is a two-dimensional vector X.
Example 1
The figure below shows a simple neural network: the input layer has two nodes, the output layer has two nodes, and the hidden layer has three nodes. The network can be used to solve a binary classification problem: for a two-dimensional input vector, it outputs the probability of each of the two classes.
- Input layer - 2-dimensional vector X
- Hidden layer (first layer) - ReLU layer (3 neurons)
- Output layer (second layer) - Softmax layer (2 neurons, binary classification)
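The 2-3-2 network above can be sketched in a few lines of NumPy. The weight values here are made up for illustration; the article does not give the trained parameters.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

# Hypothetical parameters: hidden layer (3 neurons, 2 inputs),
# output layer (2 neurons, 3 hidden inputs).
W1 = np.array([[1.0, 0.5], [-0.5, 1.0], [0.3, -0.8]])
b1 = np.array([0.1, -0.2, 0.0])
W2 = np.array([[1.0, -1.0, 0.5], [-1.0, 1.0, -0.5]])
b2 = np.array([0.0, 0.0])

def forward(x):
    h = relu(W1 @ x + b1)        # hidden ReLU layer
    return softmax(W2 @ h + b2)  # probabilities of the two classes

p = forward(np.array([0.5, -0.3]))  # a 2-vector that sums to 1
```

Whatever the weights, the output is always a valid probability pair for the two classes.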
The following figure shows the assumed sample distribution. Each sample has two features (abscissa X0, ordinate X1) and belongs to one of two classes (colors), red or green. The true dividing line between the classes is a circle.
The figure below shows the optimal result after the network has learned (the learning process is omitted): the neural network classifies samples inside the gray region as red and samples outside it as green, with a classification accuracy of 95%. (Click TensorPlayground to experience the learning process.)
- The optimal boundary that the three ReLU neurons produce for this distribution is a hexagon
Why can such a simple neural network achieve this effect (the hexagonal boundary)? The rest of this article elaborates the internal principle from a geometric point of view, so that the artificial neural network is no longer a black box.
A single ReLU neuron

- X is the neuron's input, W is the neuron's weight parameter, b is the neuron's bias parameter
- W and X are vectors; the neuron computes Z = ReLU(W·X + b) = max(0, W·X + b)

Here, let W and X be 2-dimensional vectors, with W = [1.5, 3.5] and b = -2.5; the resulting surface looks as follows:
- (Note that in the figure the scale of the Z axis differs from that of the X0 and X1 axes)
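The single neuron from the text, with W = [1.5, 3.5] and b = -2.5, can be evaluated directly:

```python
import numpy as np

# The single ReLU neuron from the text: Z = ReLU(W·X + b).
W = np.array([1.5, 3.5])
b = -2.5

def neuron(x):
    return max(0.0, float(W @ x) + b)

neuron(np.array([1.0, 1.0]))  # W·X + b = 2.5, above the fold: Z = 2.5
neuron(np.array([1.0, 0.0]))  # W·X + b = -1.0, folded flat: Z = 0
```

Points with W·X + b ≤ 0 all map to Z = 0: that flat half-plane is the folded part of the surface.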
A single ReLU neuron takes an n-dimensional input X, generates a hyperplane in the (n+1)-dimensional space (call the newly added dimension Z), and then folds that hyperplane along Z = 0. The result is called a folded hyperplane.
Fold angle of the folded hyperplane
- Determined by the parameter W (in the high-dimensional space)
- Always an obtuse angle

Fold line of the folded hyperplane (its position on the Z = 0 hyperplane)
- Determined by the parameters W and b (in the high-dimensional space)
Constant × folded hyperplane

C · Z

Stretching, shrinking, or inverting along the Z axis changes the fold angle of the folded surface, but does not change the position of the fold line.
- C > 1 ➡️ stretch: the fold angle becomes smaller (steeper)
- 0 < C < 1 ➡️ shrink: the fold angle becomes larger (flatter)
- C < 0 ➡️ invert: the folded surface is flipped
(Figure: the folded surface scaled by C = 2, 0.6, -1, -2, -0.6)
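A small sketch, reusing the neuron from earlier (W = [1.5, 3.5], b = -2.5), shows both claims: the output on the raised half scales with C, while every point on the flat side of the fold stays at 0 for any C, so the fold line never moves.

```python
import numpy as np

# Scaling a folded surface by a constant C changes the fold angle
# (the raised half's slope scales by C) but not the fold line,
# which stays where W·X + b = 0.
W = np.array([1.5, 3.5])
b = -2.5

def folded(x, C=1.0):
    return C * max(0.0, float(W @ x) + b)

x_up = np.array([1.0, 1.0])    # W·X + b = 2.5, on the raised half
[folded(x_up, C) for C in (2.0, -1.0)]   # → [5.0, -2.5]: scales with C

x_flat = np.array([1.0, 0.0])  # W·X + b = -1.0, on the flat half
[folded(x_flat, C) for C in (2.0, 0.6, -1.0)]  # all 0: fold line unchanged
```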
Folded hyperplane + folded hyperplane

Z0 + Z1
Adding a second folded surface folds the first surface again along the second surface's fold line, and where both surfaces are folded the original fold angle becomes smaller (steeper).
The addition does not change where the fold lines project onto the Z = 0 hyperplane, but the folded portions lift parts of each fold line out of the Z = 0 hyperplane.
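A one-dimensional cross-section makes this concrete (an illustration of the same idea, not the article's figure): summing two folded surfaces whose fold lines sit at x = 0 and x = 1 gives a piecewise-linear result with a kink at each original fold position.

```python
# Sum of two folded surfaces, sliced along one dimension.
# Fold positions: x = 0 for the first surface, x = 1 for the second.
def relu(t):
    return max(0.0, t)

def z_sum(x):
    return relu(x) + relu(x - 1.0)

# Flat before both folds, slope 1 between them, slope 2 after both:
[z_sum(x) for x in (-1.0, 0.0, 0.5, 1.0, 2.0)]  # → [0.0, 0.0, 0.5, 1.0, 3.0]
```

Past x = 1 both terms are active, so the slope doubles: the angle of the combined fold is steeper than either surface alone.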
The first ReLU layer

A linear sum of multiple folded hyperplanes (as seen by the next layer)
- Hn is the output of the n-th neuron of the first layer after the ReLU operation
n neurons ➡️ n fold lines generated on the Z = 0 hyperplane, and n folds in the high-dimensional space
- The position of each fold line is determined solely by the parameters of a single first-layer neuron, independent of the later layers' parameters
- The W parameters of the next layer determine the relative fold angle of each folded surface
- The b parameter of the next layer determines the position of the whole composite folded surface along the Z axis (vertical translation)
Straight lines dividing the plane

n straight lines divide the plane into at most (n² + n + 2) / 2 parts

Hyperplanes dividing d-dimensional space

n hyperplanes divide d-dimensional space into at most f(d, n) parts
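The standard closed form for f(d, n) is the sum of binomial coefficients C(n, 0) + C(n, 1) + … + C(n, d) (a known result for hyperplane arrangements; the article does not spell it out):

```python
from math import comb

# Maximum number of regions that n hyperplanes cut d-dimensional space into.
def f(d, n):
    return sum(comb(n, i) for i in range(d + 1))

f(2, 3)  # → 7: three lines divide the plane into at most 7 parts
f(1, 4)  # → 5: four points divide a line into 5 segments
```

The case f(2, 3) = 7 is exactly the bound used later when parsing Example 1.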
Softmax under binary classification

Softmax(X) converts the value at each index position of the vector X into the probability (0-1) of the corresponding event.

For a binary classification Softmax, the network's last layer actually computes two linear combinations of the previous layer's results, and the class with the larger value is the prediction.

Here we make a transformation: instead of directly comparing the two values, compare R1 - R0 against 0. The final Softmax layer then reduces to a single linear combination compared with 0.
- Z < 0 ➡️ the prediction is class 0
- Z > 0 ➡️ the prediction is class 1
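A quick numeric check of this reduction, with made-up output values R0 and R1: the Softmax probability of class 1 is exactly the sigmoid of Z = R1 - R0, so the sign of Z alone decides the prediction.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

R0, R1 = 0.8, 2.3            # hypothetical last-layer outputs
p = softmax(np.array([R0, R1]))
Z = R1 - R0

# P(class 1) = 1 / (1 + exp(-Z)): Softmax over two values is a sigmoid of
# their difference, so comparing p[1] with p[0] == comparing Z with 0.
(p[1] > p[0]) == (Z > 0)  # → True
```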
Softmax under multiclass classification

For a multiclass Softmax, the network's last layer computes multiple linear combinations of the previous layer's results, and the class with the maximum value is the prediction.

Again we make a transformation: instead of directly comparing the values, compare Ra - Rb against 0. This is equivalent to the final Softmax layer performing several pairwise judgments, each a single linear combination compared with 0, to decide between classes a and b; repeating the comparison finds the maximum-likelihood class.

- So the binary view still applies: from the geometric perspective, each linear combination Z = 0 is a projection hyperplane
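The pairwise reduction can be sketched directly (R values made up): the class a with Ra - Rb > 0 against every other class b is exactly the argmax.

```python
import numpy as np

# Multiclass prediction as repeated binary comparisons: class a wins
# against class b iff Ra - Rb > 0; the class winning all comparisons
# is the argmax of the last layer's outputs.
R = np.array([0.4, 2.1, -0.7])  # hypothetical last-layer outputs
pred = int(np.argmax(R))

wins = [a for a in range(len(R))
        if all(R[a] - R[b] > 0 for b in range(len(R)) if b != a)]
wins == [pred]  # → True: pairwise comparisons recover the argmax
```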
A binary classification network with 1 ReLU layer + 1 Softmax layer

- The input X lies in n-dimensional space
- The ReLU layer generates m fold lines on the Z = 0 hyperplane of the (n+1)-dimensional space and performs m folds (m is the number of neurons in the ReLU layer)
- The folded surfaces are then combined by linear addition (changing the angle of each fold and the position of the whole along the Z axis), and the result is compared with the Z = 0 hyperplane
- The projection of this pattern onto the Z = 0 hyperplane of the (n+1)-dimensional space is the binary classification boundary in the original n-dimensional space
It follows that for any finite sample set obeying some distribution law, as long as there are enough ReLU neurons, folding in the high-dimensional space can generate a pattern that matches that distribution.
Parsing Example 1

From the high-dimensional perspective:

- The 3 neurons produce 3 fold lines in the high-dimensional space
- Three straight lines divide the plane into at most 7 parts; here 6 parts are produced (the middle part is so small it can be ignored)
- The boundary pattern projected onto the Z = 0 hyperplane is exactly the hexagon
(Figure: fold lines of the first ReLU layer on the Z = 0 hyperplane)
(Figure: Z in the high-dimensional space)