The geometric principle of artificial neural networks Ⅰ: the single (hidden) layer neural network

This article discusses only ordinary feed-forward neural networks (not CNNs or RNNs) built from the simplest ReLU neurons, and covers only the classic scenario of classification with a single hidden layer.

Basic conventions

To simplify discussion and visualization, every activation function in this article is ReLU, and the original input X is a two-dimensional vector.

Example 1

The following figure shows the simplest artificial neural network: an input layer with two nodes, a hidden layer with three nodes, and an output layer with two nodes. The network can solve binary classification problems in which the input is a two-dimensional vector and the output is the probability of each of the two classes.

Simplest Artificial Neural Network

  • Input layer - 2-dimensional vector X
  • Hidden layer (first layer) - ReLU layer (3 neurons)
  • Output layer (second layer) - Softmax layer (2 neurons, binary classification)
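To make the architecture above concrete, here is a minimal NumPy sketch of the 2-3-2 network (not the article's original code). The parameter values are random placeholders standing in for whatever a trained network would learn; only the shapes and the ReLU-then-softmax structure follow the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, randomly initialized parameters; a trained network would
# have learned these values instead.
W1 = rng.normal(size=(2, 3))   # weights of the 3 hidden ReLU neurons
b1 = rng.normal(size=3)        # biases of the hidden layer
W2 = rng.normal(size=(3, 2))   # weights of the 2 softmax output neurons
b2 = rng.normal(size=2)        # biases of the output layer

def forward(X):
    """X has shape (batch, 2); returns class probabilities of shape (batch, 2)."""
    H = np.maximum(0.0, X @ W1 + b1)          # hidden ReLU layer
    R = H @ W2 + b2                           # linear outputs of the softmax layer
    R = R - R.max(axis=1, keepdims=True)      # subtract max for numerical stability
    E = np.exp(R)
    return E / E.sum(axis=1, keepdims=True)   # softmax probabilities

probs = forward(np.array([[0.5, -1.0]]))
print(probs, probs.sum())                     # two probabilities summing to 1
```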

The figure below shows the assumed sample distribution. Each sample has two features (its abscissa X0 and its ordinate X1) and belongs to one of two categories, red or green (indicated by its color). The true decision boundary of the samples is a circle.

Sample distribution

The figure below shows the optimal result after training the network above (the training process is omitted). The network classifies samples inside the gray area as red and samples outside it as green, reaching an accuracy of 95%. (You can open TensorPlayground to experience the training process.)

Results of Neural Network Classifiers

  • With 3 ReLU neurons, the optimal decision boundary for this distribution is a hexagon

Why can such a simple neural network produce this effect (a hexagonal decision boundary)? The rest of this article explains the underlying principles from a geometric perspective, so that the artificial neural network is no longer a black box.

A single ReLU neuron

ReLU neuron: Z = ReLU(W · X + b) = max(0, W · X + b)

  • W and X are both vectors
  • X is the neuron's input, W is the neuron's weight vector, and b is the neuron's bias

Here, let W and X be 2-dimensional vectors, with W = [1.5, 3.5] and b = -2.5; the resulting surface is shown below:

ReLU neuron function image

  • (Note that the Z-axis scale in the figure above differs from that of the X0 and X1 axes)
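The surface in the figure can be reproduced with a few lines of NumPy and Matplotlib. This is only a sketch: the plotting range is an arbitrary choice, while W = [1.5, 3.5] and b = -2.5 are the values stated above.

```python
import numpy as np
import matplotlib.pyplot as plt

W = np.array([1.5, 3.5])
b = -2.5

x0, x1 = np.meshgrid(np.linspace(-3, 3, 100), np.linspace(-3, 3, 100))
Z = np.maximum(0.0, W[0] * x0 + W[1] * x1 + b)   # the ReLU neuron's output surface

ax = plt.figure().add_subplot(projection="3d")
ax.plot_surface(x0, x1, Z)
ax.set_xlabel("X0")
ax.set_ylabel("X1")
ax.set_zlabel("Z")
plt.show()
```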

A single ReLU neuron, whose input X lies in n-dimensional space, generates a hyperplane in the (n+1)-dimensional space (call the newly added dimension Z) and then folds it along the hyperplane Z = 0, producing a folded hypersurface.

The angle of the folded hypersurface

Determined by the parameter W

The angle of the folded hypersurface

(high-dimensional space)

The angle of the folded hypersurface in high-dimensional space

  • The fold angle is always obtuse

The position of the fold line of the folded hypersurface on the Z = 0 hyperplane

Determined by the parameters W and b

The position of the fold line of the folded hypersurface on the Z = 0 hyperplane

(high-dimensional space)

The position of the fold line of the folded hypersurface on the Z = 0 hyperplane, in high-dimensional space

Constant * folded hypersurface

C * Z

Stretching, shrinking, or flipping along the Z axis changes the angle of the folded hypersurface, but does not change the position of its fold line (a numerical check follows the figures below).

  • C > 1 ➡️ stretch; the fold angle becomes smaller (steeper)
  • 0 < C < 1 ➡️ shrink; the fold angle becomes larger (flatter)
  • C < 0 ➡️ flip; the surface folds downward

C = 2

Constant * folded hypersurface, C = 2

C = 0.6

Constant * folded hypersurface, C = 0.6

C = -1

Constant * folded hypersurface, C = -1

C = -2

Constant * folded hypersurface, C = -2

C = -0.6

Constant * folded hypersurface, C = -0.6
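A quick numerical check of the claim illustrated by the figures above: multiplying the folded hypersurface by a constant C changes its slope (and flips it downward when C < 0) but leaves the region where it meets the Z = 0 plane, and hence the fold line, unchanged. W and b are reused from the single-neuron example.

```python
import numpy as np

W, b = np.array([1.5, 3.5]), -2.5
x0, x1 = np.meshgrid(np.linspace(-3, 3, 201), np.linspace(-3, 3, 201))
Z = np.maximum(0.0, W[0] * x0 + W[1] * x1 + b)   # the folded hypersurface

for C in (2.0, 0.6, -1.0, -2.0, -0.6):
    # The region where the surface equals 0 (whose edge is the fold line)
    # is identical before and after scaling by C.
    print(C, np.array_equal(Z == 0, (C * Z) == 0))
```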

Folded hypersurface + folded hypersurface

Z0 + Z1

Adding a second folded hypersurface folds the first one again along the fold line of the second, and makes the angles of the two original folds smaller (steeper).

It does not change the positions of the fold lines on the Z = 0 hyperplane, but parts of the fold lines are lifted away from the original Z = 0 hyperplane.

Folded hypersurface Z0

Folded hypersurface Z1

=

Folded hypersurface Z0 + Z1

The first layer of ReLU neurons

Linear addition of multiple folded hypersurfaces (from the perspective of the next layer)

Linear addition of multiple folded hypersurfaces

  • Hn is the output of the n-th ReLU neuron in the first layer

n neurons ➡️ n fold lines are generated on the Z = 0 hyperplane, and n folds are performed in the high-dimensional space (see the sketch after this list)

  • The position of each fold line is determined solely by the corresponding neuron in the first layer, independent of the parameters of the later layer
  • The W parameters of the later layer determine the relative fold angle of each folded hypersurface
  • The b parameter of the later layer determines the position of the entire composite folded hypersurface along the Z axis (shifting it up and down)
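Here is a minimal sketch of this composition with made-up parameter values: three first-layer ReLU neurons produce three folded hypersurfaces, and one neuron of the next layer combines them linearly.

```python
import numpy as np

# Three hypothetical first-layer ReLU neurons with 2-D input.
W1 = np.array([[ 1.0,  0.0],
               [ 0.0,  1.0],
               [-1.0, -1.0]])
b1 = np.array([-1.0, -1.0, 1.0])

# The linear combination taken by one neuron of the next layer.
w2 = np.array([0.7, -1.2, 0.5])
b2 = 0.3

def composite_surface(X):
    """X has shape (batch, 2); returns the combined folded hypersurface Z."""
    H = np.maximum(0.0, X @ W1.T + b1)   # n folded hypersurfaces (here n = 3);
                                         # their fold lines depend only on W1, b1
    return H @ w2 + b2                   # w2 sets the relative angle of each fold,
                                         # b2 shifts the whole surface along the Z axis

print(composite_surface(np.array([[0.0, 0.0], [2.0, 2.0]])))   # [0.8, -0.2]
```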

Lines dividing the plane

n lines divide the plane into at most (n² + n + 2) / 2 parts

Hyperplanes dividing d-dimensional space

n hyperplanes divide d-dimensional space into at most f(d, n) parts, where f(d, n) = C(n, 0) + C(n, 1) + ... + C(n, d)

Hyperplanes dividing d-dimensional space
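The figures above refer to the standard counting result: n hyperplanes divide d-dimensional space into at most f(d, n) = C(n, 0) + C(n, 1) + ... + C(n, d) parts. A few lines of Python confirm the numbers used later in Example 1.

```python
from math import comb

def f(d, n):
    """Maximum number of regions that n hyperplanes cut d-dimensional space into."""
    return sum(comb(n, i) for i in range(d + 1))

print(f(2, 3))   # 3 lines in the plane: at most 7 parts (used in Example 1)
print(f(2, 4))   # 4 lines in the plane: at most 11 parts
print(f(3, 3))   # 3 planes in 3-D space: at most 8 parts
```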

Softmax under binary classification

softmax(X) converts the values of the vector X into probabilities (between 0 and 1), one per index position, that sum to 1

softmax(X)_i = exp(X_i) / Σ_j exp(X_j)

For binary classification with softmax, the outputs of the previous layer are in fact combined into two linear sums, and the group with the larger value is taken as the prediction.

Binary Classification with Softmax

Apply a transformation here: instead of comparing R1 and R0 directly, compare Z = R1 - R0 with 0. The softmax layer then reduces to a single linear combination whose sign is checked against 0 (a quick check follows the list below).

Z = R1 - R0

  • Z < 0 ➡️ predict class 0
  • Z > 0 ➡️ predict class 1
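A quick check with made-up values for R0 and R1 that choosing the larger softmax probability is the same as testing the sign of Z = R1 - R0:

```python
import numpy as np

def softmax(r):
    e = np.exp(r - r.max())
    return e / e.sum()

# Made-up outputs (R0, R1) of the softmax layer's two linear sums.
for R in ([2.0, -1.0], [-0.3, 0.4], [1.0, 1.0001]):
    R = np.array(R)
    by_softmax = int(np.argmax(softmax(R)))   # class with the larger probability
    by_sign = int(R[1] - R[0] > 0)            # 1 if Z = R1 - R0 > 0, else 0
    print(R, by_softmax, by_sign)             # the two columns always agree
```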

Softmax under multi-class classification

For multi-class classification with softmax, the outputs of the previous layer are in fact combined into multiple linear sums, and the group with the largest value is taken as the prediction.

Multi-class classification with Softmax

Apply the same transformation here: compare Ra - Rb with 0 instead of comparing the two values directly. The softmax layer is then equivalent to a set of linear combinations, each checked against 0, that decide which of the two classes a and b fits better; repeating this comparison over all pairs finds the most likely class (a sketch follows the note below).

  • The geometric perspective used in the binary case (the projection of a linear combination onto the Z = 0 hyperplane) still applies
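The same idea extends to more classes. The sketch below, with made-up linear outputs for four classes, shows that the softmax prediction can be recovered purely from pairwise comparisons of Ra - Rb against 0.

```python
import numpy as np

R = np.array([0.2, 1.7, -0.5, 0.9])   # made-up linear outputs for 4 classes

# softmax is monotone, so its argmax is the argmax of R itself.
by_argmax = int(np.argmax(R))

# Class a is predicted iff Ra - Rb > 0 for every other class b.
by_pairwise = next(a for a in range(len(R))
                   if all(R[a] - R[b] > 0 for b in range(len(R)) if b != a))

print(by_argmax, by_pairwise)          # both give class 1
```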

1 ReLU layer + 1 Softmax layer binary classification network

  • The input is X in n-dimensional space
  • m fold lines are generated on the Z = 0 hyperplane in (n+1)-dimensional space, and m folds are performed (m is the number of ReLU neurons in the first layer)
  • The folded hypersurfaces are linearly combined (changing the angle of each fold and the position of the whole surface along the Z axis), and the result is compared against the Z = 0 hyperplane
  • In the (n+1)-dimensional space, the projection of this surface onto the Z = 0 hyperplane is the binary classification boundary in the original n-dimensional space

It can be seen that for any finite distribution that follows some regular pattern, as long as enough ReLU neurons are provided, folding in the high-dimensional space can produce a high-dimensional surface that matches the distribution.
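As a rough illustration of this point (not the article's TensorPlayground setup), the sketch below fits a network with a single hidden layer of 3 ReLU neurons to a circularly separated dataset using scikit-learn. Note that MLPClassifier uses a single logistic output for binary problems, which has the same effect as the two-way softmax described above.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(2000, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 4.0).astype(int)   # class 1 inside a circle of radius 2

clf = MLPClassifier(hidden_layer_sizes=(3,), activation="relu",
                    max_iter=5000, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))   # typically well above 0.9; the learned boundary is piecewise linear
```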

Analysis of Example 1

High-dimensional space perspective

  • The surface is generated by 3 fold lines in high-dimensional space
  • 3 straight lines divide the plane into at most 7 parts; here there are 6 (the smallest part in the middle is ignored)
  • The boundary of its projection onto the Z = 0 hyperplane is exactly a hexagon

Fold lines of the first-layer ReLU neurons on the Z = 0 hyperplane

Fold lines of the first-layer ReLU neurons on the Z = 0 hyperplane

Z in high-dimensional space

Z in high-dimensional space

Projection onto Z = 0 in high-dimensional space
