Machine Learning - Introduction to Neural Networks and Model Representation

The neural network is a very old algorithm. Its original purpose was to build machines that could simulate the brain. Today, neural networks are an important machine learning technique: they are the basis of deep learning, currently the most popular research direction.

Introduction

To build a neural network model, we first need to think about what the neural networks in the brain look like. Each neuron can be thought of as a processing unit/nucleus, which has many inputs/dendrites (input/Dendrite) and one output/axon (output/Axon). A neural network is a network of large numbers of neurons interconnected and communicating through electrical impulses.
[Figure: a biological neuron with dendrites, a nucleus, and an axon]

Model Representation

The neural network model is built from many neurons, and each neuron is a learning model. These neurons (also called activation units) take some features as input and produce an output according to their own model. The input can be compared to a neuron's dendrites, the output to its axon, and the computation to the cell nucleus.
The figure below is an example of a neuron using a logistic regression model as its own learning model.
Insert image description here

Notice the arrow lines in the middle; these lines are called "connections". Each connection carries a "weight". In logistic regression we call it a parameter; in a neural network the parameter is usually called a weight. Connections are the most important part of a neuron: each connection has a weight, and training a neural network means adjusting the weights to their optimal values so that the whole network gives the best predictions.

In the figure, $h_\theta(x)$ is the sigmoid function, that is, $\frac{1}{1+e^{-\theta^{T}x}}$.

Usually the input layer contains only the input nodes $x_1, x_2, x_3$; when necessary, an additional node $x_0$ is added. $x_0$ is called the bias unit or bias neuron, and its value is always 1.
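
As a concrete illustration, here is a minimal Python sketch of a single logistic unit computing $g(\theta^T x)$; the weight and input values are made up purely for illustration:

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Made-up weights and input; x includes the bias term x0 = 1.
theta = np.array([-1.0, 0.5, 2.0])   # theta_0, theta_1, theta_2
x = np.array([1.0, 3.0, 0.5])        # x_0 = 1, x_1, x_2

h = sigmoid(theta @ x)               # h_theta(x) = g(theta^T x)
print(h)                             # a value between 0 and 1
```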

We design a neural network modeled on these neurons, shown below:
[Figure: a neural network built from such neurons]
A neural network is simply a collection of neurons connected together.
Here $x_1$, $x_2$, $x_3$ are the input units, into which we feed the raw data.
$a_1$, $a_2$, $a_3$ are the intermediate units; they process the data and pass it on to the next layer.
Finally, the output unit computes $h_\theta(x)$.

The neural network model is a network of many logical units organized into layers, where the output variables of each layer serve as the input variables of the next layer. The picture below shows a 3-layer neural network. The first layer is called the input layer (Input Layer), the last layer is called the output layer (Output Layer), and the layers in between are called hidden layers (Hidden Layers). We add a bias unit to each layer:
[Figure: a 3-layer neural network with bias units added to each layer]
Some notation is introduced below to help describe the model:
$a_i^{(j)}$ denotes the $i$-th activation unit in layer $j$. $\Theta^{(j)}$ denotes the matrix of weights mapping from layer $j$ to layer $j+1$; for example, $\Theta^{(1)}$ is the matrix of weights mapping from the first layer to the second layer. Its size is: the number of activation units in layer $j+1$ as the number of rows, and the number of activation units in layer $j$ plus one as the number of columns. For example, in the neural network shown above, $\Theta^{(1)}$ has size 3×4.
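
To make the dimensions concrete, here is a quick sketch (the values are arbitrary; only the shapes matter):

```python
import numpy as np

# Theta^(1) maps layer 1 (3 inputs plus bias) to layer 2 (3 activation units),
# so it has 3 rows and 3 + 1 = 4 columns.
Theta1 = np.random.randn(3, 4)

x = np.array([1.0, 0.2, -0.5, 0.8])   # [x0, x1, x2, x3] with bias x0 = 1
print(Theta1.shape)                    # (3, 4)
print((Theta1 @ x).shape)              # (3,) -- one value per layer-2 activation unit
```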

For the model shown in the figure above, the activation units and the output are expressed as:

$a_1^{(2)} = g(\Theta_{10}^{(1)} x_0 + \Theta_{11}^{(1)} x_1 + \Theta_{12}^{(1)} x_2 + \Theta_{13}^{(1)} x_3)$
$a_2^{(2)} = g(\Theta_{20}^{(1)} x_0 + \Theta_{21}^{(1)} x_1 + \Theta_{22}^{(1)} x_2 + \Theta_{23}^{(1)} x_3)$
$a_3^{(2)} = g(\Theta_{30}^{(1)} x_0 + \Theta_{31}^{(1)} x_1 + \Theta_{32}^{(1)} x_2 + \Theta_{33}^{(1)} x_3)$

$h_\Theta(x) = a_1^{(3)} = g(\Theta_{10}^{(2)} a_0^{(2)} + \Theta_{11}^{(2)} a_1^{(2)} + \Theta_{12}^{(2)} a_2^{(2)} + \Theta_{13}^{(2)} a_3^{(2)})$
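
Written out element by element, these formulas could be computed with explicit loops, roughly as in the sketch below (the weights here are random placeholders, not learned values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Random placeholder weights: Theta1 maps layer 1 -> layer 2 (3x4),
# Theta2 maps layer 2 -> layer 3 (1x4).
Theta1 = np.random.randn(3, 4)
Theta2 = np.random.randn(1, 4)
x = np.array([1.0, 0.5, -1.2, 2.0])   # [x0, x1, x2, x3], with x0 = 1

# Layer 2: compute each activation unit with explicit loops, mirroring the formulas.
a2 = np.ones(4)                        # a2[0] holds the bias unit a_0^(2) = 1
for i in range(3):                     # a_1^(2), a_2^(2), a_3^(2)
    a2[i + 1] = sigmoid(sum(Theta1[i, j] * x[j] for j in range(4)))

# Output layer: h_Theta(x) = a_1^(3)
h = sigmoid(sum(Theta2[0, j] * a2[j] for j in range(4)))
print(h)
```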

Vectorized Computation

Compared with coding the computation with loops, vectorization makes the calculation simpler. Taking the above neural network as an example, let us compute the values of the second layer. Here, we let

$z_1^{(2)} = \Theta_{10}^{(1)} x_0 + \Theta_{11}^{(1)} x_1 + \Theta_{12}^{(1)} x_2 + \Theta_{13}^{(1)} x_3$
By analogy, $z^{(2)} = \Theta^{(1)} x$, and the computed output values are $a^{(2)} = g(z^{(2)})$.
After this calculation, add $a_0^{(2)} = 1$.

Let $z^{(3)} = \Theta^{(2)} a^{(2)}$; then $h_\theta(x) = a^{(3)} = g(z^{(3)})$.
The process of computing $h_\theta(x)$ in this way is also called forward propagation.
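
The same forward-propagation steps, vectorized; again a minimal sketch with placeholder weights rather than trained ones:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, Theta1, Theta2):
    """Vectorized forward propagation for the 3-layer network above."""
    a1 = np.insert(x, 0, 1.0)                # add the bias unit x0 = 1
    z2 = Theta1 @ a1                         # z^(2) = Theta^(1) x
    a2 = np.insert(sigmoid(z2), 0, 1.0)      # a^(2) = g(z^(2)), then add a_0^(2) = 1
    z3 = Theta2 @ a2                         # z^(3) = Theta^(2) a^(2)
    return sigmoid(z3)                       # h_theta(x) = a^(3) = g(z^(3))

# Placeholder weights for illustration only.
Theta1 = np.random.randn(3, 4)               # layer 1 -> layer 2
Theta2 = np.random.randn(1, 4)               # layer 2 -> layer 3
x = np.array([0.5, -1.2, 2.0])               # [x1, x2, x3]
print(forward_propagate(x, Theta1, Theta2))
```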

To better understand how neural networks work, let us first cover up the left half of the network:
[Figure: the network with its left half covered]

The remaining right half takes $a_0, a_1, a_2, a_3$ as inputs and outputs $h_\theta(x)$ following the logistic regression method.

In fact, the neural network behaves just like logistic regression, except that the input vector $\left[ x_1 \sim x_3 \right]$ of logistic regression is replaced by the middle layer $\left[ a_1^{(2)} \sim a_3^{(2)} \right]$, that is: $h_\theta(x) = g(\Theta_0^{(2)} a_0^{(2)} + \Theta_1^{(2)} a_1^{(2)} + \Theta_2^{(2)} a_2^{(2)} + \Theta_3^{(2)} a_3^{(2)})$

We can regard $a_0, a_1, a_2, a_3$ as higher-level features, that is, evolved versions of $x_0, x_1, x_2, x_3$. They are determined by $x$ and $\theta$, and because $\theta$ is learned by gradient descent, the values of $a$ keep changing and become more and more expressive. These higher-level features are far more powerful than simply raising $x$ to higher powers, and they predict new data better. This is the advantage of neural networks over logistic regression and linear regression.

Essentially, a neural network is able to learn its own set of features. In ordinary logistic regression, we are restricted to using the original features $x_1, x_2, \ldots, x_n$ in the data; although we can combine them into polynomial terms, we are still limited by those original features. In a neural network, the original features are only the input layer. In our three-layer example above, the third layer, the output layer, makes its prediction using the features of the second layer rather than the original features in the input layer. We can therefore think of the second-layer features as a new set of features that the network derives for itself, learned in order to predict the output variable.

Examples

The following examples show how a neural network can compute complex nonlinear hypotheses.
Let us start simple. In a neural network, a single-layer neuron (with no intermediate layer) can represent logical operations such as logical AND, logical OR, and logical NOT.

1. Logical AND (AND)
The left half of the figure below shows the network design and the output-layer expression, and the right side shows the sigmoid function. We can use such a neural network to represent the AND function:
[Figure: the AND network and the sigmoid function]
First add a bias unit, also known as the +1 unit, and assign the weights: $\theta_0 = -30$, $\theta_1 = 20$, $\theta_2 = 20$.
Our output function $h_\theta(x)$ is then: $h_\Theta(x) = g(-30 + 20x_1 + 20x_2)$
The graph of $g(x)$ is:
[Figure: graph of the sigmoid function $g(x)$]
Here is the truth table:

$x_1 = 0,\ x_2 = 0:\ h_\Theta(x) = g(-30) \approx 0$
$x_1 = 0,\ x_2 = 1:\ h_\Theta(x) = g(-10) \approx 0$
$x_1 = 1,\ x_2 = 0:\ h_\Theta(x) = g(-10) \approx 0$
$x_1 = 1,\ x_2 = 1:\ h_\Theta(x) = g(10) \approx 1$
So we have: $h_\Theta(x) \approx x_1\ \text{AND}\ x_2$
This is the AND function.
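
As a quick sanity check, here is a short sketch of this AND neuron with the weights given above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def and_gate(x1, x2):
    # Weights from the example above: theta_0 = -30, theta_1 = 20, theta_2 = 20
    return sigmoid(-30 + 20 * x1 + 20 * x2)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(and_gate(x1, x2)))   # reproduces the AND truth table
```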

2. Logical OR (OR)

The neuron in the picture below (whose three weights are -10, 20, and 20) can be regarded as equivalent to logical OR:
[Figure: the OR network with weights -10, 20, 20]
OR works the same way as AND; the only difference is the values of the weights.

3. Logical NOT (NOT)

The neuron in the figure below (whose two weights are 10 and -20) can be regarded as equivalent to logical negation (NOT):
[Figure: the NOT network with weights 10 and -20]
We can combine such neurons into more complex neural networks to implement more complex operations. For example, suppose we want to implement the XNOR function (which outputs 1 only when the two inputs are the same, i.e. both 1 or both 0), that is, $\text{XNOR} = (x_1\ \text{AND}\ x_2)\ \text{OR}\ ((\text{NOT}\ x_1)\ \text{AND}\ (\text{NOT}\ x_2))$.

First, construct a neuron that computes the $(\text{NOT}\ x_1)\ \text{AND}\ (\text{NOT}\ x_2)$ part:
[Figure: the $(\text{NOT}\ x_1)\ \text{AND}\ (\text{NOT}\ x_2)$ network]
Then combine the three separate parts into one network:
[Figure: the AND, $(\text{NOT}\ x_1)\ \text{AND}\ (\text{NOT}\ x_2)$, and OR neurons combined into a single network]
This gives a neural network that implements the XNOR operator.
Following this method, we can gradually construct more and more complex functions and obtain more and more powerful features.
This is the power of neural networks.
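
Putting the pieces together, here is a sketch of XNOR built from these logistic units. The AND and OR weights are the ones given above; for the $(\text{NOT}\ x_1)\ \text{AND}\ (\text{NOT}\ x_2)$ unit, the weights 10, -20, -20 are one choice that works (the exact values shown in the figure may differ):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(weights, x1, x2):
    """Single logistic unit: g(w0 + w1*x1 + w2*x2)."""
    w0, w1, w2 = weights
    return sigmoid(w0 + w1 * x1 + w2 * x2)

AND = (-30, 20, 20)           # weights from the AND example
OR = (-10, 20, 20)            # weights from the OR example
NOT_AND_NOT = (10, -20, -20)  # one workable choice for (NOT x1) AND (NOT x2)

def xnor(x1, x2):
    # Hidden layer: a1 = x1 AND x2, a2 = (NOT x1) AND (NOT x2)
    a1 = neuron(AND, x1, x2)
    a2 = neuron(NOT_AND_NOT, x1, x2)
    # Output layer: a1 OR a2
    return neuron(OR, a1, a2)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(xnor(x1, x2)))   # outputs 1 only when x1 == x2
```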

The above covers the basics of neural networks. This article consists of notes I took while studying Andrew Ng's machine learning course. If you have any questions, please feel free to ask!

Origin blog.csdn.net/Luo_LA/article/details/127677001