Neural Networks

Foreword

      This chapter follows the previous one, which introduced the Linear Regression and Logistic Regression problems. In this chapter we introduce another machine learning method, Neural Networks. Since we already have those two methods, why do we need Neural Networks at all? We will discuss this in detail here.

      Finally, if I have misunderstood anything here, I hope you will point it out. Thank you!

Chapter 6: Neural Networks

6.1 Non-linear Hypotheses

      Let us use a non-linear logistic regression problem to explain why we need to introduce a new learning method, Neural Networks. Figure 1 shows a non-linear classification problem. As discussed earlier, we need a decision boundary to classify the data, and we can write the hypothesis for that boundary as h(x)=g(\theta _{0}+\theta _{1}x_{1}+\theta _{2}x_{2}+\theta _{3}x_{1}x_{2}+\theta _{4}x_{1}^{2}x_{2}+\theta _{5}x_{1}^{3}x_{2}+\theta _{6}x_{1}x_{2}^{2}+...). With what we learned earlier, we can fit the boundary shown in Figure 1 and separate the positive and negative examples. That works with only two features, but in practical problems, house prices for instance, the price depends on house size, number of rooms, number of floors, and so on, and the hypothesis needs not only each feature's squared term but also the cross-product terms, and so on. This makes the computation very large and complicated, and, as discussed earlier, the more features we have the more easily we overfit, so introducing the earlier method here is inappropriate.

                                                                                 Figure 1: Non-linear classification

     Consider another problem: given a picture, decide whether it shows a car. The human eye recognizes a car at a glance, but what does the machine see? See Figure 2.

                                                                              Figure 2: The pixel grid of a car image

     Take a car door handle, for example: what the machine's "eyes" see is a grid of pixels, a table holding a large amount of pixel data. Suppose we want to find out which of the four pictures in Figure 3 are cars.

                                                                  Figure 3: Classifying car and non-car images

      In Figure 3 we can see that two of the pictures are cars. Simplifying this into the diagram of Figure 4, the problem now looks like the classification problems we talked about earlier.

                                                                          Figure 4: Simplified car classification

   Of course we cannot judge whether a picture shows a car from a single pixel, so let us take just two pixels, at the positions shown in Figure 5.

                                                                 Figure 5: Two selected pixels of the car image

     Plotting the data by these two pixel intensities might give the classification problem shown in Figure 6, which is again a non-linear logistic regression problem. The features here are pixels: a small patch of the image is a matrix of data. Suppose the image we selected above is 50 × 50 pixels, for a total of 2500 pixels; then the input x has 2500 components, each pixel value ranging from 0 to 255. If we include just the pairwise quadratic features x_{i}\times x_{j}, there are about three million features. The amount of computation is clearly huge, and with this many features overfitting becomes even more likely, which again shows that the non-linear logistic regression approach discussed earlier does not apply.

                                                                          Figure 6: The car classification problem

6.2 Neurons and the Brain

     In the problems discussed above we clearly saw that the algorithms from earlier chapters fall short, so here we introduce Neural Networks. As the name suggests, we hope the machine can compute automatically the way a brain does. The cerebral cortex contains an auditory cortex, a visual cortex, a somatosensory cortex, and so on. If we want a machine to have the same senses we do, must we write a different algorithm for each function? The answer is certainly no; that would only complicate matters. Experiments have found that if the nerve between the ear and the auditory cortex is cut and the eye is wired to the auditory cortex instead, the auditory cortex learns to see. Remarkable, isn't it? There are more striking examples of this kind, shown in Figure 7, which should raise your interest in this learning method: the tongue can learn to see, and, even more amazingly, if a frog is given a third eye, that eye also learns to see.

                                                                               Figure 7: Sensory areas of the brain

    With that groundwork laid, since our goal is to simulate the brain, we first need to understand the structure of a neuron, shown in Figure 8.

                                                                                     Figure 8: A neuron in the brain

     Referring to Figure 8, let us introduce a few terms: the Dendrites serve as input wires and the Axon as the output wire. How, then, do we sense the outside world and react to it? Sensing and reacting is just the dendrites delivering the information we receive to the neuron's cell body for processing, with the axon carrying the output to the corresponding organ, which responds. Simplified, then, there is an input, a processing center, and an output, and the Neural Networks model is built accordingly, as shown in Figure 9.

                                                                             Figure 9: Neuron model: a logistic unit

    In the model of Figure 9 we have inputs x1, x2, x3, the output h(x), and the processing unit in between. We sometimes add an extra input x0, called the bias unit, which we usually set to 1, so that the input is x=\begin{bmatrix} x_{0}\\ x_{1} \\ x_{2} \\ x_{3} \end{bmatrix} with \theta =\begin{bmatrix} \theta_{0}\\ \theta_{1} \\ \theta_{2} \\ \theta_{3} \end{bmatrix}, where \theta is called the weights. Here h(x)=\frac{1}{1+e^{-\theta ^{\top }x}}, the same logistic function introduced earlier.

     So far we have described only a single unit; next we introduce a concrete neural network architecture, shown in Figure 10.

                                                                                        Figure 10: A neural network

    In Figure 10, if we cover up the second layer we get the same model as before. As usual we call the first layer the input layer and the third layer the output layer; what is special here is the extra second layer, which we call the hidden layer. Every layer except the output layer again gets a bias unit. In this problem x_{i} are still the inputs and h_{\Theta }(x) is the output, while a_{i}^{(j)} denotes the activation of unit i in layer j. The activations work out as follows:

a_{1}^{(2)}=g(\Theta _{10}^{(1)}x_{0}+\Theta _{11}^{(1)}x_{1}+\Theta _{12}^{(1)}x_{2}+\Theta _{13}^{(1)}x_{3})

a_{2}^{(2)}=g(\Theta _{20}^{(1)}x_{0}+\Theta _{21}^{(1)}x_{1}+\Theta _{22}^{(1)}x_{2}+\Theta _{23}^{(1)}x_{3})

a_{3}^{(2)}=g(\Theta _{30}^{(1)}x_{0}+\Theta _{31}^{(1)}x_{1}+\Theta _{32}^{(1)}x_{2}+\Theta _{33}^{(1)}x_{3})

and our output is h_{\Theta }(x)=a_{1}^{(3)}=g(\Theta _{10}^{(2)}a_{0}^{(2)}+\Theta _{11}^{(2)}a_{1}^{(2)}+\Theta _{12}^{(2)}a_{2}^{(2)}+\Theta _{13}^{(2)}a_{3}^{(2)})

Here we have \Theta ^{(1)}=\begin{bmatrix} \Theta _{10}^{(1)} & \Theta _{11}^{(1)} & \Theta _{12}^{(1)} &\Theta _{13}^{(1)} \\ \Theta _{20}^{(1)} &\Theta _{21}^{(1)} & \Theta _{22}^{(1)} & \Theta _{23}^{(1)}\\ \Theta _{30}^{(1)}&\Theta _{31}^{(1)} & \Theta _{32}^{(1)} & \Theta _{33}^{(1)} \end{bmatrix}, the matrix of weights mapping layer 1 to layer 2. Its size is (number of units in layer j+1) × (number of units in layer j, plus 1): writing s_{j} for the number of units in layer j, \Theta ^{(j)} has size s_{j+1}\times (s_{j}+1). If we substitute z_{1}^{(2)} for the argument \Theta _{10}^{(1)}x_{0}+\Theta _{11}^{(1)}x_{1}+\Theta _{12}^{(1)}x_{2}+\Theta _{13}^{(1)}x_{3} in the expression for a_{1}^{(2)}, then a_{1}^{(2)}=g(z_{1}^{(2)}), and the others can be written the same way. Denoting the input x=\begin{bmatrix} x_{0}\\ x_{1} \\ x_{2} \\ x_{3} \end{bmatrix} by a^{(1)} and letting z^{(2)}=\begin{bmatrix} z_{1}^{(2)}\\ z_{2}^{(2)} \\ z_{3}^{(2)} \end{bmatrix}, we have z^{(2)}=\Theta ^{(1)}a^{(1)} and a^{(2)}=g(z^{(2)}). Adding a_{0}^{(2)}=1, we then get z^{(3)}=\Theta ^{(2)}a^{(2)} and h_{\Theta }(x)=a^{(3)}=g(z^{(3)}). The expressions are now quite compact.

       Why does computation become simpler in this Neural Network? When we cover up the second layer, we simply feed in an input and get an output; the intermediate computation is carried out by the network itself, which is a process of self-learning.

       Other neural network architectures are also possible, as shown in Figure 11. Apart from the first layer (the input layer) and the last layer (the output layer), all layers are called hidden layers; the more complex the hidden layers, the more complex the problems the network can learn.

                                                                                     Figure 11: Other neural network architectures

6.3 Examples and Intuition

     The analysis above may feel abstract, so below I use several examples to walk through in detail how a whole neural network works.

Example 1: Non-linear classification (XOR, XNOR)

      As shown in Figure 12, consider two inputs x1 and x2, each taking only the values 0 and 1. This is an XNOR problem, and it is clearly non-linear. How do we construct a neural network to solve it? Below I first go through a few simple examples to show how a neural network computes, and then return to this problem.


                                                                                          Figure 12: A non-linear classification problem

1) AND

Here x1, x2 \in \{0,1\} and y = x1 AND x2. We construct the neural network model shown in Figure 13, with weights -30, 20, 20 on the bias unit and the two inputs, so h_{\Theta }(x)=g(-30+20x_{1}+20x_{2}). The function g is the logistic function introduced earlier; from its graph, when z = 4.6, g(z)\approx 0.99, close to 1, and when z = -4.6, g(z)\approx 0.01, close to 0. So for this expression we can write out the truth table:

                                                                          Figure 13: Neural network for AND

x1     x2     h_{\Theta }(x)
0      0      g(-30)\approx 0
0      1      g(-10)\approx 0
1      0      g(-10)\approx 0
1      1      g(10)\approx 1

From this truth table we can see that the output is 1 only when both x1 and x2 are 1, which is exactly the AND function: h_{\Theta }(x)\approx x1 AND x2.

2) OR

     Here we want to implement the OR function; the network model is shown in Figure 14. Only the weights change, to -10, 20, 20, so h_{\Theta }(x)=g(-10+20x_{1}+20x_{2}). The truth table is:

                                                                              Figure 14: Neural network for OR

x1     x2     h_{\Theta }(x)
0      0      g(-10)\approx 0
0      1      g(10)\approx 1
1      0      g(10)\approx 1
1      1      g(30)\approx 1

Clearly this is an OR operation: h_{\Theta }(x)\approx x1 OR x2.

3) NOT

     With AND and OR in hand, let us add NOT, the negation operation. The model is shown in Figure 15: there is a single input plus a bias unit, with weights 10 and -20, so h_{\Theta }(x)=g(10-20x_{1}). When x1 = 0, h_{\Theta }(x)=g(10)\approx 1; when x1 = 1, h_{\Theta }(x)=g(-10)\approx 0, which is exactly x1 negated, so h_{\Theta }(x)\approx NOT x1.

                                                                                 Figure 15: Neural network for NOT

         Having introduced AND, OR, and NOT, we can now solve the problem posed earlier: constructing XNOR. First note that x1 XNOR x2 = (\overline{x1} AND \overline{x2}) OR (x1 AND x2). So we need x1 AND x2, \overline{x1} AND \overline{x2}, and x1 OR x2, as shown in Figure 16.

                                                                                   Figure 16: The three building-block networks

We can now combine these three pieces into the more complex XNOR network, shown in Figure 17.

                                                                                 Figure 17: The XNOR network

As before, we write out the truth table:

x1     x2     a_{1}^{(2)}     a_{2}^{(2)}     h_{\Theta }(x)
0      0      0               1               1
0      1      0               0               0
1      0      0               0               0
1      1      1               0               1

where a_{1}^{(2)} = x1 AND x2, a_{2}^{(2)} = (NOT x1) AND (NOT x2), and h_{\Theta }(x) = x1 XNOR x2.

6.4 Multi-class Classification

      In real life, many classification problems involve more than two classes. Take the earlier task of finding the cars among several pictures: if we now make the problem harder and need to sort the results into cars, pedestrians, motorcycles, and so on, the output can no longer be a single 0 or 1. Figure 18 shows such a classification problem.

                                                                              Figure 18: One-vs-all classification

     This time the neural network looks roughly like Figure 19: there is more than one output unit, and the result is no longer indicated by a single component.

                                                                    Figure 19: One-vs-all neural network model

Here we use h_{\Theta }(x)\approx \begin{bmatrix} 1\\ 0 \\ 0 \\ 0 \end{bmatrix} to indicate that the result is a pedestrian, h_{\Theta }(x)\approx \begin{bmatrix} 0\\ 1 \\ 0 \\ 0 \end{bmatrix} to indicate that it is a car, and so on. The training set is still (x^{(1)},y^{(1)}), (x^{(2)},y^{(2)}), ..., (x^{(m)},y^{(m)}), but y no longer takes the values 1, 2, 3, 4; instead it is one of \begin{bmatrix} 1\\0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0\\1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0\\0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0\\0 \\ 0 \\ 1 \end{bmatrix}, representing the four classes above.


Origin: blog.csdn.net/qq_36417014/article/details/83865636