Andrew Ng Machine Learning: Neural Networks, Introductory Notes (7)

7 Neural Networks

Neural networks address the case where the number of features is very large, which would give linear regression and logistic regression algorithms an excessive number of parameters.

7.1 M-P neuron model

[Figure 7.1: M-P neuron model]

A neuron receives input signals transmitted from n other neurons. The weighted sum of these inputs forms the total input value, which is compared with the neuron's threshold and then passed through an activation function to produce the neuron's output. The sigmoid function is typically used as the activation function.
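
As a minimal illustration (not part of the original notes), the Python sketch below computes the output of a single M-P neuron with a sigmoid activation; the input, weight, and threshold values are arbitrary examples.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation function."""
    return 1.0 / (1.0 + np.exp(-z))

def mp_neuron(x, w, theta):
    """M-P neuron: weighted sum of the inputs minus the threshold,
    passed through the sigmoid activation function."""
    z = np.dot(w, x) - theta        # total input compared with the threshold
    return sigmoid(z)

# Example values (arbitrary): 3 input signals from other neurons
x = np.array([0.5, -1.0, 2.0])      # inputs
w = np.array([0.4, 0.3, 0.9])       # connection weights
theta = 1.0                          # neuron threshold
print(mp_neuron(x, w, theta))
```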

7.2 Perceptron

[Figure 7.2: Two-input perceptron network]

The simplest neural network: the input layer only receives external signals, and the output layer is a single M-P neuron. A perceptron can easily implement the logical AND, OR, and NOT operations. The perceptron weight learning rule:
\[ w_i \leftarrow w_i + \Delta w_i \tag{7.1} \]

\[ \Delta w_i = \eta (y - \hat{y}) x_i \tag{7.2} \]

where \(\eta \in (0,1)\) is called the learning rate, usually a small positive number such as 0.1.
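
A minimal sketch of the update rule (7.1) and (7.2), assuming a step-activation perceptron and the logical AND problem as example data (both choices are illustrative):

```python
import numpy as np

def perceptron_train(X, y, eta=0.1, epochs=20):
    """Train a perceptron with the rule w_i <- w_i + eta * (y - y_hat) * x_i.
    A bias column of ones is appended so the threshold is learned as a weight."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # add bias input x0 = 1
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            y_hat = 1.0 if np.dot(w, xi) > 0 else 0.0   # step activation
            w += eta * (yi - y_hat) * xi                 # rule (7.1)-(7.2)
    return w

# Logical AND is linearly separable, so the perceptron can learn it.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
print(perceptron_train(X, y))
```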

  • A perceptron has only one layer of functional neurons, so its learning ability is very limited: it can only solve linearly separable problems, not nonlinearly separable ones. Multiple layers of functional neurons are needed to solve nonlinearly separable problems.

[Figure 7.3: Multi-layer feedforward network]

The figure above shows a multi-layer feedforward neural network. The layers of neurons between the input layer and the output layer are called hidden layers. Neurons in the same layer are not connected to each other, there are no cross-layer connections, and adjacent layers are fully connected.

[Figure 7.2: Principles of the neural network algorithm]

The input layer is only responsible for receiving external input; the hidden-layer and output-layer neurons perform the functional processing of the signal, and the final result is emitted by the output-layer neurons.

  • As long as a hidden layer is present, the network can be called a multi-layer network
  • Neural network learning means adjusting the connection weights between neurons and the threshold of each functional neuron according to the training data
  • Meaning of forward propagation: take the input as the activations of the first layer, then compute the activation terms of the hidden layers in turn, and finally compute the activation term of the output layer, which corresponds to the output
  • If the inputs are Boolean, logic operations can be implemented: giving an input a negative weight corresponds to negating that input, and adjusting the parameters to produce different values of the linear combination implements the various logic operations on top of the sigmoid activation function (see the sketch after this list)
  • ==The neural network does not learn directly from the raw sample features; it learns from the activation terms computed from the sample features and the first-layer parameters, which yields better model parameters than learning from the raw input directly==
  • ==In effect, the neural network first transforms the original features into more complex composite features and learns on these composite features, obtaining the optimal parameters based on them while also updating the parameter terms of the original features inside the composite features, thereby determining the optimal feature compositions. That is, the original features are made progressively more complex layer by layer, and the results of all the complex-feature construction done by the hidden layers are passed to the final output layer, which performs a logistic-regression computation on these complex features to produce the final value==
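
The sketch below (illustrative, not from the notes) shows how fixed weights on a single sigmoid unit implement logical AND and OR for Boolean inputs; the weight values are the common textbook choices, and a large negative weight on an input would negate it.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logic_unit(x1, x2, theta):
    """One sigmoid unit with a bias term: hypothesis = g(theta . [1, x1, x2])."""
    return sigmoid(np.dot(theta, [1.0, x1, x2]))

theta_and = np.array([-30.0, 20.0, 20.0])   # outputs ~1 only when x1 = x2 = 1
theta_or  = np.array([-10.0, 20.0, 20.0])   # outputs ~1 when either input is 1
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2,
              round(logic_unit(x1, x2, theta_and)),
              round(logic_unit(x1, x2, theta_or)))
```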

7.2.1 Vectorized computation in a neural network

[Figure 7.2.1: Vectorized computation in a neural network]
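
The figure showed the vectorized form of the computation. As a hedged sketch (assuming a single hidden layer and randomly chosen weight matrices purely for illustration), forward propagation becomes a sequence of matrix products:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Theta1, Theta2):
    """Vectorized forward propagation for one sample through one hidden layer.
    Theta1: (hidden_units, n_features + 1), Theta2: (n_outputs, hidden_units + 1)."""
    a1 = np.concatenate(([1.0], x))              # add the bias unit to the input layer
    z2 = Theta1 @ a1                             # z(2) = Theta(1) * a(1)
    a2 = np.concatenate(([1.0], sigmoid(z2)))    # hidden-layer activations plus bias
    z3 = Theta2 @ a2
    a3 = sigmoid(z3)                             # output-layer activation = hypothesis
    return a3

rng = np.random.default_rng(0)
x = rng.random(3)                                # 3 input features (example values)
Theta1 = rng.standard_normal((4, 4))             # 4 hidden units
Theta2 = rng.standard_normal((2, 5))             # 2 output units
print(forward(x, Theta1, Theta2))
```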

7.2.2 A classic example for understanding neural network structure

[Figure 7.2.2: A classic example for understanding neural network structure]

Put the relatively simple functions in the second layer; the higher the layer, the more complex the computation should be. In the end there is only one definite arrangement, which shows up in the fact that after learning the parameters converge to an optimal value.

7.2.3 Multi-class classification

Set up multiple output units, one per class; when one unit is 1 the rest are 0, which implements multi-class classification. At the same time the label changes from a single value to a multi-dimensional vector whose dimension equals the number of classes.
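
A small sketch of converting integer class labels into such one-per-class label vectors (the four-class example data is arbitrary):

```python
import numpy as np

def one_hot(labels, num_classes):
    """Convert integer class labels into vectors with a 1 in the position
    of the class and 0 elsewhere (one output unit per class)."""
    Y = np.zeros((labels.size, num_classes))
    Y[np.arange(labels.size), labels] = 1.0
    return Y

labels = np.array([0, 2, 1, 3])     # example labels for 4 classes
print(one_hot(labels, 4))
```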

7.3 Neural network backpropagation algorithm (BP)

7.3.1 Preparing the cost function for neural networks

[Figure 7.3.1: Cost function for neural networks]
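
The figure contained the cost function itself. As a sketch (following the standard regularized cross-entropy cost used in the course; the array shapes and example values below are assumptions for illustration), it can be computed as:

```python
import numpy as np

def nn_cost(H, Y, Thetas, lam):
    """Regularized cross-entropy cost for a neural network.
    H: (m, K) hypothesis outputs, Y: (m, K) one-hot labels,
    Thetas: list of weight matrices whose first column holds the bias weights."""
    m = Y.shape[0]
    cross_entropy = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # Regularization: sum of squares of all weights except the bias columns.
    reg = lam / (2 * m) * sum(np.sum(T[:, 1:] ** 2) for T in Thetas)
    return cross_entropy + reg

# Tiny illustrative example: 2 samples, 2 classes, one weight matrix.
H = np.array([[0.8, 0.2], [0.3, 0.7]])
Y = np.array([[1.0, 0.0], [0.0, 1.0]])
Thetas = [np.array([[0.1, 0.5, -0.3], [0.2, -0.4, 0.6]])]
print(nn_cost(H, Y, Thetas, lam=1.0))
```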

  • Forward propagation is used to compute the activations of each layer for the given parameter values \(\theta\)

[Figure 7.3.1: Forward propagation]

  • Backpropagation is used to compute the partial derivatives of the cost function with respect to each parameter: the error of the output layer is propagated back to layer 3, and then to layer 2

[Figure 7.3.2: Error of each unit in the backpropagation algorithm]

7.3.2 Backpropagation algorithm

  1. The training set is {\((x^{(1)},y^{(1)}),...,(x^{(m)},y^{(m)})\)}. Set \(\Delta_{ij}^{(l)}=0\) for all i, j, l, where l is the layer index, i indexes the units of layer l+1, and j indexes the units of layer l; these accumulators collect the gradient contributions of all samples

  2. for i = 1 to m, iterate over all training samples:

    \(a^{(1)}=x^{(i)}\)

    Use forward propagation to compute the activation value a of every unit in each layer

  3. Using \(y^{(i)}\), compute the output-layer error \(\delta^{(L)}=a^{(L)}-y^{(i)}\)

    Then use backpropagation to compute the error of each layer, \(\delta^{(L-1)},\delta^{(L-2)},...,\delta^{(2)}\). Since the input layer holds the sample values, there is no need to compute an error \(\delta^{(1)}\)

  4. Accumulate the errors over the layers: \(\Delta_{ij}^{(l)}:=\Delta_{ij}^{(l)}+a_j^{(l)}\delta_i^{(l+1)}\)

    Each row represents the errors of all the units of a layer for one sample

  5. After iterating over all samples:
    \[ \begin{array}{ll}{D_{ij}^{(l)} :=\frac{1}{m} \Delta_{ij}^{(l)}+\lambda \Theta_{ij}^{(l)}} & {\text { if } j \neq 0} \\ {D_{ij}^{(l)} :=\frac{1}{m} \Delta_{ij}^{(l)}} & {\text { if } j=0 \text{ (the extra bias term)}}\end{array}\tag{7.3} \]

where \[ \frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)=D_{ij}^{(l)} \]
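
A compact Python sketch of steps 1 to 5 for a single hidden layer (a hedged illustration with arbitrary example data; the regularization term follows eq. (7.3) as written):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(X, Y, Theta1, Theta2, lam):
    """Accumulate Delta over all samples and return the gradients D (eq. 7.3)."""
    m = X.shape[0]
    Delta1 = np.zeros_like(Theta1)
    Delta2 = np.zeros_like(Theta2)
    for i in range(m):
        # Step 2: forward propagation for sample i.
        a1 = np.concatenate(([1.0], X[i]))
        a2 = np.concatenate(([1.0], sigmoid(Theta1 @ a1)))
        a3 = sigmoid(Theta2 @ a2)
        # Step 3: output-layer error, then hidden-layer error (sigmoid derivative a*(1-a)).
        d3 = a3 - Y[i]
        d2 = (Theta2.T @ d3)[1:] * a2[1:] * (1 - a2[1:])   # drop the bias element
        # Step 4: Delta := Delta + delta * a^T.
        Delta2 += np.outer(d3, a2)
        Delta1 += np.outer(d2, a1)
    # Step 5 (eq. 7.3): average the accumulators, add lambda*Theta to non-bias columns.
    D1 = Delta1 / m
    D2 = Delta2 / m
    D1[:, 1:] += lam * Theta1[:, 1:]
    D2[:, 1:] += lam * Theta2[:, 1:]
    return D1, D2

rng = np.random.default_rng(1)
X = rng.random((5, 3))                     # 5 samples, 3 features (example data)
Y = np.eye(2)[rng.integers(0, 2, 5)]       # one-hot labels for 2 classes
Theta1 = rng.standard_normal((4, 4)) * 0.1
Theta2 = rng.standard_normal((2, 5)) * 0.1
print(backprop(X, Y, Theta1, Theta2, lam=1.0)[0])
```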

7.3.3 Understanding the neural network

  • Each unit of the input layer represents one feature of a single input sample
  • Each hidden-layer or output-layer unit where the arrows converge computes \(z=\theta a\), where \(\theta\) holds the weights represented by the incoming lines and a holds the outputs of the units in the previous layer. Inside each unit the sigmoid of z is then taken as the hypothesis value; the output layer represents the output
  • If there is no subscript K, the cost function only covers the case of a single output unit
  • [Figure 7.3.3: Understanding forward propagation in the neural network]
  • [Figure 7.3.3: Understanding backpropagation in the neural network]

7.3.4 Gradient checking

When the BP algorithm works together with gradient descent or another optimization algorithm, bugs can arise: even though the cost function decreases at every iteration, the final result's error can be an order of magnitude higher than in the bug-free case. To avoid this, gradient checking is needed.

[Figure 7.3.4: Gradient checking]

Here EPSILON = \(10^{-4}\). When the two gradients are approximately equal (differing only in the last few digits), DVec can be considered usable. Once the check passes, turn gradient checking off.
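
A hedged sketch of the two-sided numerical gradient check; a simple quadratic stands in for the real cost function so the example runs, but the comparison logic is the same:

```python
import numpy as np

EPSILON = 1e-4   # perturbation size for the two-sided difference

def numerical_gradient(J, theta):
    """Approximate dJ/dtheta_i with (J(theta + eps*e_i) - J(theta - eps*e_i)) / (2*eps)."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = EPSILON
        grad[i] = (J(theta + e) - J(theta - e)) / (2 * EPSILON)
    return grad

# Stand-in cost with a known analytic gradient: J(theta) = sum(theta^2).
J = lambda theta: np.sum(theta ** 2)
analytic = lambda theta: 2 * theta        # plays the role of DVec from backpropagation

theta = np.array([0.5, -1.2, 3.0])
num = numerical_gradient(J, theta)
# The two gradients should agree to several significant digits.
print(num, analytic(theta), np.max(np.abs(num - analytic(theta))))
```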

7.3.5 Random initialization

If all parameters are initialized to 0, the neural network can only compute a single feature, because the parameters leading out of every input-layer unit stay equal.

[Figure 7.3.5: Random parameter initialization]

  • The range for random parameter initialization can be determined by the following formula (see the sketch below):
    \[ \text{epsilon\_init}=\frac{\sqrt{6}}{\sqrt{\text{number of input units}+\text{number of output units}}}\tag{7.4} \]
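
A small sketch of initializing one weight matrix uniformly in the range given by (7.4); the layer sizes are illustrative:

```python
import numpy as np

def rand_init_weights(fan_in, fan_out):
    """Initialize weights uniformly in [-epsilon_init, epsilon_init], eq. (7.4).
    fan_in / fan_out are the unit counts of the incoming and outgoing layers;
    the extra +1 column holds the bias weights."""
    epsilon_init = np.sqrt(6) / np.sqrt(fan_in + fan_out)
    return np.random.uniform(-epsilon_init, epsilon_init, (fan_out, fan_in + 1))

Theta1 = rand_init_weights(fan_in=3, fan_out=4)   # example: 3 inputs, 4 hidden units
print(Theta1.shape, Theta1.min(), Theta1.max())
```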

7.3.6 Summary

  1. Choose the neural network architecture

    The number of input units is determined by the number of features of each sample; the number of output units is determined by the number of classes to be distinguished. A single hidden layer is usually chosen; if several hidden layers are used, they should all have the same number of units, and the total number of hidden units should be several times the number of input units. Usually more is better, but the amount of computation grows. A validation set can be used to compute the cost for structures with different numbers of hidden layers and to pick the structure with the smallest cost

  2. Train the neural network

    2.1 Randomly initialize the weights to small values

    2.2 Run forward propagation: \(x^{(i)}\rightarrow h_\theta(x^{(i)})\)

    2.3 Compute the cost function \(J(\theta)\)

    2.4 Run backpropagation to obtain the partial derivatives of the cost function with respect to each parameter, \(\frac{\partial}{\partial \Theta_{j k}^{(l)}} J(\Theta)\), i.e., the direction for gradient descent

    The samples can be processed one by one with a for loop or all at once in vectorized form; a for loop is recommended at first, with forward propagation and backpropagation inside the loop

    [Figure 7.3.6: Training the neural network (summary step 2)]

    2.5 Perform gradient checking; if the gradients are correct, disable the check

    2.6 Use an optimization method such as gradient descent, L-BFGS, conjugate gradient, or another method built into the fminunc function, together with the partial derivatives computed by backpropagation, to drive the cost function step by step down to its minimum

    The optimization may converge to a local minimum, but the result is usually still good

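A skeletal sketch of step 2.6 using an fminunc-like optimizer; cost_and_grad below is a stand-in quadratic so the example runs, whereas in practice it would compute \(J(\Theta)\) by forward propagation and the gradient by backpropagation:

```python
import numpy as np
from scipy.optimize import minimize

# Stand-in for the real neural-network cost/gradient routine: a simple quadratic
# so the example runs; in practice this would run forward propagation to get
# J(Theta) and backpropagation to get the unrolled gradient vector.
def cost_and_grad(theta_unrolled):
    J = np.sum(theta_unrolled ** 2)
    grad = 2 * theta_unrolled
    return J, grad

initial_theta = np.random.uniform(-0.12, 0.12, size=10)   # step 2.1: small random init
# Step 2.6: an fminunc-like optimizer (here L-BFGS-B) driven by the gradient.
result = minimize(cost_and_grad, initial_theta, jac=True, method="L-BFGS-B",
                  options={"maxiter": 100})
print(result.fun, result.x)
```
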
7.4 Other common neural networks

7.4.1 Radial basis function network (RBF)

A single-hidden-layer feedforward neural network. Unlike multi-layer feedforward networks, whose hidden layers use the sigmoid as the activation function, the RBF network uses radial basis functions as the activation of its hidden neurons, and the output layer is a linear combination of the hidden neurons' outputs.
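
A minimal sketch of this structure (Gaussian radial basis functions with illustrative centers, widths, and output weights):

```python
import numpy as np

def rbf_forward(x, centers, beta, w):
    """RBF network forward pass: Gaussian radial basis activations in the hidden
    layer, then a linear combination of the hidden outputs at the output layer."""
    # rho_i(x) = exp(-beta_i * ||x - c_i||^2)
    hidden = np.exp(-beta * np.sum((centers - x) ** 2, axis=1))
    return np.dot(w, hidden)                       # linear output, no sigmoid

centers = np.array([[0.0, 0.0], [1.0, 1.0]])       # illustrative hidden-neuron centers
beta = np.array([1.0, 2.0])                        # widths of the Gaussians
w = np.array([0.5, -0.3])                          # output-layer weights
print(rbf_forward(np.array([0.5, 0.5]), centers, beta, w))
```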

7.4.2 Adaptive resonance theory network (ART)

A neural network based on competitive learning: the output neurons of the network compete with one another, and at each moment only the winning neuron is activated while the states of the remaining neurons are suppressed.

7.4.3 Self-organizing map network (SOM)

A competitive-learning, unsupervised neural network that maps high-dimensional input data to a low-dimensional space while preserving the topological structure of the input data in the high-dimensional space, i.e., samples that are similar in the high-dimensional space are mapped to neighboring neurons in the network's output layer.

[Figure 7.4.3: Self-organizing map network]

7.4.4 Cascade-correlation network (CC)

A typical structure-adaptive network that tries to find the network structure best suited to the data during training. At the start the network has only an input layer and an output layer; as training proceeds, new hidden neurons are gradually added, creating a layered, cascaded structure.

The relevant parameters are trained by maximizing the correlation between a new neuron's output and the network error.

[Figure 7.4.4: Cascade-correlation network]

7.4.5 Recurrent neural network (Elman)

Cyclic structures are allowed in the network: the outputs of the hidden neurons are fed back as input signals. The hidden neurons use the sigmoid activation function, and training uses a generalized BP algorithm.

[Figure 7.4.5: Recurrent neural network]

7.4.6 Boltzmann machine

An energy-based model in which the neurons are all Boolean; the network reaches its ideal state when the energy is minimized.

[Figure 7.4.6: Boltzmann machine]

7.5 Network structure and error

A simple neural network has few parameters, is prone to underfitting with high bias, and is cheap to compute.

A complex neural network has many parameters and is prone to overfitting with high variance, but this can be corrected with regularization to prevent overfitting; combined with regularization it performs better than a simple neural network.
