Machine Learning [Zhou Zhihua's "Watermelon Book" notes] DAY 5: Neural Networks

5.1 Neuron Model

A neural network is a broadly parallel, interconnected network composed of simple adaptive units; its organization can simulate the interactive responses of a biological nervous system to real-world objects.
The most basic component of a neural network is the neuron model. In a biological neural network, each neuron is connected to other neurons; when it is excited, it sends chemical substances to the neurons connected to it.

The M-P neuron model
[Figure: the M-P neuron model]
In the M-P neuron model, a neuron receives input signals passed over from n other neurons. These input signals are transmitted through weighted connections; the total input the neuron receives is compared with the neuron's threshold, and the neuron's output is then produced through an activation function.
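In symbols, the M-P neuron's output takes the standard form (f is the activation function, w_i the connection weights, and θ the threshold):

$$ y = f\left( \sum_{i=1}^{n} w_i x_i - \theta \right) $$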
The sigmoid activation function
A commonly used activation function is the sigmoid function; a typical sigmoid is shown in the figure below.
[Figure: a typical sigmoid function]
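The sigmoid squashes an input that may vary over a wide range into the interval (0, 1):

$$ \operatorname{sigmoid}(x) = \frac{1}{1 + e^{-x}} $$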
5.2 The Perceptron and Multi-Layer Networks
A perceptron consists of two layers of neurons, as shown below: the input layer receives external input signals and passes them to the output layer, and the output layer is an M-P neuron, also known as a "threshold logic unit."
[Figure: a two-layer perceptron]
Note that in the perceptron only the output-layer neurons apply activation-function processing, i.e., the network has only one layer of functional neurons, so its learning ability is very limited.
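As an illustration (a minimal sketch, not from the post), here is the classic perceptron learning rule Δw_i = η(y − ŷ)x_i applied to the linearly separable logical AND function; the learning rate, epoch count, and data are arbitrary choices:

# A single threshold logic unit trained with the perceptron rule; the AND
# data, learning rate, and epoch count are illustrative assumptions.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)      # logical AND is linearly separable

w = np.zeros(2)                              # connection weights
theta = 0.0                                  # threshold
eta = 0.1                                    # learning rate

for _ in range(20):                          # 20 passes are plenty for AND
    for x_i, y_i in zip(X, y):
        y_hat = 1.0 if x_i @ w - theta >= 0 else 0.0   # step activation
        w += eta * (y_i - y_hat) * x_i                 # delta_w = eta * (y - y_hat) * x
        theta -= eta * (y_i - y_hat)                   # threshold moves oppositely

print([1.0 if x @ w - theta >= 0 else 0.0 for x in X])  # -> [0.0, 0.0, 0.0, 1.0]

Because AND is linearly separable, the rule converges; no such single unit can represent XOR, which motivates the multi-layer networks discussed next.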
To solve nonlinearly separable problems, we need to use multiple layers of functional neurons. A common neural network has the layered structure shown in the figure below: the neurons in each layer are fully interconnected with the neurons in the next layer, and there are no same-layer connections and no cross-layer connections between neurons. Such a network structure is commonly called a "multi-layer feedforward neural network" (multi-layer feedforward neural networks).

[Figure: a multi-layer feedforward neural network]
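To make the structure concrete, here is a minimal forward-pass sketch under assumed layer sizes; each layer's output feeds only the next layer, matching the "no same-layer, no cross-layer connections" description above:

# Forward pass through a fully connected feedforward network. Layer sizes
# and the sigmoid activation are illustrative assumptions.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
sizes = [3, 5, 4, 2]                       # input, two hidden layers, output
weights = [rng.random((m, n)) for m, n in zip(sizes, sizes[1:])]
thresholds = [rng.random(n) for n in sizes[1:]]

a = rng.random(3)                          # an input vector
for W, th in zip(weights, thresholds):
    a = sigmoid(a @ W - th)                # propagate layer by layer
print(a)                                   # the output-layer activations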
5.3 The Error Backpropagation Algorithm
The learning ability of a multi-layer network is much stronger than that of a single-layer perceptron. To train a multi-layer network, a more powerful learning algorithm is needed, and the error backpropagation (error BackPropagation, BP for short) algorithm is the most outstanding representative. When neural networks are used in real-world tasks, they are mostly trained with the BP algorithm. Usually, "BP network" refers to a multi-layer feedforward neural network trained with the BP algorithm.
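The figures omitted here carried the notation and derivation. For reference, in the standard single-hidden-layer setting with sigmoid activations (hidden outputs b_h, predictions ŷ_j^k, learning rate η; this is the textbook formulation, reconstructed rather than copied from the post), the key quantities are:

$$ g_j = \hat{y}_j^k (1 - \hat{y}_j^k)(y_j^k - \hat{y}_j^k) $$
$$ e_h = b_h (1 - b_h) \sum_{j=1}^{l} w_{hj}\, g_j $$
$$ \Delta w_{hj} = \eta g_j b_h, \quad \Delta \theta_j = -\eta g_j, \quad \Delta v_{ih} = \eta e_h x_i, \quad \Delta \gamma_h = -\eta e_h $$

Here g_j and e_h are exactly the gradients referred to in the workflow below.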
BP algorithm workflow
For each training example, the BP algorithm performs the following actions:

1. Feed the input example to the input-layer neurons, then propagate the signal forward layer by layer until the output layer produces a result.
2. Compute the error of the output layer, then propagate the error backward to the hidden-layer neurons.
3. Adjust the connection weights and thresholds according to the errors of the hidden-layer neurons.
This iterative process repeats until some stopping condition is met.

Randomly initialize all connection weights and thresholds in the network within the range (0, 1)
repeat:
	for all (x_k, y_k) in D do
		compute the output ŷ_k of the current sample from the current parameters;
		compute the gradient g_j of each output-layer neuron;
		compute the gradient e_h of each hidden-layer neuron;
		update the connection weights w_hj, v_ih and the thresholds theta_j, gamma_h
	end for
until the stopping condition is met (the training error or the number of training epochs reaches a preset value)
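Below is a minimal runnable sketch of this training loop, assuming one sigmoid hidden layer and the standard update rules quoted above; the XOR toy data, layer sizes (2-4-1), learning rate, and epoch count are illustrative choices, not from the post:

# Per-sample BP for a single-hidden-layer sigmoid network. The XOR data,
# sizes, eta, and epoch budget are illustrative assumptions; a fixed epoch
# count stands in for the stopping condition.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
d, q, l = 2, 4, 1          # input, hidden, output sizes (assumed)
eta = 0.5                  # learning rate (assumed)

# Randomly initialize all connection weights and thresholds in (0, 1)
V = rng.random((d, q))     # input-to-hidden weights v_ih
gamma = rng.random(q)      # hidden thresholds gamma_h
W = rng.random((q, l))     # hidden-to-output weights w_hj
theta = rng.random(l)      # output thresholds theta_j

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

for epoch in range(10000):
    for x_k, y_k in zip(X, Y):
        # Forward pass: propagate the signal layer by layer
        b = sigmoid(x_k @ V - gamma)       # hidden-layer outputs b_h
        y_hat = sigmoid(b @ W - theta)     # output ŷ_k
        # Output-layer gradient: g_j = ŷ_j (1 - ŷ_j)(y_j - ŷ_j)
        g = y_hat * (1 - y_hat) * (y_k - y_hat)
        # Hidden-layer gradient: e_h = b_h (1 - b_h) sum_j w_hj g_j
        e = b * (1 - b) * (W @ g)
        # Update connection weights and thresholds
        W += eta * np.outer(b, g)
        theta -= eta * g
        V += eta * np.outer(x_k, e)
        gamma -= eta * e

# Outputs should approach the XOR targets [[0], [1], [1], [0]] after training
print(np.round(sigmoid(sigmoid(X @ V - gamma) @ W - theta), 2))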

5.4 Global Minimum and Local Minima
If the error function has multiple local minima, we cannot guarantee that the solution found is the global minimum; in that case we say the parameter search is trapped in a local minimum.
In real-world tasks, the following strategies are commonly used to try to jump out of local minima and thereby get closer to the global minimum:

Initialize multiple neural networks with different sets of initial parameter values; after training them in the standard way, take the solution with the smallest error as the final parameters. This amounts to starting the search from multiple different initial points; the searches may fall into different local minima, from which we can pick the result closest to the global minimum (see the sketch after this list).
using the "simulated annealing" technique. simulated annealing are accepted with a certain probability is worse than the current solution results at each step, thereby facilitating escape from local minima. in each iteration process, the probability of receiving suboptimal solution over time to be reduced gradually in order to ensure algorithm stability.
Use stochastic gradient descent. Unlike standard gradient descent, which computes the gradient exactly, stochastic gradient descent injects random factors into the gradient computation. Thus, even at a local minimum, the computed gradient may still be nonzero, so there is a chance to escape the local minimum and continue searching.
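As a toy illustration of the first strategy, the following sketch (my own, not from the book) runs gradient descent from several random starting points on a one-dimensional function with two local minima and keeps the lowest solution found; the function, step size, and ranges are arbitrary assumptions:

# Random restarts: descend from several initial points, keep the best.
# f(x) = x^4 - 3x^2 + x has a shallow minimum near x ~ 1.15 and the
# global minimum near x ~ -1.30 (illustrative example).
import numpy as np

f = lambda x: x**4 - 3 * x**2 + x
grad = lambda x: 4 * x**3 - 6 * x + 1

def descend(x, eta=0.01, steps=2000):
    for _ in range(steps):
        x -= eta * grad(x)
    return x

rng = np.random.default_rng(1)
starts = rng.uniform(-2, 2, size=5)       # several different initial points
sols = [descend(x0) for x0 in starts]
best = min(sols, key=f)                   # keep the solution with smallest error
print([round(s, 3) for s in sols], round(best, 3))

Runs that start to the right of the local maximum typically stop at the shallower minimum; keeping the best of several runs recovers the global minimum.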
5.5 Other Common Neural Networks
There are also RBF, ART, SOM, cascade-correlation networks, the Elman network, and others; I won't introduce them one by one here.
5.6 Deep Learning
In theory, the more parameters a model has, the higher its complexity and the greater its "capacity," which means it can handle more complex learning tasks. But under normal circumstances, complex models are inefficient to train and easily fall into overfitting, so they were long out of favor. With the arrival of cloud computing and the big-data era, greatly increased computing power can alleviate the inefficiency of training, and a substantial increase in training data can reduce the risk of overfitting. Therefore, complex models represented by deep learning (DL) began to receive attention.
Typical deep learning models are very deep neural networks. Obviously, for a neural network model, a simple way to increase capacity is to increase the number of hidden layers.


That's it for today's update; it feels more like a diary, haha, the quality isn't great!!



Source: blog.csdn.net/weixin_43838785/article/details/104238818