Deep learning - neural networks

Neural Networks

Deep learning is a branch of machine learning.

It attempts to build high-level abstractions of data using algorithms made up of multiple processing layers, with complex structures or multiple nonlinear transformations.

Deep learning is a family of machine learning methods based on learning representations of data.

An observation (e.g., an image) can be represented in many ways, such as a vector of per-pixel intensity values, or more abstractly as a set of edges, regions of particular shapes, and so on.

Some representations make it easier to learn a task from examples (e.g., face recognition or facial expression recognition).

The benefit of deep learning is that it replaces hand-crafted features with efficient algorithms for unsupervised feature learning and hierarchical feature extraction.

So far, several deep learning architectures, such as deep neural networks, convolutional neural networks, deep belief networks, and recurrent neural networks, have been applied to computer vision, speech recognition, natural language processing, audio recognition, and bioinformatics, with very good results.

A neural network is a beautiful, biologically inspired programming paradigm that lets a computer learn from observed data.

Image Classification

Image classification is the task of assigning an input image one label from a fixed set of categories. It is one of the core problems in computer vision and, despite its simplicity, has a wide variety of practical applications. Moreover, as we will see later, many other seemingly distinct computer vision tasks (such as object detection and segmentation) can be reduced to image classification.

For example, an image classification model takes a single image and assigns probabilities to four labels, {cat, dog, hat, mug}. As shown, keep in mind that to a computer an image is represented as one large 3-dimensional array of numbers. In this example, the cat image is 248 pixels wide and 400 pixels high, and has three color channels: red, green, and blue (RGB for short). The image therefore consists of 248 x 400 x 3 numbers, a total of 297,600 numbers. Each number is an integer ranging from 0 (black) to 255 (white). Our task is to turn these roughly 300,000 numbers into a single label, such as "cat."
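The arithmetic above can be checked with a short sketch. It builds a nested list standing in for the 248 x 400 x 3 image and counts its entries; the all-zero pixel values are placeholders, not real image data.

```python
# Dimensions from the text: 248 pixels wide, 400 pixels high, 3 color channels.
WIDTH, HEIGHT, CHANNELS = 248, 400, 3

# A placeholder "image": rows x columns x channels, every value 0 (black).
image = [[[0 for _ in range(CHANNELS)] for _ in range(WIDTH)] for _ in range(HEIGHT)]

# Count every number stored in the array.
total_numbers = sum(len(pixel) for row in image for pixel in row)
print(total_numbers)  # 297600
```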

 

Because recognizing a visual concept (such as a cat) is relatively trivial for a human, it is worth considering the challenges involved from the perspective of a computer vision algorithm.

Data-driven approach

How might we write an algorithm that can classify images into distinct categories?

Unlike writing an algorithm for, say, sorting a list of numbers, it is not obvious how to write an algorithm that identifies cats in images.

A data-driven approach collects vast amounts of data through the mobile internet or other related software, organizes the data into information, integrates and refines that information, and then trains and fits an automated decision-making model on the data.

Artificial neural networks

Research on neural networks began long ago; today, "neural networks" is a fairly large, multidisciplinary field. The most basic component of a neural network is the neuron model.

 

 

Each circle in the figure above is a neuron, and each line represents a connection between neurons. We can see that the neurons are organized into layers: neurons in adjacent layers are connected, while neurons within the same layer are not.

Perceptron

To understand neural networks, we should first understand their building block: the neuron. A neuron is also called a perceptron. Remember the role the weights played in the earlier linear regression model? Each input value is multiplied by its corresponding weight, the products are summed, and the sum is passed through an activation function to make the decision. Let us look at the perceptron:

 

As can be seen, a perceptron has the following components:

  • Inputs: a perceptron can take multiple inputs:

  • Weights: each input has its own weight:

     
  • Activation function: there are many choices for the perceptron's activation function

     
     

     

    where z is the weighted sum of the inputs

  • Output:

     

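We have already seen the sigmoid function, which is very good at binary classification problems. For the multi-class problems that follow, we will use other activation functions.

In a high-dimensional space, the perceptron model looks for a hyperplane that separates the two classes. The perceptron model assumes that the data are linearly separable.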
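A minimal sketch of a single perceptron, assuming a sigmoid activation and a bias term. The weights and bias below are hypothetical values, chosen so that the unit behaves like a logical AND on two binary inputs.

```python
import math

def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron_output(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Hypothetical weights and bias that make the unit act like a logical AND.
and_weights, and_bias = [10.0, 10.0], -15.0
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, round(perceptron_output(x, and_weights, and_bias)))
```

Only the input (1, 1) pushes the weighted sum above zero, so it is the only input mapped close to 1. The line where the weighted sum equals zero is the separating hyperplane mentioned above.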

Loss function:

 

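Because both the numerator and the denominator contain θ, when the numerator is scaled by a factor of N the denominator scales with it; that is, the two are related by a constant multiple. We can therefore fix either the numerator or the denominator to 1 and minimize the other term (or its reciprocal) as the loss function. With the denominator fixed to 1, the simplified loss function is: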

 

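We then minimize this loss with gradient descent. However, because m is the set of misclassified samples and is not fixed, we cannot use batch gradient descent (BGD); we can only use stochastic gradient descent or mini-batch gradient descent.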
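A minimal sketch of perceptron training with stochastic gradient descent. It assumes the standard simplified perceptron loss, the negative sum of y_i * (θ · x_i) over the misclassified set m, with labels coded as -1 and +1. The function names and the toy data are illustrative only.

```python
import random

def train_perceptron(samples, labels, learning_rate=0.1, max_epochs=100, seed=0):
    """Stochastic gradient descent on the perceptron loss.

    Labels are assumed to be -1 or +1. Each sample gets a constant 1 appended,
    so the last entry of theta acts as the bias.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(samples, labels)]
    theta = [0.0] * len(data[0][0])
    for _ in range(max_epochs):
        rng.shuffle(data)
        mistakes = 0
        for x, y in data:
            score = sum(t * xi for t, xi in zip(theta, x))
            if y * score <= 0:  # misclassified: this sample belongs to the set m
                # The gradient of -y * (theta . x) with respect to theta is -y * x.
                theta = [t + learning_rate * y * xi for t, xi in zip(theta, x)]
                mistakes += 1
        if mistakes == 0:  # every sample is now classified correctly
            break
    return theta

def predict(theta, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    score = sum(t * xi for t, xi in zip(theta, list(x) + [1.0]))
    return 1 if score > 0 else -1

# Toy, linearly separable data: logical OR with -1/+1 labels.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, 1, 1, 1]
theta = train_perceptron(X, y)
print([predict(theta, x) for x in X])
```

Each update uses one misclassified sample at a time, which is why the set m never has to be known in advance.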

 

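Perceptron test

Definition of a neural network

Definition: in machine learning and cognitive science, an artificial neural network (ANN), also called simply a neural network (NN) or neural-like network, is a computational model that mimics the structure and function of biological neural networks and is used to estimate or approximate functions.

Basic neural networks: single-layer perceptron, linear neural network, BP neural network, Hopfield neural network, etc.

Advanced neural networks: Boltzmann machine, restricted Boltzmann machine, recursive neural network, etc.

Deep neural networks: deep belief network, convolutional neural network, recurrent neural network, LSTM network, etc.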

 

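Geoffrey Everest Hinton (born December 6, 1947) is a British-born computer scientist and psychologist, known for his contributions to neural networks. Hinton is one of the inventors of the backpropagation algorithm and an active promoter of deep learning.

So, moving on: what is a neural network?

A neural network is simply a number of neurons connected according to certain rules.

  • The dimension of the input vector equals the number of neurons in the input layer

  • Every connection has a weight

  • Each neuron in layer N is connected to all neurons in layer N-1; this is called full connection

  • In the network in the figure above, the leftmost layer is called the input layer and receives the input data; the rightmost layer is called the output layer and may have multiple outputs. We read the network's output from this layer. The layers between the input layer and the output layer are called hidden layers, because they are not visible from the outside.

  • Neurons in the same layer are not connected to each other

Let us look at the following example; the inputs and weights are already labeled in the figure.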

 

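For each sample we have the input values x_1, x_2, x_3, which are the inputs to nodes 1, 2, and 3. Each neuron in the hidden layer also has a bias term b; together with the weights it forms a complete linear combination.

This gives the output of the hidden layer, which is also the input to the output layer.

Matrix representation

Likewise, once we have the hidden layer's values, we can obtain the output layer's values by the same operation. One important point: the number of classes in the classification problem determines the number of neurons in the output layer.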
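The layer-by-layer computation can be sketched as a matrix-vector product followed by an activation. This is a minimal sketch assuming a sigmoid activation; the weights, biases, and layer sizes (3 inputs, 4 hidden neurons, 2 output neurons for a 2-class problem) are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(W, b, x, activation=sigmoid):
    """Compute activation(W x + b) for one fully connected layer.

    W has one row per neuron in this layer and one column per input.
    """
    return [activation(sum(w * xi for w, xi in zip(row, x)) + bias)
            for row, bias in zip(W, b)]

# Hypothetical numbers: 3 inputs, 4 hidden neurons, 2 output neurons.
x = [1.0, 0.5, -1.0]
W_hidden = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, 0.5], [-0.3, 0.2, 0.1]]
b_hidden = [0.0, 0.1, -0.1, 0.2]
W_out = [[0.2, -0.4, 0.1, 0.3], [-0.2, 0.4, 0.6, -0.1]]
b_out = [0.05, -0.05]

hidden = dense_layer(W_hidden, b_hidden, x)  # output of the hidden layer
output = dense_layer(W_out, b_out, hidden)   # the hidden output feeds the output layer
print(len(hidden), len(output))  # 4 2
```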

SoftMax regression

First, the formula:

 

 

Loss function:
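A minimal sketch of softmax and its loss, assuming the standard definitions: softmax(z)_i = exp(z_i) / sum_j exp(z_j), and the cross-entropy loss for one sample is the negative log of the probability assigned to the correct class. The scores below are hypothetical output-layer values.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_index):
    """Loss for one sample: negative log-probability of the correct class."""
    return -math.log(probs[true_index])

scores = [2.0, 1.0, 0.1]  # hypothetical output-layer values for 3 classes
probs = softmax(scores)
print([round(p, 3) for p in probs])
print(round(cross_entropy(probs, 0), 3))
```

Subtracting the maximum score does not change the result, because the common factor cancels between numerator and denominator.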

Below is the derivation for a simple two-layer neural network:

 

Here g(z) is an activation function; we use the sigmoid function:

 

Analysis of the two-layer network:

Forward propagation:

  • Input:

     
  • Hidden:

     
  • Output:

     

Backward propagation:

Update θ1 and θ2:
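A minimal sketch of one forward and backward pass for a two-layer network with g(z) = sigmoid. It assumes a squared error loss and no bias terms, which may differ from the derivation this section is based on; the layer sizes, learning rate, and sample are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(theta1, theta2, x):
    """Input -> hidden -> output, with g(z) = sigmoid at both layers."""
    a1 = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in theta1]
    a2 = [sigmoid(sum(w * h for w, h in zip(row, a1))) for row in theta2]
    return a1, a2

def train_step(theta1, theta2, x, target, lr=0.5):
    """One gradient descent update of theta1 and theta2 on a single sample.

    Assumes the squared error loss 0.5 * sum((a2 - target)^2) and no biases.
    """
    a1, a2 = forward(theta1, theta2, x)
    # Output-layer error: (a2 - t) * g'(z2), where g'(z) = a * (1 - a).
    delta2 = [(o - t) * o * (1 - o) for o, t in zip(a2, target)]
    # Hidden-layer error: propagate delta2 back through the old theta2.
    delta1 = [sum(d * theta2[k][j] for k, d in enumerate(delta2)) * h * (1 - h)
              for j, h in enumerate(a1)]
    new_theta2 = [[w - lr * d * h for w, h in zip(row, a1)]
                  for row, d in zip(theta2, delta2)]
    new_theta1 = [[w - lr * d * xi for w, xi in zip(row, x)]
                  for row, d in zip(theta1, delta1)]
    return new_theta1, new_theta2

def loss(theta1, theta2, x, target):
    _, a2 = forward(theta1, theta2, x)
    return 0.5 * sum((o - t) ** 2 for o, t in zip(a2, target))

rng = random.Random(0)
theta1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # 3 inputs -> 4 hidden
theta2 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]  # 4 hidden -> 2 outputs
x, target = [1.0, 0.5, -1.0], [1.0, 0.0]

before = loss(theta1, theta2, x, target)
for _ in range(200):
    theta1, theta2 = train_step(theta1, theta2, x, target)
after = loss(theta1, theta2, x, target)
print(before, after)
```

The error of the output layer is computed first and then pushed back to the hidden layer, so both weight matrices are updated from the same forward pass.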

 

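Training a neural network

We can say that a neural network is a model and that its weights are the model's parameters, that is, what the model has to learn. However, settings such as how the network is connected, the number of layers, and the number of nodes per layer are not learned; they are set by hand in advance. We call these manually set values hyperparameters.

Forward propagation

Training a neural network is similar to the optimization process used earlier for linear regression. We have already discussed what gradient descent means; it can be broken into these steps:

  • Compute the error of the result

  • Find the minimum error by gradient descent

  • Update the weights and bias terms

In this way, after each forward computation, optimizing the error mathematically yields a rate of change α for every parameter, that is, the gradient of the error with respect to that parameter.

Backpropagation

That is, minimizing the error yields new weights and other values, which are then used to update the parameters of the whole network. We usually specify a learning rate λ (a hyperparameter). The product of the rate of change and the learning rate gives how much each weight and bias term changes after one training step, ready for use in the next step.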
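The update rule just described can be sketched on a single parameter. The error function E(w) = (w - 3)^2 is a hypothetical stand-in for a network's error, and the function name is illustrative only.

```python
def gradient_descent(grad, start, learning_rate, steps):
    """Repeatedly move a parameter against its rate of change."""
    w = start
    for _ in range(steps):
        rate_of_change = grad(w)                # the rate of change, alpha in the text
        w = w - learning_rate * rate_of_change  # change = learning rate * rate of change
    return w

# Hypothetical error function E(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_final = gradient_descent(lambda w: 2 * (w - 3), start=0.0, learning_rate=0.1, steps=100)
print(round(w_final, 4))  # 3.0
```

A learning rate that is too large would overshoot the minimum, while one that is too small would need many more steps, which is why it is treated as a hyperparameter.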

Implementation with the TensorFlow neural network API

tf.train.GradientDescentOptimizer

When using gradient descent, you generally need to specify a learning rate

tf.train.GradientDescentOptimizer(0.5)

Methods

__init__

Constructs a new gradient descent optimizer

__init__(
    learning_rate,
    use_locking=False,
    name='GradientDescent'
)
  • learning_rate: a tensor or a floating-point value, the learning rate to use

minimize

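Adds operations that minimize loss by updating the variables. This method simply combines calls to compute_gradients() and apply_gradients() (both are also methods of the gradient descent optimizer). If you want to process the gradients before applying them, call compute_gradients() and apply_gradients() explicitly instead of using this function.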

minimize(
    loss,
    global_step=None,
    var_list=None,
    gate_gradients=GATE_OP,
    aggregation_method=None,
    colocate_gradients_with_ops=False,
    name=None,
    grad_loss=None
)
  • loss: the loss value to minimize (a tensor)

  • global_step: a variable, incremented by one after each update
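To illustrate how minimize() relates to compute_gradients() and apply_gradients(), here is a toy stand-in written in plain Python. It is not the TensorFlow class and does not use TensorFlow: the class ToyGradientDescent, its numerical gradient estimate, and the loss function are all hypothetical, and only the method names mirror the API described above.

```python
class ToyGradientDescent:
    """A toy stand-in, not the TensorFlow class; it only mirrors the method names."""

    def __init__(self, learning_rate):
        self.learning_rate = learning_rate
        self.global_step = 0

    def compute_gradients(self, loss_fn, variables, eps=1e-6):
        """Estimate d(loss)/d(var) for every variable with central differences."""
        grads = []
        for i in range(len(variables)):
            up = variables[:i] + [variables[i] + eps] + variables[i + 1:]
            down = variables[:i] + [variables[i] - eps] + variables[i + 1:]
            grads.append((loss_fn(up) - loss_fn(down)) / (2 * eps))
        return grads

    def apply_gradients(self, grads, variables):
        """Take one step against the gradients and count the update."""
        self.global_step += 1
        return [v - self.learning_rate * g for v, g in zip(variables, grads)]

    def minimize(self, loss_fn, variables):
        """compute_gradients() followed by apply_gradients()."""
        grads = self.compute_gradients(loss_fn, variables)
        return self.apply_gradients(grads, variables)

# Hypothetical loss with its minimum at (1, -2).
loss_fn = lambda v: (v[0] - 1) ** 2 + (v[1] + 2) ** 2
opt = ToyGradientDescent(0.5)
variables = opt.minimize(loss_fn, [0.0, 0.0])
print(variables, opt.global_step)
```

Calling the two steps separately leaves room to modify the gradients (for example, to clip them) before they are applied, which is the case the description of minimize() points to.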


Origin www.cnblogs.com/TimVerion/p/11248067.html