[AI Tech Articles, Part 1] Crash Course On Multi-Layer Perceptron Neural Networks

Preface

  • The past two months have been unexpectedly quiet... a bug gets assigned to me now and then, but it is usually fixed quickly. I discussed code-structure optimization with my team lead, did an overall rework of a set of business logic we had added earlier, and then settled back into "idle" mode.
  • Most of the remaining time has gone into studying and getting familiar with the project source code. I am currently working on the MTK Camera HAL layer, and I have to vent a little: MTK's code has a lot of redundant parts. Take the various CamAdapter classes, for example: the code is practically identical, yet it is copied several times over and only differentiated at creatInstance time... they just won't extract the shared state and clean things up...
  • On top of that, builds often take a long time, Source Insight keeps re-syncing its symbol database, and each baseline project needs updating every day or two... so in those gaps I quietly threw myself into the translation society's second event...
  • Worth mentioning: for ranking in the top ten contributors of the previous event, Tencent sent me a Year-of-the-Dog plush mascot called "哈士企" (a husky crossed with Tencent's penguin)...
  • The theme of the second event was "Exploring AI Technology, Leading the Future with Science and Technology", which suited my taste exactly, so I claimed a few extra articles...
  • I translated ten articles in total, but some of them were fairly ordinary in content, and for topics I had rarely read before, such as natural language processing, I could not render many of the specialized descriptions clearly. As a result, quite a few of the articles only passed review and were not picked up by the column...
  • Below, I continue to post the articles that the column did accept:

Copyright

Translator: StoneDemo, a member of the Cloud+ Community Translation Society
Original article: Crash Course On Multi-Layer Perceptron Neural Networks
Original author: Jason Brownlee


Crash Course On Multi-Layer Perceptron Neural Networks

Artificial neural networks are a fascinating area of study, although they can be intimidating when just getting started.
There is a lot of specialized terminology used when describing the data structures and algorithms used in the field.

In this post you will get a crash course in the terminology and processes used in the field of multi-layer perceptron artificial neural networks. After reading this post you will know:

  • The building blocks of neural networks including neurons, weights and activation functions.
  • How the building blocks are used in layers to create networks.
  • How networks are trained from example data.

Let’s get started.

[Image: Crash Course On Multi-Layer Perceptron Neural Networks. Photo by Joe Stump, some rights reserved.]

Crash Course Overview

We are going to cover a lot of ground very quickly in this post. Here is an idea of what is ahead:

  1. Multi-Layer Perceptrons.
  2. Neurons, Weights and Activations.
  3. Networks of Neurons.
  4. Training Networks.

We will start off with an overview of multi-layer perceptrons.

1. Multi-Layer Perceptrons

The field of artificial neural networks is often just called neural networks or multi-layer perceptrons after perhaps the most useful type of neural network. A perceptron is a single neuron model that was a precursor to larger neural networks.

It is a field that investigates how simple models of biological brains can be used to solve difficult computational tasks like the predictive modeling tasks we see in machine learning. The goal is not to create realistic models of the brain, but instead to develop robust algorithms and data structures that we can use to model difficult problems.

The power of neural networks comes from their ability to learn the representation in your training data and how to best relate it to the output variable that you want to predict. In this sense neural networks learn a mapping. Mathematically, they are capable of learning any mapping function and have been proven to be a universal approximation algorithm.

The predictive capability of neural networks comes from the hierarchical or multi-layered structure of the networks. The data structure can pick out (learn to represent) features at different scales or resolutions and combine them into higher-order features. For example from lines, to collections of lines to shapes.

2. Neurons

The building blocks for neural networks are artificial neurons.
These are simple computational units that have weighted input signals and produce an output signal using an activation function.

[Image: Model of a Simple Neuron]

2.1. Neuron Weights

You may be familiar with linear regression, in which case the weights on the inputs are very much like the coefficients used in a regression equation.

Like linear regression, each neuron also has a bias which can be thought of as an input that always has the value 1.0 and it too must be weighted.

For example, a neuron may have two inputs, in which case it requires three weights: one for each input and one for the bias.

Weights are often initialized to small random values, such as values in the range 0 to 0.3, although more complex initialization schemes can be used.
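
As a concrete sketch of the two paragraphs above (the helper name `init_weights` and the sample values are my own, not from the original article), a neuron with two inputs carries three weights, initialized to small random values:

```python
import random

def init_weights(n_inputs):
    # One weight per input, plus one for the bias input that is always 1.0.
    # Small random starting values in the range 0 to 0.3, as described above.
    return [random.uniform(0.0, 0.3) for _ in range(n_inputs + 1)]

weights = init_weights(2)        # two inputs, so three weights in total
row = [0.5, -1.2]                # one example with two input values

# Weighted sum: each input times its weight, plus the bias weight times 1.0.
weighted_sum = sum(w * x for w, x in zip(weights, row)) + weights[-1] * 1.0
print(weights, weighted_sum)
```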

Like linear regression, larger weights indicate increased complexity and fragility. It is desirable to keep weights in the network small and regularization techniques can be used.

2.2. Activation

The weighted inputs are summed and passed through an activation function, sometimes called a transfer function.

An activation function is a simple mapping of summed weighted input to the output of the neuron. It is called an activation function because it governs the threshold at which the neuron is activated and strength of the output signal.

Historically, simple step activation functions were used, where if the summed input was above a threshold, for example 0.5, the neuron would output a value of 1.0, otherwise it would output a 0.0.

Traditionally non-linear activation functions are used. This allows the network to combine the inputs in more complex ways and in turn provide a richer capability in the functions they can model. Non-linear functions like the logistic, also called the sigmoid function, were used that output a value between 0 and 1 with an s-shaped distribution, and the hyperbolic tangent function, also called tanh, that outputs the same distribution over the range -1 to +1.

More recently the rectifier activation function has been shown to provide better results.
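
Here is a minimal sketch of the activation functions named in this section, applied to a summed weighted input (the function definitions are the standard formulas; the code itself is illustrative, not from the original article):

```python
import math

def step(s, threshold=0.5):
    # Historical step activation: output 1.0 above the threshold, else 0.0.
    return 1.0 if s > threshold else 0.0

def sigmoid(s):
    # Logistic (sigmoid): s-shaped output between 0 and 1.
    return 1.0 / (1.0 + math.exp(-s))

def tanh(s):
    # Hyperbolic tangent: the same s-shape over the range -1 to +1.
    return math.tanh(s)

def relu(s):
    # Rectifier: zero for negative input, identity for positive input.
    return max(0.0, s)

weighted_sum = 0.8
print(step(weighted_sum), sigmoid(weighted_sum),
      tanh(weighted_sum), relu(weighted_sum))
```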

3. Networks of Neurons

Neurons are arranged into networks of neurons.
A row of neurons is called a layer and one network can have multiple layers. The architecture of the neurons in the network is often called the network topology.

[Image: Model of a Simple Network]

3.1. Input or Visible Layers

The bottom layer that takes input from your dataset is called the visible layer, because it is the exposed part of the network. Often a neural network is drawn with a visible layer with one neuron per input value or column in your dataset. These are not neurons as described above, but simply pass the input value through to the next layer.

3.2. Hidden Layers

Layers after the input layer are called hidden layers because they are not directly exposed to the input. The simplest network structure is to have a single neuron in the hidden layer that directly outputs the value.

Given increases in computing power and efficient libraries, very deep neural networks can be constructed. Deep learning can refer to having many hidden layers in your neural network. They are deep because they would have been unimaginably slow to train historically, but may take seconds or minutes to train using modern techniques and hardware.
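
To make the idea of layers concrete, here is a hedged sketch of a tiny topology; the layer sizes, the sigmoid choice, and the helper names are arbitrary assumptions for illustration. The visible layer simply passes the input row through, and each subsequent layer computes weighted sums followed by an activation:

```python
import math
import random

def make_layer(n_neurons, n_inputs):
    # A layer is a row of neurons; each neuron holds one weight per input
    # plus a bias weight (hence the +1).
    return [[random.uniform(0.0, 0.3) for _ in range(n_inputs + 1)]
            for _ in range(n_neurons)]

def forward_layer(layer, inputs):
    # Each neuron: weighted sum of the inputs plus bias, then sigmoid.
    return [1.0 / (1.0 + math.exp(-(w[-1] + sum(wi * x for wi, x in zip(w, inputs)))))
            for w in layer]

row = [0.2, 0.7, 1.5]                # the visible layer just passes these through
hidden_1 = make_layer(4, n_inputs=3)
hidden_2 = make_layer(2, n_inputs=4)

print(forward_layer(hidden_2, forward_layer(hidden_1, row)))
```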

3.3. Output Layer

The final hidden layer is called the output layer and it is responsible for outputting a value or vector of values that correspond to the format required for the problem.

The choice of activation function in the output layer is strongly constrained by the type of problem that you are modeling. For example (a short sketch follows this list):

  • A regression problem may have a single output neuron and the neuron may have no activation function.
  • A binary classification problem may have a single output neuron and use a sigmoid activation function to output a value between 0 and 1 to represent the probability of predicting a value for the class 1. This can be turned into a crisp class value by using a threshold of 0.5 and snapping values less than the threshold to 0, otherwise to 1.
  • A multi-class classification problem may have multiple neurons in the output layer, one for each class (e.g. three neurons for the three classes in the famous iris flowers classification problem). In this case a softmax activation function may be used to output a probability of the network predicting each of the class values. The output with the highest probability can then be selected to produce a crisp class classification value.
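
The three cases can be sketched as follows (the numeric outputs and the helper functions are illustrative assumptions, not values from the article):

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def softmax(values):
    exps = [math.exp(v) for v in values]
    return [e / sum(exps) for e in exps]

# Regression: one output neuron, no activation function; the raw value is used.
regression_prediction = 42.7

# Binary classification: sigmoid output snapped to a crisp class at 0.5.
prob_of_class_1 = sigmoid(0.8)
crisp_class = 0 if prob_of_class_1 < 0.5 else 1

# Multi-class: one output neuron per class, softmax, then pick the largest.
iris_outputs = [2.0, 1.0, 0.1]       # e.g. three neurons for three classes
probabilities = softmax(iris_outputs)
predicted_class = probabilities.index(max(probabilities))

print(regression_prediction, crisp_class, probabilities, predicted_class)
```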

4. Training Networks

Once configured, the neural network needs to be trained on your dataset.

4.1. Data Preparation

You must first prepare your data for training on a neural network.

Data must be numerical, for example real values. If you have categorical data, such as a sex attribute with the values “male” and “female”, you can convert it to a real-valued representation called a one hot encoding. This is where one new column is added for each class value (two columns in the case of sex of male and female) and a 0 or 1 is added for each row depending on the class value for that row.

This same one hot encoding can be used on the output variable in classification problems with more than one class. This would create a binary vector from a single column that would be easy to directly compare to the output of the neurons in the network's output layer, which, as described above, output one value for each class.
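
A minimal sketch of this one hot encoding, using the sex attribute example from the text (the column ordering is my own choice):

```python
rows = ["male", "female", "female", "male"]
classes = ["male", "female"]         # one new column per class value

one_hot = [[1 if value == c else 0 for c in classes] for value in rows]
print(one_hot)                       # [[1, 0], [0, 1], [0, 1], [1, 0]]
```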

Neural networks require the input to be scaled in a consistent way. You can rescale it to the range between 0 and 1 called normalization. Another popular technique is to standardize it so that the distribution of each column has the mean of zero and the standard deviation of 1.
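
Both rescaling schemes, sketched for a single column of values (plain Python for clarity; in practice a library routine would typically be used):

```python
import statistics

column = [50.0, 30.0, 70.0, 90.0]

# Normalization: rescale to the range 0 to 1.
lo, hi = min(column), max(column)
normalized = [(v - lo) / (hi - lo) for v in column]

# Standardization: rescale so the column has mean 0 and standard deviation 1.
mean = statistics.mean(column)
std = statistics.pstdev(column)
standardized = [(v - mean) / std for v in column]

print(normalized)
print(standardized)
```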

Scaling also applies to image pixel data. Data such as words can be converted to integers, such as the popularity rank of the word in the dataset, and other encoding techniques can be used.

4.2. Stochastic Gradient Descent

The classical and still preferred training algorithm for neural networks is called stochastic gradient descent.

This is where one row of data is exposed to the network at a time as input. The network processes the input upward, activating neurons as it goes, to finally produce an output value. This is called a forward pass on the network. It is the type of pass that is also used after the network is trained in order to make predictions on new data.

The output of the network is compared to the expected output and an error is calculated. This error is then propagated back through the network, one layer at a time, and the weights are updated according to the amount that they contributed to the error. This clever bit of math is called the backpropagation algorithm.

The process is repeated for all of the examples in your training data. One round of updating the network for the entire training dataset is called an epoch. A network may be trained for tens, hundreds or many thousands of epochs.
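
To tie the forward pass, the error, and the weight updates together, here is a hedged sketch of stochastic gradient descent for a single sigmoid neuron; full multi-layer backpropagation is longer, so this shows only the per-example update loop the text describes, on an invented toy dataset:

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# Toy dataset: two inputs per row, expected output last (here, logical OR).
data = [[0.0, 0.0, 0], [0.0, 1.0, 1], [1.0, 0.0, 1], [1.0, 1.0, 1]]

weights = [random.uniform(0.0, 0.3) for _ in range(3)]   # two inputs + bias
learning_rate = 0.1

for epoch in range(1000):        # one epoch = one pass over the whole dataset
    for row in data:
        inputs, expected = row[:-1], row[-1]
        # Forward pass: weighted sum plus bias, through the activation.
        output = sigmoid(weights[-1] + sum(w * x for w, x in zip(weights, inputs)))
        # Error, scaled by the sigmoid's derivative (the backward step).
        delta = (expected - output) * output * (1.0 - output)
        # Update each weight according to its contribution to the error.
        for i, x in enumerate(inputs):
            weights[i] += learning_rate * delta * x
        weights[-1] += learning_rate * delta * 1.0       # bias input is always 1.0

print(weights)
```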

4.3. Weight Updates

The weights in the network can be updated from the errors calculated for each training example and this is called online learning. It can result in fast but also chaotic changes to the network.

Alternatively, the errors can be saved up across all of the training examples and the network can be updated at the end. This is called batch learning and is often more stable.

Typically, because datasets are so large and because of computational efficiencies, the size of the batch (the number of examples the network is shown before an update) is often reduced to a small number, such as tens or hundreds of examples.

The amount that weights are updated is controlled by a configuration parameter called the learning rate. It is also called the step size and controls the step or change made to network weights for a given error. Often small values are used, such as 0.1 or 0.01 or smaller.

The update equation can be complemented with additional configuration terms that you can set (a short sketch follows this list).

  • Momentum is a term that incorporates the properties from the previous weight update to allow the weights to continue to change in the same direction even when there is less error being calculated.
  • Learning Rate Decay is used to decrease the learning rate over epochs to allow the network to make large changes to the weights at the beginning and smaller fine tuning changes later in the training schedule.
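
A sketch of one common way these two terms enter the update equation (this is one standard formulation with assumed default values; libraries differ in the exact details):

```python
def update(weight, gradient, velocity, epoch,
           base_lr=0.1, momentum=0.9, decay=0.01):
    # Learning rate decay: shrink the step size as the epochs accumulate.
    lr = base_lr / (1.0 + decay * epoch)
    # Momentum: carry part of the previous update forward, so the weight
    # keeps moving in the same direction even when the current error is small.
    velocity = momentum * velocity - lr * gradient
    return weight + velocity, velocity

weight, velocity = 0.5, 0.0
for epoch in range(3):
    weight, velocity = update(weight, gradient=0.2, velocity=velocity, epoch=epoch)
    print(epoch, weight, velocity)
```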

4.4. Prediction

Once a neural network has been trained it can be used to make predictions.

You can make predictions on test or validation data in order to estimate the skill of the model on unseen data. You can also deploy it operationally and use it to make predictions continuously.

The network topology and the final set of weights are all that you need to save from the model. Predictions are made by providing the input to the network and performing a forward pass, allowing it to generate an output that you can use as a prediction.
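
As a sketch of this point, the saved model below is nothing more than a topology with its trained weights, and a prediction is a single forward pass (the weight values are invented for illustration):

```python
import math

# The saved model: the topology is implied by the nesting, and each neuron
# is a list of trained weights with the bias weight last.
model = [
    [[0.8, -0.4, 0.1], [0.3, 0.9, -0.2]],   # hidden layer: 2 neurons, 2 inputs each
    [[1.2, -0.7, 0.05]],                     # output layer: 1 neuron, 2 inputs
]

def predict(model, row):
    inputs = row
    for layer in model:                       # a forward pass, layer by layer
        inputs = [1.0 / (1.0 + math.exp(-(w[-1] + sum(wi * x for wi, x in zip(w, inputs)))))
                  for w in layer]
    return inputs

print(predict(model, [0.6, 0.9]))             # prediction for a new, unseen row
```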

More Resources

There are decades of papers and books on the topic of artificial neural networks.

If you are new to the field I recommend the following resources as further reading:

  • Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks
  • Neural Networks for Pattern Recognition
  • An Introduction to Neural Networks

Summary

In this post you discovered artificial neural networks for machine learning.

After reading this post you now know:

  • How neural networks are not models of the brain but are instead computational models for solving complex machine learning problems.
  • That neural networks are comprised of neurons that have weights and activation functions.
  • The networks are organized into layers of neurons and are trained using stochastic gradient descent.
  • That it is a good idea to prepare your data before training a neural network model.

Reposted from blog.csdn.net/qq_16775897/article/details/79324490