[AI 技术文章之其二] 自然语言处理的神经网络模型初探

前言

  • 这两个月真是突如其来的清闲……偶尔分配来个 Bug,但经常就很快搞定了。跟组长讨论了一下代码结构优化方面的问题,把之前加入的一套业务逻辑做了整体优化,然后又陷入 “闲” 者模式。
  • 剩下的大多时间都是在学习学习,熟悉熟悉项目源码。现在主要在搞 MTK Camera Hal 层的东西, 真是想吐槽一下,Mtk 的代码有很多冗余的部分,比如各种 CamAdapter,明明代码一样一样的,非要复制好几份出来,然后只是在 creatInstance 的时候区分一下……就不舍得提取一些状态之类的东西出来优化一下……
  • 然后编译的时间经常要很久,sourceInsight 又时不时要同步信息,各个基线的项目每隔一两天也得更新一次……趁着这些时间,我就悄悄地继续投入到翻译社第二期活动里去了…..
  • 值得一提的是,上期活动贡献排名前十,腾讯给我发来了一个狗年的公仔 “哈士企”……
  • 第二期活动是 “探索AI技术,科技引领未来”,不得不说这正对我胃口,于是多领了几篇……
  • 虽然总共翻译了十篇文章,但由于有些文章的内容比较一般,加上自然语言处理这类内容我以前并没有经常接触,对其中许多专业表述把握不准,所以不少文章只是通过了审核,并没有被专栏采纳……
  • 下面继续把被专栏采纳的几篇以中英对照的形式放出来:

版权相关

翻译人:StoneDemo,该成员来自云+社区翻译社
原文链接:Primer on Neural Network Models for Natural Language Processing
原文作者:Jason Brownlee


Primer on Neural Network Models for Natural Language Processing

题目:自然语言处理的神经网络模型初探

Deep learning is having a large impact on the field of natural language processing.
But, as a beginner, where do you start?

深度学习(Deep Learning)技术对自然语言处理(NLP,Natural Language Processing)领域有着巨大的影响。
但作为初学者,您要从何处开始学习呢?

Both deep learning and natural language processing are huge fields. What are the salient aspects of each field to focus on and which areas of NLP is deep learning having the most impact?

深度学习和自然语言处理都是较为广阔的领域,但每个领域重点研究些什么?在自然语言处理领域中,又是哪一方面最受深度学习的影响呢?

In this post, you will discover a primer on deep learning for natural language processing.

通过阅读本文,您会对自然语言处理中的深度学习有一个初步的认识。

After reading this post, you will know:

  • The neural network architectures that are having the biggest impact on the field of natural language processing.
  • A broad view of the natural language processing tasks that can be successfully addressed with deep learning.
  • The importance of dense word representations and the methods that can be used to learn them.

Let’s get started.

阅读这篇文章后,您可以知道:

  • 对自然语言处理领域影响最为深远的神经网络结构。
  • 综观那些可以通过深度学习成功解决的自然语言处理任务。
  • 密集词表示(Dense word representations)的重要性以及可以用于学习它们的方法。

现在,让我们开始本次学习之旅。
(自然语言处理的神经网络模型入门,图片作者 faungg ,部分版权保留。)

Overview

(概览)

This post is divided into 12 sections that follow the structure of the paper; they are:

  1. About the Paper (Introduction)
  2. Neural Network Architectures
  3. Feature Representation
  4. Feed-Forward Neural Networks
  5. Word Embeddings
  6. Neural Network Training
  7. Cascading and Multi-Task Learning
  8. Structured Output Prediction
  9. Convolutional Layers
  10. Recurrent Neural Networks
  11. Concrete RNN Architectures
  12. Modeling Trees

本文将遵循相关论文的结构而分为 12 个部分,分别是:

  1. 关于论文(简介)
  2. 神经网络架构
  3. 特征表示
  4. 前馈神经网络
  5. 词嵌入
  6. 训练神经网络
  7. 级联和多任务学习
  8. 结构化输出预测
  9. 卷积层
  10. 循环神经网络
  11. 循环神经网络的具体架构
  12. 树型建模

I want to give you a flavor of the main sections and style of this paper as well as a high-level introduction to the topic.

我想让大家对这篇论文的主要章节和行文风格有一个大致的印象,同时对这一主题做一个高层次的介绍。

If you want to go deeper, I highly recommend reading the paper in full, or the more recent book.

如果你想继续深入研究,我强烈推荐阅读论文全文,或者阅读后来出版的那本书。

1. About the Paper

(关于论文)

The title of the paper is: “A Primer on Neural Network Models for Natural Language Processing“.

论文的题目是:“A Primer on Neural Network Models for Natural Language Processing ” (自然语言处理的神经网络模型入门)。

It is available for free on ArXiv and was last dated 2015. It is a technical report or tutorial more than a paper and provides a comprehensive introduction to Deep Learning methods for Natural Language Processing (NLP), intended for researchers and students.

这篇论文可以在 ArXiv 上免费获取,最后一次更新是在 2015 年。与其说它是一篇论文,不如说是一份技术报告或教程;它面向研究人员与学生,对自然语言处理(NLP)中的深度学习方法做了较为全面的介绍。

This tutorial surveys neural network models from the perspective of natural language processing research, in an attempt to bring natural-language researchers up to speed with the neural techniques.

[ 本教程从自然语言处理研究的角度对神经网络模型进行了相关研究,力图令自然语言领域的研究人员能跟上神经网络技术的发展速度。 ]

The primer was written by Yoav Goldberg who is a researcher in the field of NLP and who has worked as a research scientist at Google Research. Yoav caused some controversy recently, but I wouldn’t hold that against him.

这篇入门论文是由 NLP 领域研究员 Yoav Goldberg 撰写的,他曾在 Google Research 担任研究科学家。虽然 Yoav 最近引起了一些争议,但我不会因此反对他。

It is a technical report and is about 62 pages and has about 13 pages of references.

这是一份技术报告,大概共有 62 页,其中约有 13 页是参考文献列表。

The paper is ideal for beginners for two reasons:

  • It assumes little about the reader, other than you are interested in this topic and you know a little machine learning and/or natural language processing.
  • It has great breadth, covering a wide range of deep learning methods and natural language problems.

这篇文章非常适合初学者,其原因有二:

  • 它对读者的要求不高,只需要您对这一主题有兴趣,并且稍微了解一些机器学习和(或)自然语言处理的知识即可。
  • 它涵盖了广泛的深度学习方法和自然语言问题。

In this tutorial I attempt to provide NLP practitioners (as well as newcomers) with the basic background, jargon, tools and methodology that will allow them to understand the principles behind the neural network models and apply them to their own work. … it is aimed at those readers who are interested in taking the existing, useful technology and applying it in useful and creative ways to their favourite NLP problems.

[ 在本教程中,我尝试给 NLP 从业人员(以及新人)提供基本的背景知识,术语,工具和方法,使他们能够理解神经网络模型背后的原理,并将其应用到自己的工作中。 … 本文的受众,是那些有兴趣使用现存的有用技术,并以实用且富有创造性的方式将其应用到他们最喜欢的 NLP 问题中的读者。 ]
Often, key deep learning methods are re-cast using the terminology or nomenclature of linguistics or natural language processing, providing a useful bridge.

通常,文中会用语言学或自然语言处理的术语与命名法来重新表述关键的深度学习方法,这在(深度学习与自然语言处理之间)架起了一座有用的桥梁。

Finally, this 2015 primer has been turned into a book published in 2017, titled “Neural Network Methods for Natural Language Processing“.

最后值得一提的是,这篇 2015 年的入门教程已被扩充成一本书,于 2017 年出版,书名为 “Neural Network Methods for Natural Language Processing”(自然语言处理中的神经网络方法)。


If you like this primer and want to go deeper, I highly recommend Yoav’s book.

如果你喜欢这篇入门教程并且想深入研究,我强烈推荐您继续阅读 Yoav 的这本书。

2. Neural Network Architectures

(神经网络架构)

This short section provides an introduction to the different types of neural network architectures with cross-references into later sections.

本小节简要介绍了各种不同类型的神经网络架构,并给出了指向后文各节的交叉引用。

Fully connected feed-forward neural networks are non-linear learners that can, for the most part, be used as a drop-in replacement wherever a linear learner is used.

[ 全连接(Fully connected)前馈神经网络是非线性学习器,在大多数情况下,它可以替换到使用了线性学习器的任何地方。 ]
A total of 4 types of neural network architectures are covered, highlighting examples of applications and references of each:

  • Fully connected feed-forward neural networks, e.g. multilayer Perceptron networks.
  • Networks with convolutional and pooling layers, e.g. convolutional neural networks.
  • Recurrent Neural Networks, e.g. long short-term memory networks.
  • Recursive Neural Networks.

小节内容涵盖了四种神经网络架构,并重点列举了每种架构的应用示例与参考文献:

  • 全连接前馈神经网络,如多层感知器网络(Multilayer Perceptron Networks)。
  • 具有卷积和池化层(Pooling Layers)的网络,如卷积神经网络(Convolutional Neural Network)。
  • 循环神经网络(Recurrent Neural Networks),如长短期记忆(LSTM,Long Short-Term Memory)网络。
  • 递归神经网络(Recursive Neural Networks)。

This section provides a great source if you are only interested in applications for a specific network type and want to go straight to the source papers.

如果您只对其中一种特定网络类型的应用感兴趣,并想直接阅读相关文献,本节则提供了一些很好的来源。

3. Feature Representation

(特征表示)

This section focuses on the use of transitioning from sparse to dense representations that can, in turn, be trained along with the deep learning models.

本节重点介绍从稀疏表示到密集表示的转变,而这些密集表示进而可以与深度学习模型一同训练。

Perhaps the biggest jump when moving from sparse-input linear models to neural-network based models is to stop representing each feature as a unique dimension (the so called one-hot representation) and representing them instead as dense vectors.

{ 从稀疏输入的线性模型转向基于神经网络的模型时,最大的变化大概就是不再将每个特征表示为一个唯一的维度(即所谓的独热表示 [One-hot Representation]),而是将它们表示为密集向量(Dense Vector)。 }
A general structure of NLP classification systems is presented, summarized as:

  1. Extract a set of core linguistic features.
  2. Retrieve the corresponding vector for each feature.
  3. Combine the feature vectors.
  4. Feed the combined vectors into a non-linear classifier.

本节中介绍了 NLP 分类系统的一般结构,可总结如下(本节末尾附有对应这四个步骤的示意代码):

  1. 提取一组核心语言特征。
  2. 为每个特征检索对应的向量。
  3. 将这些特征向量组合(拼接)为一个特征向量。
  4. 将组合后的向量馈入一个非线性分类器。

The key to this formulation are the dense rather than sparse feature vectors and the use of core features rather than feature combinations.

这种构建方式的关键在于使用了密集的而非稀疏的特征向量,并且只使用核心特征而非特征组合。

Note that the feature extraction stage in the neural-network settings deals only with extraction of core features. This is in contrast to the traditional linear-model-based NLP systems in which the feature designer had to manually specify not only the core features of interests but also interactions between them

[ 请注意,在神经网络设置中的特征提取阶段,仅仅处理核心特征的提取。这与传统的基于线性模型的 NLP 系统大相径庭,因为在该系统中,特征设计者不仅必须手动地指定感兴趣的核心特征,而且还需要手动指定它们之间的相互作用。 ]
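
为了更直观地理解上面提到的四个步骤,下面给出一段极简的 Python 示意代码(并非论文原文内容,其中的词表、向量维度与参数均为随意假设),仅用于说明 “核心特征 → 稠密向量 → 拼接 → 非线性分类器” 这一流程:

```python
import numpy as np

np.random.seed(0)

# 假设的特征词表与嵌入维度,仅作演示
vocab = {"the": 0, "dog": 1, "barks": 2}
emb_dim = 4
embedding = np.random.randn(len(vocab), emb_dim) * 0.01   # 每个核心特征对应一个稠密向量

def extract_core_features(sentence):
    # 步骤 1:提取一组核心语言特征(这里简单地取句中出现在词表里的词)
    return [w for w in sentence.split() if w in vocab]

def combine_features(features):
    # 步骤 2:为每个特征检索对应的向量;步骤 3:将它们拼接为一个特征向量
    return np.concatenate([embedding[vocab[f]] for f in features])

def nonlinear_classifier(x, W1, b1, W2, b2):
    # 步骤 4:将组合后的向量馈入一个非线性分类器(单隐层 + softmax)
    h = np.tanh(W1 @ x + b1)
    z = W2 @ h + b2
    return np.exp(z - z.max()) / np.exp(z - z.max()).sum()

x = combine_features(extract_core_features("the dog barks"))
W1, b1 = np.random.randn(8, x.size) * 0.01, np.zeros(8)
W2, b2 = np.random.randn(2, 8) * 0.01, np.zeros(2)     # 假设是一个二分类任务
print(nonlinear_classifier(x, W1, b1, W2, b2))          # 两个类别上的概率分布
```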

4. Feed-Forward Neural Networks

(前馈神经网络)

This section provides a crash course on feed-forward artificial neural networks.

本节是前馈人工神经网络的速成课。


(带有两个隐藏层的前馈神经网络,摘自 “A Primer on Neural Network Models for Natural Language Processing”。)

Networks are presented both using a brain-inspired metaphor and using mathematical notation. Common neural network topics are covered such as:

  • Representation Power (e.g. universal approximation).
  • Common Non-linearities (e.g. transfer functions).
  • Output Transformations (e.g. softmax).
  • Word Embeddings (e.g. built-in learned dense representation).
  • Loss Functions (e.g. hinge and log loss).

本节既借助受大脑启发的隐喻,也借助数学符号来描述网络。常见的神经网络主题包括如下几种(列表之后附有一段简单的前馈网络示意代码):

  • 表示能力(例如通用逼近性 [Universal approximation])。
  • 常见的非线性关系(例如传递函数)。
  • 输出变换(例如 softmax)。
  • 词嵌入(例如内置的学习密集表示)。
  • 损失函数(如 Hinge-loss 和对数损失)。
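
作为补充,下面用 NumPy 给出一个带有两个隐藏层的前馈网络的前向计算示意(并非论文原文代码,维度与参数为随意假设),其中用 tanh 作为非线性、softmax 作为输出变换,并演示对数损失的计算:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())        # 减去最大值以保证数值稳定
    return e / e.sum()

def mlp_forward(x, W1, b1, W2, b2, W3, b3):
    # 带两个隐藏层的前馈网络:tanh 作为非线性,softmax 作为输出变换
    h1 = np.tanh(W1 @ x + b1)
    h2 = np.tanh(W2 @ h1 + b2)
    return softmax(W3 @ h2 + b3)

def log_loss(probs, gold_index):
    # 对数损失(负对数似然):正确类别的概率越高,损失越小
    return -np.log(probs[gold_index])

np.random.seed(0)
x = np.random.randn(10)                                  # 假设输入是 10 维特征向量
W1, b1 = np.random.randn(16, 10), np.zeros(16)
W2, b2 = np.random.randn(16, 16), np.zeros(16)
W3, b3 = np.random.randn(3, 16), np.zeros(3)             # 假设是三分类
probs = mlp_forward(x, W1, b1, W2, b2, W3, b3)
print(probs, log_loss(probs, gold_index=1))
```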

5. Word Embeddings

(词嵌入)

The topic of word embedding representations is key to the neural network approach in natural language processing. This section expands upon the topic and enumerates the key methods.

在自然语言处理中,词嵌入表示(Word Embedding Representations)是神经网络方法的关键部分。本节则扩展了这个主题,并列举了一些关键的方法。

A main component of the neural-network approach is the use of embeddings — representing each feature as a vector in a low dimensional space

[ 神经网络方法中的一个主要组成部分是使用嵌入 - 将每个特征表示为低维空间中的向量 ]
The following word embedding topics are reviewed:

  • Random Initialization (e.g. starting with uniform random vectors).
  • Supervised Task-specific Pre-training (e.g. transfer learning).
  • Unsupervised Pre-training (e.g. statistical methods like word2vec and GloVe).
  • Training Objectives (e.g. the influence of the objective on the resulting vectors).
  • The Choice of Contexts (e.g. influence of the words around each word).

本节中介绍了关于词嵌入的以下几个主题:

  • 随机初始化(例如,从均匀分布的随机向量开始训练)。
  • 特定任务的有监督预训练(例如,迁移学习 [Transfer Learning])。
  • 无监督预训练(例如,word2vec 与 GloVe 之类的统计学方法;本节末尾附有一个简单示例)。
  • 训练目标(例如,训练目标对所得向量的影响)。
  • 上下文的选择(例如,每个词周围的词所带来的影响)。

Neural word embeddings originated from the world of language modeling, in which a network is trained to predict the next word based on a sequence of preceding words

神经词嵌入起源于语言建模领域:在语言建模中,网络被训练为根据先前词的序列来预测下一个词。
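
如果想亲手体验无监督预训练得到的词向量,下面是一个使用 gensim 训练 word2vec(skip-gram)的最小示例。注意:这只是一个示意,语料、维度与窗口大小均为随意设定;参数名 vector_size 对应 gensim 4.x 的接口,旧版本中叫 size:

```python
# 需要先安装 gensim:pip install gensim
from gensim.models import Word2Vec

# 随意构造的小语料,每个句子是一个分好词的列表
sentences = [
    ["deep", "learning", "changes", "natural", "language", "processing"],
    ["natural", "language", "processing", "is", "fun"],
]

# sg=1 表示使用 skip-gram;window 即上下文窗口(上下文的选择会影响学到的向量)
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

vec = model.wv["language"]                  # 取出某个词的稠密向量
print(vec.shape)                            # (50,)
print(model.wv.most_similar("language", topn=2))
```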

6. Neural Network Training

(训练神经网络)

This longer section focuses on how neural networks are trained, written for those new to the neural network paradigm.

这个较长的章节是为神经网络新手而写的,重点介绍神经网络是如何被训练的。

Neural network training is done by trying to minimize a loss function over a training set, using a gradient-based method.

[ 神经网络的训练,是通过运用基于梯度的方法将训练集上的损失函数最小化来完成的。 ]
The section focuses on stochastic gradient descent (and friends like mini-batch) as well as important topics during training like regularization.

本节重点介绍随机梯度下降法(还有相似的如 Mini-batch 这样的方法)以及训练过程中的一些重要主题,比如说正则化。

Interestingly, the computational graph perspective of neural networks is presented, providing a primer for symbolic numerical libraries like Theano and TensorFlow that are popular foundations for implementing deep learning models.

有趣的是,本节还从计算图的视角来审视神经网络,为 Theano 和 TensorFlow 这类符号化数值计算库提供了一个入门引导,而这些库正是当前实现深度学习模型的流行基础。

Once the graph is built, it is straightforward to run either a forward computation (compute the result of the computation) or a backward computation (computing the gradients)

[ 一旦构建了计算图,就可以直接运行正向计算(计算出结果)或者反向计算(计算梯度)。 ]
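
为了直观展示 “构建计算图之后即可分别运行正向计算与反向计算” 这一点,下面用 PyTorch 的自动求导给出一个最小示例(论文中讨论的是 Theano / TensorFlow 等符号化库,这里换用 PyTorch 只是为了演示同样的思想):

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(3.0, requires_grad=True)

y = w * x + x ** 2        # 构建计算图并完成一次正向计算
print(y.item())            # 10.0

y.backward()               # 反向计算:沿计算图自动求出梯度
print(x.grad.item())       # dy/dx = w + 2x = 7.0
print(w.grad.item())       # dy/dw = x = 2.0
```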

7. Cascading and Multi-Task Learning

(级联和多任务学习)

This section builds upon the previous section by summarizing work for cascading NLP models and models for learning across multiple language tasks.

在前一节的基础上,本节总结了关于级联 NLP 模型,以及跨多个语言任务进行学习的模型的相关工作。

Model cascading: Exploits the computational graph definition of neural network models to leverage intermediate representations (encoding) to develop more sophisticated models.

For example, we may have a feed-forward network for predicting the part of speech of a word based on its neighbouring words and/or the characters that compose it.

Multi-task learning: Where there are related natural language prediction tasks that do not feed into one another, but information can be shared across tasks.

Information for predicting chunk boundaries, named-entity boundaries and the next word in the sentence all rely on some shared underlying syntactic-semantic representation

Both of these advanced concepts are described in the context of neural networks that permit both the connectivity between models or information both during training (backpropagation of errors) and making predictions.

级联模型(Model cascading):利用神经网络模型的计算图定义,借助中间表示(编码)来构建更复杂的模型。
(以下为对引用的翻译)

例如,我们可能有一个前馈网络,它用于根据词的相邻词和(或)构成它的字符来预测词的词性。

多任务学习(Multi-task learning):指存在一些彼此相关的自然语言预测任务,它们并不互为输入,但任务之间可以共享信息。
(以下为对引用的翻译)

用于预测块边界、命名实体边界和句子中的下一个单词的信息,都依赖于一些共享的基础句法语义表示

这两个进阶概念都是在神经网络的背景下描述的:神经网络既允许在模型之间建立连接,也允许信息在训练(误差反向传播)和预测期间共享。
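
下面用 PyTorch 给出一个多任务学习的极简示意(并非论文原文代码,网络结构与维度均为随意假设):两个任务共享同一套词嵌入与底层编码,各自只保留一个独立的输出层,训练时两个任务的误差都会反向传播到共享部分:

```python
import torch
import torch.nn as nn

class SharedMultiTask(nn.Module):
    """极简的多任务网络:两个任务共享同一个底层表示。"""

    def __init__(self, vocab_size=1000, emb_dim=32, hidden=64, n_pos=10, n_ner=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)    # 共享的词嵌入
        self.encoder = nn.Linear(emb_dim, hidden)          # 共享的底层编码
        self.pos_head = nn.Linear(hidden, n_pos)           # 任务一:词性标注
        self.ner_head = nn.Linear(hidden, n_ner)           # 任务二:命名实体边界

    def forward(self, token_ids):
        h = torch.tanh(self.encoder(self.embed(token_ids)))   # 共享的中间表示
        return self.pos_head(h), self.ner_head(h)

model = SharedMultiTask()
pos_logits, ner_logits = model(torch.tensor([1, 2, 3]))       # 一个长度为 3 的词 id 序列
print(pos_logits.shape, ner_logits.shape)                      # [3, 10] 与 [3, 5]
```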

8. Structured Output Prediction

(结构化输出预测)

This section is concerned with examples of natural language tasks where deep learning methods are used to make structured predictions such as sequences, trees and graphs.

Canonical examples are sequence tagging (e.g. part-of-speech tagging) sequence segmentation (chunking, NER), and syntactic parsing.

This section covers both greedy and search-based structured prediction, with focus on the latter.

The common approach to predicting natural language structures is search based.

本节关注的是一些自然语言任务的例子,在这些任务中,深度学习方法被用来做出结构化的预测,例如序列、树以及图。
(以下为对引用的翻译)

典型的例子是序列标注(例如词性标注 [Part-of-speech tagging])、序列分割(组块分析 [Chunking]、命名实体识别 [NER,Named-entity Recognition])以及句法分析。

本部分涵盖了基于贪心思想和基于搜索的结构化预测,重点关注后者。
(以下为对引用的翻译)

常用的自然语言结构化预测方法,是基于搜索的方法。
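
下面是一个贪心式序列标注解码的简单示意(并非论文原文代码,打分函数是随意构造的):贪心方法在每个位置只保留局部得分最高的标签;而基于搜索的方法(例如柱搜索 [Beam search])则会同时保留多个候选序列,再从中选出整体得分最高的结构:

```python
import numpy as np

def greedy_tag(score_fn, sentence, tagset):
    # 贪心解码:逐词选取局部得分最高的标签,打分时可以利用已预测出的前缀标签
    tags = []
    for i, word in enumerate(sentence):
        scores = score_fn(sentence, i, tags)
        tags.append(tagset[int(np.argmax(scores))])
    return tags

# 随意构造的打分函数,仅用于演示接口;真实系统中它通常是一个神经网络
tagset = ["N", "V", "D"]
def toy_score_fn(sentence, i, prefix_tags):
    rng = np.random.default_rng(i)
    return rng.random(len(tagset))

print(greedy_tag(toy_score_fn, ["the", "dog", "barks"], tagset))
```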

9. Convolutional Layers

(卷积层)

This section provides a crash course on Convolutional Neural Networks (CNNs) and their impact on natural language.

本节提供了卷积神经网络(CNN,Convolutional Neural Networks)的速成课程,以及阐述了这一网络对自然语言领域的影响。

Notably, CNNs have proven very effective for classification NLP tasks like sentiment analysis, e.g. learning to look for specific subsequences or structures in text in order to make predictions.

值得注意的是,当下已经证明了 CNN 对诸如情感分析(Sentiment analysis)这样的分类 NLP 任务非常有效,例如学习寻找文本中的特定子序列或结构以进行预测。

A convolutional neural network is designed to identify indicative local predictors in a large structure, and combine them to produce a fixed size vector representation of the structure, capturing these local aspects that are most informative for the prediction task at hand.

[ 卷积神经网络被设计来识别大型结构中的指示性局部预测因子(Indicative local predictors),并且将它们组合起来以产生结构的固定大小的向量表示,从而捕获这些对于预测任务而言最具信息性的局部方面(Local aspects)。 ]
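
为了说明 “局部预测因子 + 池化得到固定大小向量” 这一思想,下面给出一个用 NumPy 手写的一维卷积加 max-over-time 池化的示意(并非论文原文代码,维度与滤波器个数为随意假设):

```python
import numpy as np

def text_cnn_features(emb_seq, filters, width=3):
    # emb_seq: (句长, 词向量维度);filters: (滤波器个数, width * 词向量维度)
    n, d = emb_seq.shape
    windows = [emb_seq[i:i + width].reshape(-1) for i in range(n - width + 1)]
    conv = np.tanh(np.stack(windows) @ filters.T)   # 每个局部窗口得到一组特征(局部预测因子)
    return conv.max(axis=0)                          # max-over-time 池化:得到固定大小的向量

np.random.seed(0)
emb_seq = np.random.randn(7, 4)        # 7 个词,每个词 4 维
filters = np.random.randn(5, 3 * 4)    # 5 个宽度为 3 的滤波器
print(text_cnn_features(emb_seq, filters).shape)   # (5,) —— 与句长无关
```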

10. Recurrent Neural Networks

(循环神经网络)

As with the previous section, this section focuses on the use of a specific type of network and its role and application in NLP. In this case, Recurrent Neural Networks (RNNs) for modeling sequences.

与前一节一样,本节重点介绍一种特定类型的网络及其在 NLP 中的作用与应用,这里指的是用于序列建模的循环神经网络(RNN,Recurrent Neural Networks)。

Recurrent neural networks (RNNs) allow representing arbitrarily sized structured inputs in a fixed-size vector, while paying attention to the structured properties of the input.
[ 循环神经网络(RNN)允许将任意大小的结构化输入表示为一个固定大小的向量,同时还能兼顾输入的结构化属性。 ]
Given the popularity of RNNs and specifically the Long Short-Term Memory (LSTM) in NLP, this larger section works through a variety of recurrent topics and models, including:

  • The RNN Abstraction (e.g. recurrent connections in the network graph).
  • RNN Training (e.g. backpropagation through time).
  • Multi-layer (stacked) RNNs (e.g. the “deep” part of deep learning).
  • BI-RNN (e.g. providing sequences forwards and backwards as input).
  • RNNs for Representing Stacks

考虑到 RNN,尤其是长短期记忆(LSTM)网络在 NLP 中的流行,这个篇幅较大的章节介绍了与循环神经网络有关的多个主题与模型,其中包括:

  • RNN 的抽象概念(例如网络图中的循环连接)。
  • RNN 训练(例如通过时间进行反向传播)。
  • 多层(堆叠)RNN(例如深度学习的 “深度” 部分)。
  • BI-RNN(例如前向和反向序列作为输入)。
  • 用于表示堆栈(Stack)的 RNN。

Time is spent on the RNN model architectures or architectural elements, specifically:

  • Acceptor: the loss calculated on output after complete input sequence.
  • Encoder: the final vector is used as an encoding of the input sequence.
  • Transducer: one output is created for each observation in the input sequence.
  • Encoder-Decoder: the input sequence is encoded to a fixed-length vector before being decoded to an output sequence.

论文在 RNN 的模型架构或架构元素上花了一些篇幅,特别是以下几种(列表之后附有一段示意代码):

  • 接受器(Acceptor):完整的序列输入后,它计算输出的损失。
  • 编码器(Encoder):最终的向量被用作输入序列的编码。
  • 转换器(Transducer):为输入序列中的每个观测对象创建一个输出。
  • 编码器 - 解码器(Encoder-Decoder):输入序列在被解码为输出序列之前,会编码成为固定长度的向量。
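
下面用 NumPy 给出一个简单 RNN(SRNN)的示意(并非论文原文代码,维度为随意假设),并演示如何从同一组隐藏状态分别得到接受器 / 编码器与转换器所需的输出;编码器 - 解码器则是在此基础上,再接一个以该编码为初始状态的解码 RNN:

```python
import numpy as np

def simple_rnn(inputs, W_x, W_h, b):
    # inputs: (时间步数, 输入维度);返回每一步的隐藏状态
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in inputs:
        h = np.tanh(W_x @ x_t + W_h @ h + b)   # s_t = R(s_{t-1}, x_t)
        states.append(h)
    return np.stack(states)

np.random.seed(0)
inputs = np.random.randn(6, 4)                  # 任意长度的输入序列
W_x, W_h, b = np.random.randn(8, 4), np.random.randn(8, 8), np.zeros(8)
states = simple_rnn(inputs, W_x, W_h, b)

encoding = states[-1]     # 编码器 / 接受器:只使用最后一个状态(再在其上计算输出或损失)
per_step = states         # 转换器:为输入序列中的每个时间步各产生一个输出
print(encoding.shape, per_step.shape)           # (8,) (6, 8)
```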

11. Concrete RNN Architectures

(循环神经网络的具体架构)

This section builds on the previous by presenting specific RNN algorithms.

本章节基于上一节的内容,介绍了具体的 RNN 算法。

Specifically covered are:

  • Simple RNN (SRNN).
  • Long Short-Term Memory (LSTM).
  • Gated Recurrent Unit (GRU).

具体包括如下几点(列表之后附有一个 GRU 单步更新的示意):

  • 简单的 RNN(SRNN)。
  • 长短期记忆(LSTM)。
  • 门控循环单元(GRU,Gated Recurrent Unit)。
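
作为补充,下面给出 GRU 单步更新的 NumPy 示意(并非论文原文代码,参数为随机初始化),仅用于展示更新门 z 与重置门 r 的作用;LSTM 的门控思想与之类似,但带有单独的记忆单元:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h_prev, x, p):
    # 单步 GRU:z 为更新门,r 为重置门
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h_prev + p["bz"])
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h_prev + p["br"])
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h_prev) + p["bh"])
    return (1 - z) * h_prev + z * h_tilde       # 在旧状态与候选状态之间插值

np.random.seed(0)
d_in, d_h = 4, 8
p = {k: np.random.randn(d_h, d_in) for k in ("Wz", "Wr", "Wh")}
p.update({k: np.random.randn(d_h, d_h) for k in ("Uz", "Ur", "Uh")})
p.update({k: np.zeros(d_h) for k in ("bz", "br", "bh")})
print(gru_step(np.zeros(d_h), np.random.randn(d_in), p).shape)   # (8,)
```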

12. Modeling Trees

(树型建模)

This final section focuses on a more complex type of network called the Recursive Neural Network for learning model trees.

最后一节重点关注一种更复杂的网络类型,即用于对树进行建模的递归神经网络(Recursive Neural Network)。

The trees can be syntactic trees, discourse trees, or even trees representing the sentiment expressed by various parts of a sentence. We may want to predict values based on specific tree nodes, predict values based on the root nodes, or assign a quality score to a complete tree or part of a tree.

树可以是句法树、话语树,甚至可以是表示句子各部分所表达的情感的树。我们可能希望基于特定的树节点来预测值,基于根节点来预测值,或者为一棵完整的树或树的一部分给出一个质量评分。

As recurrent neural networks maintain state about the input sequences, recursive neural networks maintain state about nodes in the trees.

正如循环神经网络维持着关于输入序列的状态,递归神经网络则维持着关于树中各节点的状态。

(递归神经网络的例子,摘自 “A Primer on Neural Network Models for Natural Language Processing”。)
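
下面是一个递归神经网络思想的极简示意(并非论文原文代码,树结构与参数均为随意假设):沿着一棵二叉树自底向上,用同一个组合函数把子节点的向量合成为父节点的向量,树根处即得到整棵树的表示:

```python
import numpy as np

np.random.seed(0)
d = 4
W = np.random.randn(d, 2 * d) * 0.1   # 组合函数的参数
b = np.zeros(d)

def encode_tree(node, leaf_vectors):
    # node 要么是叶子(词),要么是 (左子树, 右子树) 的二元组
    if isinstance(node, str):
        return leaf_vectors[node]
    left, right = node
    v_l, v_r = encode_tree(left, leaf_vectors), encode_tree(right, leaf_vectors)
    return np.tanh(W @ np.concatenate([v_l, v_r]) + b)   # 由子节点状态组合出父节点状态

leaf_vectors = {w: np.random.randn(d) for w in ["the", "dog", "barks"]}
tree = (("the", "dog"), "barks")        # 一棵假设的二叉句法树
print(encode_tree(tree, leaf_vectors).shape)   # (4,)
```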

Further Reading

(扩展阅读)

This section provides more resources on the topic if you are looking to go deeper.

  • A Primer on Neural Network Models for Natural Language Processing, 2015
  • Neural Network Methods for Natural Language Processing, 2017
  • Yoav Goldberg Homepage
  • Yoav Goldberg on Medium

如果您想更深入地研究该主题,本节提供了更多相关资源。

Summary

(总结)

In this post, you discovered a primer on deep learning for natural language processing.

这篇文章介绍了一些关于自然语言处理中的深度学习的入门知识。

Specifically, you learned:

  • The neural network architectures that are having the biggest impact on the field of natural language processing.
  • A broad view of the natural language processing tasks that can be successfully addressed with deep learning.
  • The importance of dense word representations and the methods that can be used to learn them.

具体来说,你学到了:

  • 对自然语言处理领域影响最大的神经网络结构。
  • 对可以通过深度学习算法成功解决的自然语言处理任务有一个广泛的认识。
  • 密集词表示的重要性,以及可用于学习这些表示的方法。


转载自blog.csdn.net/qq_16775897/article/details/79325651