Language and Phrasing in Deep Learning Papers

vector space

node classification

link prediction

community detection

case study: analysis of a small example (e.g., a graph with 77 nodes and 254 edges)

word representations

language model: predicts the probability that a sentence occurs in a language

one-hot representation: the simplest word vector; each word is encoded as an index (a sparse 0/1 vector). Its problems (see the sketch below):

        semantic gap: word-to-word similarity cannot be computed
        curse of dimensionality; vectors are extremely sparse
        cannot represent out-of-vocabulary words
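A minimal sketch of the semantic gap, using a toy vocabulary assumed purely for illustration: any two distinct one-hot vectors are orthogonal, so no similarity can be read off them.

```python
import numpy as np

# Toy vocabulary (hypothetical) encoded as one-hot vectors.
vocab = ["king", "queen", "apple"]
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

# Semantic gap: distinct words are always orthogonal, so their dot product
# (and cosine similarity) is 0 no matter how related they are.
print(one_hot["king"] @ one_hot["queen"])  # 0.0

# Each vector has |V| dimensions with a single 1 (sparse, and the dimension
# grows with the vocabulary); unseen words simply have no vector at all.
```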

distributed word representation: words with similar meanings lie close to each other in the vector space

Word vector spaces built from different languages place semantically similar words in similar relative positions, suggesting that the learned structure of the embedding space is independent of the surface form of the language (a sketch follows).
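A minimal sketch with made-up 4-dimensional embeddings: once representations are dense and distributed, cosine similarity becomes meaningful and related words score higher.

```python
import numpy as np

# Made-up dense embeddings (values are illustrative only).
emb = {
    "king":  np.array([0.9, 0.1, 0.4, 0.8]),
    "queen": np.array([0.8, 0.2, 0.5, 0.9]),
    "apple": np.array([0.1, 0.9, 0.7, 0.1]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(emb["king"], emb["queen"]))  # high: semantically related
print(cosine(emb["king"], emb["apple"]))  # lower: unrelated
```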

data augmentation

backpropagation through time (BPTT)

vanishing gradients: makes training extremely slow

word hashing

hidden Markov model (HMM)

initial probabilities: the distribution over the initial hidden states

conditional random field (CRF)

feature function (of a CRF)

node2vec: Scalable Feature Learning for Networks (representation learning for nodes in large networks)

downstream tasks

first-order proximity: local similarity between directly connected nodes; on its own it is not sufficient to characterize the whole network structure

second-order proximity: similarity between nodes that share many common neighbors

word analogy

document classification

node classification

visualization

corpus

Research background:

Data in many real-world applications are generated in non-Euclidean spaces, and applying deep learning to graph data is difficult. The main challenge is the irregularity of graphs (unordered nodes, varying numbers of neighbors).
Broad range of applications: e-commerce, financial risk control, recommender systems

Describing the task:

Sequence tagging, including part-of-speech tagging (POS), chunking, and named entity recognition (NER), has been a classic NLP task.

given a network/graph G=(V,E,W), where V is the set of nodes, E is the set of edges between the nodes, and W is the set of weights of the edges, the goal of node embedding is to represent each node i with a vector, which preserves the structure of networks.

Node embedding represents each node with a vector that implicitly encodes the network's structural information.

 This paper studies the problem of embedding very large information networks into low-dimensional vector spaces, which is useful in many tasks such as visualization, node classification, and link prediction. 

This paper studies how to embed a large-scale information network into a low-dimensional vector space, which is useful for many tasks.

Reviewing prior work

Most existing sequence tagging models are linear statistical models, including Hidden Markov Models (HMM), Maximum Entropy Markov Models (MEMMs) (McCallum et al., 2000), and Conditional Random Fields (CRF).

Most existing models are linear statistical models: HMMs, MEMMs, and CRFs.

Many current NLP systems and techniques treat words as atomic units - there is no notion of similarity between words, as these are represented as indices in a vocabulary. This choice has several good reasons - simplicity, robustness, and the observation that simple models trained on huge amounts of data outperform complex systems trained on less data. An example is the popular N-gram model used for statistical language modeling - today, it is possible to train N-grams on virtually all available data (trillions of words).

Traditional NLP treats words as atomic units, with no notion of word similarity: words are only indices into a vocabulary. Such models are simple and robust, and a simple model trained on large data outperforms a complex model trained on less data; N-grams trained on very large corpora give good models.

However, the simple techniques are at their limits in many tasks. For example, the amount of relevant in-domain data for automatic speech recognition is limited - the performance is usually dominated by the size of high quality transcribed speech data (often just millions of words). In machine translation, the existing corpora for many languages contain only a few billions of words or less. Thus, there are situations where simple scaling up of the basic techniques will not result in any significant progress, and we have to focus on more advanced techniques.

These simple models hit their limits on tasks with small in-domain datasets, such as speech recognition and machine translation.

With progress of machine learning techniques in recent years, it has become possible to train more complex models on much larger datasets, and they typically outperform the simple models. Probably the most successful concept is to use distributed representations of words [10]. For example, neural network based language models significantly outperform N-gram models.

With the progress of machine learning, more complex models can now be trained on larger datasets.

Traditional graph learning: manually extract features, run feature selection, then feed the features into a classifier.

Multimodal learning based on CNN, TextCNN, and GNN.

Prediction tasks over nodes and edges in networks require careful effort in engineering features used by learning algorithms. Recent research in the broader field of representation learning has led to significant progress in automating prediction by learning the features themselves. However, present feature learning approaches are not expressive enough to capture the diversity of connectivity patterns observed in networks.

Prediction tasks over nodes and edges require careful feature engineering. Recent work learns the features automatically; however, existing feature learning approaches are not expressive enough to capture the connectivity patterns in networks.

Describing model properties:

We show that the BI-LSTM-CRF model can efficiently use both past and future input features thanks to a bidirectional LSTM component. It can also use sentence level tag information thanks to a CRF layer.

A bidirectional LSTM can jointly exploit past and future features.

In addition, it is robust and has less dependence on word embeddings as compared to previous observations.

The BI-LSTM-CRF model is effective, robust, and less dependent on pretrained word embeddings.

Describing model limitations:

Latent semantic models, such as LSA, intend to map a query to its relevant documents at the semantic level where keyword-based matching often fails.

Latent semantic models such as LSA aim to map a query to its relevant documents at the semantic level, in exactly the cases where keyword-based matching often fails.

Proposing a new network architecture

We strive to develop a series of new latent semantic models with a deep structure...

We develop a series of new latent semantic models with a deep structure.

Our work is the first to apply a bidirectional LSTM CRF (denoted as BI-LSTM-CRF) model to NLP benchmark sequence tagging data sets.

Our work is the first to apply a bidirectional LSTM-CRF model to NLP benchmark sequence tagging datasets.

We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations ...

We propose two model architectures for learning word vectors from very large datasets.

Here we propose node2vec, an algorithmic framework for learning continuous feature representations for nodes in networks. 

Here we propose node2vec, an algorithmic framework for learning representations of nodes in networks.

Stating the core of the method

Use the distance between two vectors to represent how well they match.

that project queries and documents into a common low-dimensional space where the relevance of a document given a query is readily computed as the distance between them.

Queries and documents are projected into a common low-dimensional space, where query-document relevance (similarity / degree of match) is computed directly from the distance between the two vectors (a retrieval sketch follows).
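A minimal retrieval sketch with made-up vectors standing in for learned query/document embeddings: documents are ranked by cosine similarity to the query in the shared space.

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Made-up 3-d vectors in the shared query/document space.
query = np.array([0.2, 0.7, 0.1])
docs = {
    "doc_a": np.array([0.1, 0.8, 0.2]),
    "doc_b": np.array([0.9, 0.1, 0.3]),
}

# Rank documents by similarity to the query; the top one is most relevant.
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked)  # ['doc_a', 'doc_b']
```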

The word hashing method described here aims to reduce the dimensionality of the bag-of-words term vectors.

Word hashing tackles the problem of an overly large vocabulary (see the sketch below).
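A minimal sketch of the letter-trigram word hashing described in the DSSM paper: each word is wrapped in boundary markers and split into overlapping trigrams, so the term-vector dimension drops from the full vocabulary size to the much smaller number of distinct trigrams.

```python
# Letter-trigram word hashing: "#" marks the word boundaries.
def letter_trigrams(word: str) -> list[str]:
    marked = f"#{word}#"
    return [marked[i:i + 3] for i in range(len(marked) - 2)]

print(letter_trigrams("good"))  # ['#go', 'goo', 'ood', 'od#']
# A term vector is then a bag of trigrams instead of a huge one-hot over words.
```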

In node2vec, we learn a mapping of nodes to a low-dimensional space of features that maximizes the likelihood of preserving network neighborhoods of nodes. (We learn continuous low-dimensional node vectors by maximizing the likelihood of preserving each node's network neighborhood.)

We define a flexible notion of a node's network neighborhood. (A flexible neighborhood-sampling strategy that generates random node sequences.)

We view the problem of sampling neighborhoods of a source node as a form of local search.

We treat neighborhood sampling as a form of local search (see the biased-walk sketch below).
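A hedged sketch of node2vec's second-order biased random walk on an unweighted toy graph, with return parameter p and in-out parameter q from the paper: a small p keeps the walk local (BFS-like), a small q pushes it outward (DFS-like).

```python
import random

def biased_walk(graph, start, length, p=1.0, q=1.0):
    """One node2vec-style walk over an adjacency dict (unweighted edges)."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur])
        if not neighbors:
            break
        if len(walk) == 1:
            walk.append(random.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for x in neighbors:
            if x == prev:            # step back to the previous node: 1/p
                weights.append(1.0 / p)
            elif x in graph[prev]:   # stay within one hop of prev: 1
                weights.append(1.0)
            else:                    # move further away (DFS-like): 1/q
                weights.append(1.0 / q)
        walk.append(random.choices(neighbors, weights=weights)[0])
    return walk

# Toy graph (hypothetical): a triangle 0-1-2 with a tail 2-3.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(biased_walk(graph, start=0, length=6, p=0.25, q=4.0))
```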

The method optimizes a carefully designed objective function that preserves both the local and global network structures.

The method optimizes a carefully designed objective function that preserves both the local and the global network structure (the LINE objectives are written out below).
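For reference, the objectives in the LINE paper take the following form, with u_i the embedding of node v_i and w_ij the edge weight: first-order proximity models local structure via a joint probability over each edge, second-order proximity models global structure via each node's context distribution.

```latex
% First-order proximity: joint probability of an observed edge (v_i, v_j)
p_1(v_i, v_j) = \frac{1}{1 + \exp(-\vec{u}_i^{\top} \vec{u}_j)},
\qquad
O_1 = -\sum_{(i,j) \in E} w_{ij} \log p_1(v_i, v_j)

% Second-order proximity: probability of context v_j given node v_i
% (\vec{u}'_j is the separate "context" vector of node v_j)
p_2(v_j \mid v_i) = \frac{\exp(\vec{u}_j'^{\top} \vec{u}_i)}
                         {\sum_{k=1}^{|V|} \exp(\vec{u}_k'^{\top} \vec{u}_i)},
\qquad
O_2 = -\sum_{(i,j) \in E} w_{ij} \log p_2(v_j \mid v_i)
```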

Evaluation metrics

The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks.

The quality of the representations is measured on a word similarity task and compared against the previously best-performing neural network approaches.

Model advantages

We observe large improvements in accuracy at much lower computational cost, i.e. it ...

We observe large accuracy gains at a much lower computational cost.

Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.

These vectors achieve state-of-the-art results on syntactic and semantic word similarity.

Model optimization strategy

The proposed Deep Structured Semantic Models are discriminatively trained by maximizing the conditional likelihood of the clicked documents given a query using the clickthrough data.

Optimization strategy: given the input, maximize the conditional probability of the correct label (in plain terms: minimize the softmax cross-entropy loss; sketch below).
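A hedged sketch of the DSSM-style objective under made-up vectors: the posterior of a document given a query is a softmax over smoothed cosine similarities, and training minimizes the negative log-likelihood of the clicked document.

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def dssm_loss(q, clicked, negatives, gamma=10.0):
    """-log P(clicked | q): softmax over cosine similarities, gamma smooths."""
    candidates = [clicked] + negatives
    logits = gamma * np.array([cosine(q, d) for d in candidates])
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[0]  # cross-entropy with the clicked doc as the label

q = np.array([0.2, 0.7, 0.1])          # made-up query embedding
d_pos = np.array([0.1, 0.8, 0.2])      # clicked document
d_negs = [np.array([0.9, 0.1, 0.3]),   # sampled unclicked documents
          np.array([0.4, 0.4, 0.4])]
print(dssm_loss(q, d_pos, d_negs))
```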

Describing common models

A RNN maintains a memory based on history information, which enables the model to predict the current output conditioned on long distance features. (An RNN keeps a memory of past inputs, so it can use long-distance features.)
Long Short Term Memory networks are the same as RNNs, except that the hidden layer updates are replaced by purpose-built memory cells. As a result, they may be better at finding and exploiting long range dependencies in the data.

LSTMs are similar to RNNs but are better at capturing long-range dependencies.

Bidirectional LSTM

In doing so, we can efficiently make use of past features (via forward states) and future features (via backward states) for a specific time frame. 

A bidirectional LSTM can effectively exploit information from both directions (PyTorch sketch below).
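A minimal PyTorch sketch: with bidirectional=True, the LSTM runs a forward pass over past inputs and a backward pass over future inputs, and concatenates the two hidden states at every time step (the sizes here are arbitrary).

```python
import torch
import torch.nn as nn

# Bidirectional LSTM: forward states see the past, backward states the future.
lstm = nn.LSTM(input_size=100, hidden_size=50,
               bidirectional=True, batch_first=True)

x = torch.randn(8, 20, 100)   # (batch, sequence length, embedding dim)
out, _ = lstm(x)
print(out.shape)              # torch.Size([8, 20, 100]): 2 * hidden_size
```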

CRF

It has been shown that CRFs can produce higher tagging accuracy in general.

 双向LSTM-CRF

In addition to the past input features and sentence level tag information used in a LSTM-CRF model, a BI-LSTM-CRF model can use the future input features. The extra features can boost tagging accuracy as we will show in experiments.

 word embedding

It has been shown in (Collobert et al., 2011) that word embeddings play a vital role in improving sequence tagging performance.

features connection tricks

Naming tables

Comparison of (xx task) results for various models

Figures in papers

1. A schematic of neighborhood aggregation in the graph network

 

2. t-SNE visualization of feature vectors

Visualization of the co-author network. The authors are mapped to the 2-D space using the t-SNE package with learned embeddings as input. Color of a node indicates the community of the author. Red: "data mining", blue: "machine learning", green: "computer vision".

Visualize the xx network: the learned embeddings are reduced to 2-D with t-SNE; node color indicates the author's community (state what red, blue, and green stand for). A runnable sketch follows.
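A minimal sketch, assuming `embeddings` is an (n_nodes, dim) array of learned node vectors and `communities` holds one integer label per node (random stand-ins are used here so the snippet runs on its own):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

embeddings = np.random.rand(300, 128)             # stand-in for learned vectors
communities = np.random.randint(0, 3, size=300)   # stand-in community labels

coords = TSNE(n_components=2).fit_transform(embeddings)
plt.scatter(coords[:, 0], coords[:, 1], c=communities, cmap="tab10", s=8)
plt.title("t-SNE of node embeddings, colored by community")
plt.show()
```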

Demonstrating model robustness

(xx task) with only (xx)

Whether different parameter choices affect model performance

we examine how the different choices of parameters affect the performance of node2vec on the xx dataset using a 50-50 split between labeled and unlabeled data

Experiments verify the method's effectiveness on a variety of networks.

Empirical experiments prove the effectiveness of LINE on a variety of real-world information networks, including language networks, social networks, and citation networks.

Comparison with existing systems

Comparison of F1 scores (xx metric) of different models for the xx task

Controlling variables

1. We generate an equal number of samples for each method and then evaluate the quality of the obtained features on the prediction task.

For each method we generate an equal number of samples, then evaluate the obtained features on the prediction task.

Describing experimental results

1. The best in-out and return hyperparameters were learned using 5-fold cross-validation on 10% labeled data with a grid search over p, q ∈ {0.25, 0.50, 1, 2, 4}.

The best hyperparameters are found by grid search with 5-fold cross-validation on the 10% labeled split (see the sketch below).
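A hedged sketch of that search loop. `embed()` is a hypothetical stand-in for running node2vec with a given (p, q) and returning node features; real code would call an actual node2vec implementation and use the real labeled split.

```python
from itertools import product
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def embed(p, q, n_nodes=100, dim=16):
    # Hypothetical stand-in: pretend these are node2vec features for (p, q).
    rng = np.random.default_rng(seed=int(p * 100 + q * 10))
    return rng.random((n_nodes, dim))

y = np.random.randint(0, 2, size=100)   # stand-in labels for the 10% split
grid = [0.25, 0.50, 1, 2, 4]
best = (-np.inf, None)
for p, q in product(grid, grid):
    X = embed(p, q)
    score = cross_val_score(LogisticRegression(), X, y, cv=5).mean()
    best = max(best, (score, (p, q)))
print(best)  # best CV score and the (p, q) that achieved it
```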

Paper structure

 1.introduction

        1.1 goals of the paper

        1.2 previous work

2.model architectures

        2.1 feedforward neural net language model

        2.2 recurrent neural net language model

        2.3 parallel training of neural networks

3.new log-linear models

        3.1 continuous bag-of-words model

        3.2 continuous skip-gram model

4.results

        4.1 task description

        4.2 maximization of accuracy

        4.3 comparison of model architectures

        4.4 large scale parallel training of models

        4.5 microsoft research sentence completion challenge

5.examples of the learned relationships

6.conclusion

References

A Comprehensive Survey on Graph Neural Networks

Deep Learning on Graphs: A Survey

Graph Neural Networks: A Review of Methods and Applications

Introduction to Graph Neural Networks. Zhiyuan Liu, Jie Zhou

LINE: Large-scale Information Network Embedding (representation learning for large information networks)

Development of graph neural networks

2013 Spectral networks and locally connected networks on graphs (leaning more toward spatial-domain approaches)
2013 Translating Embeddings for Modeling Multi-relational Data
2014 DeepWalk: Online Learning of Social Representations
2016 Semi-Supervised Classification with Graph Convolutional Networks

reconstruction: reconstruct the adjacency matrix

inference: preserve the graph structure (focus on network structure only)
