After Deep Learning, Graph Neural Networks (GNNs) Are on the Rise

Because deep learning has significant limitations in interpretability and reasoning, graph neural networks (GNNs), which combine deep learning with graph computation, have recently become one of the hottest new research directions in both academia and industry. The industry generally believes that GNNs can make up for precisely these two shortcomings that deep learning cannot resolve. Over the past year GNNs have succeeded in more and more application scenarios, but they still face many challenges.

At KDD 2019, this year's top conference in data mining research, Ant Financial held a symposium on the theme "Graph neural networks: research and practical applications". InfoQ was fortunate to interview Song Le, an artificial intelligence researcher at Ant Financial, about GNNs and deep learning in large-scale industrial practice, the challenges they currently face, and possible future directions of technical progress.
Song Le at KDD 2019

GNNs become the "AI upstart"
Beyond traditional deep learning methods, graph neural networks (GNNs) have been recognized over the last two years as the "AI upstart". Because graph structures are highly expressive, research that analyzes graphs with machine learning and deep learning has drawn growing attention. Thanks to their good performance and interpretability, GNNs have become a widely used graph analysis method, and many people see them as "a new generation of deep learning technology". Over the past year, academia and industry have both released GNN-related frameworks and tools, further promoting the vigorous development of this area.

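GNNs provide a framework for graph representation learning, also called graph embedding, that can be used for supervised, semi-supervised, and reinforcement learning on all kinds of graph data. A GNN expresses elements of a graph, such as nodes, edges, or subgraphs, as vectors, and the distances between the vectors of different elements preserve their similarity in the original graph. Expressing topological relationships as vectors in a feature space is essentially a feature extraction process based on topological information. It bridges traditional graph analysis and traditional machine learning or data mining methods, and has many applications in recommendation systems, knowledge graph construction and reasoning, and other areas. For example, introducing a graph convolution operation yields a semi-supervised learning framework suited to graph data, which can extract more accurate feature representations or perform classification directly, and its application value is beginning to be explored in fields such as image segmentation, video understanding, and traffic prediction. For both graph analysis and deep learning, GNNs are an extremely valuable evolution.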
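To make the idea of graph convolution concrete, here is a minimal sketch of a single layer in plain Python. It assumes the commonly used symmetric normalization with self-loops followed by a linear projection and ReLU; the interview does not say which formulation Ant Financial uses, and the function name `gcn_layer`, the toy graph, and the identity weight matrix are all made up for illustration.

```python
import math

def gcn_layer(adj, features, weight):
    """One graph-convolution step: aggregate neighbors, project, apply ReLU.

    adj:      dict mapping node -> list of neighbor nodes (undirected graph)
    features: dict mapping node -> list of floats (input feature vector)
    weight:   matrix as a list of rows, shape (in_dim, out_dim)
    """
    # Degree including a self-loop, so each node also keeps its own features.
    deg = {v: len(adj[v]) + 1 for v in adj}
    in_dim, out_dim = len(weight), len(weight[0])
    out = {}
    for v in adj:
        # Symmetrically normalized sum over the node itself and its neighbors.
        agg = [0.0] * in_dim
        for u in [v] + list(adj[v]):
            coef = 1.0 / math.sqrt(deg[v] * deg[u])
            for i in range(in_dim):
                agg[i] += coef * features[u][i]
        # Linear projection followed by ReLU.
        out[v] = [
            max(0.0, sum(agg[i] * weight[i][j] for i in range(in_dim)))
            for j in range(out_dim)
        ]
    return out

# Toy path graph a - b - c with 2-dimensional input features (assumed data).
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity, so the output equals the aggregate
embeddings = gcn_layer(adj, features, weight)
```

Each output vector mixes a node's own features with those of its neighbors, which is how topology ends up encoded in the feature space. Stacking several such layers lets information travel over multiple hops.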

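The emergence of GNNs addresses the pain point that traditional deep learning methods are hard to apply to irregularly structured data. It greatly expands the application space of neural networks and improves model interpretability on some problems. For many business scenarios built on irregular data, such as recommendation, disambiguation, and anti-fraud, GNNs have great potential. At Ant Financial, for example, GNNs are already widely deployed in recommendation and risk control for its inclusive finance business.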

Song Le also cited two interesting new applications:

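The first is reasoning over knowledge graphs with GNNs. Knowledge graphs are one of Ant Financial's most important R&D directions. A knowledge graph can link together all registered enterprises in China: each node may be a registered merchant, and the number of nodes can reach tens of millions. Some of these merchants are related as suppliers, some as competitors in the same industry, and some through lawsuits. GNNs can be used to make predictions and inferences on this graph for recommendation and risk control in inclusive finance.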

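The second is dynamic graphs. In essence, all financial transaction problems are dynamic: who bought what, and when, always carries a timestamp, and as new transactions occur and new accounts are created the whole graph keeps changing. How to take time and graph structure into account together when learning representations is a challenging frontier problem. Ant Financial is currently trying out dynamic graphs in its loan admission model.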
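One simple way to picture a dynamic graph is an append-only edge list in which every edge carries its timestamp, so that a node's neighborhood can be queried "as of" a given time. The sketch below is only an illustration of that data structure, not Ant Financial's approach; the class `TemporalGraph`, its methods, and the account and shop names are hypothetical.

```python
from bisect import bisect_right

class TemporalGraph:
    """Append-only undirected graph whose edges carry timestamps."""

    def __init__(self):
        # node -> parallel lists of edge timestamps and neighbors, time-ordered
        self._times = {}
        self._nbrs = {}

    def add_edge(self, src, dst, ts):
        # Assumes edges arrive in non-decreasing timestamp order.
        for a, b in ((src, dst), (dst, src)):
            self._times.setdefault(a, []).append(ts)
            self._nbrs.setdefault(a, []).append(b)

    def neighbors_before(self, node, ts):
        """Neighbors connected by edges with timestamp <= ts."""
        cut = bisect_right(self._times.get(node, []), ts)
        return self._nbrs.get(node, [])[:cut]

# Two purchases by the same account at different times (assumed data).
g = TemporalGraph()
g.add_edge("alice", "shop1", 10)
g.add_edge("alice", "shop2", 20)
early = g.neighbors_before("alice", 15)
late = g.neighbors_before("alice", 25)
```

A representation computed from `early` would differ from one computed from `late`, which is the sense in which time and structure have to be modeled together.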

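Beyond these, by Song Le's conservative estimate, GNNs have already been deployed in at least dozens of business scenarios at Alibaba. But this is only the optimistic side of GNN development.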

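Looking at adoption across the industry as a whole, GNNs are still at an early stage. In October 2018, nearly 30 researchers from DeepMind, Google Brain, MIT, and other institutions jointly posted the paper "Relational inductive biases, deep learning, and graph networks" on arXiv, pushing GNN-related work to a new height. Less than a year of rapid development has passed since then, and many fundamental problems remain unsolved. As with industrial-grade deep learning applications, a great deal of work on the underlying system architecture is still needed before GNNs can truly be deployed at scale in industry.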

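Deploying GNNs at scale still faces challenges
In Song Le's view, the main challenges for large-scale industrial deployment of GNNs lie in two areas: training on large-scale graphs, and online updating and prediction. Going forward, any internet company that works on GNN applications will hardly be able to avoid large-scale graphs.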

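First, in industrial scenarios, especially at internet companies, graphs are usually very large, containing at least hundreds of millions, and sometimes billions or tens of billions, of nodes and edges. Computing a GNN at this scale is usually beyond what a single machine can deliver, so a dedicated distributed graph computing platform is needed. Without a platform that can support the massive computation GNNs require, it is hard to do GNNs well. Yet progress on GNN platforms across the industry remains slow. No company has yet released an open-source GNN platform that is good enough and can confidently claim to support graphs with hundreds of millions of nodes.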

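When training a GNN model, the algorithm needs to interact efficiently with a distributed graph storage platform, which is also very challenging. During training, the algorithm repeatedly issues random queries for nodes, their neighbors, and the neighbors of those neighbors, loading the data into memory to run the forward pass and backpropagation of the deep learning model. This is hard to do well on a large graph. For a GNN platform, the deep learning computation and the interaction with the database are often the biggest speed bottlenecks. Over the past two years, Ant Financial has invested heavily in distributed graph storage, and it has now developed an efficient distributed graph storage platform together with a graph training platform that interacts with it efficiently. In terms of numbers, training on a graph at the hundred-million scale, which used to take days, can now be done in under an hour.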
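The access pattern described here, a node, its neighbors, and their neighbors, can be sketched as fanout-limited k-hop sampling. In this illustration an in-memory dictionary stands in for the distributed graph store, which is an assumption; the function `sample_khop` and its parameters are hypothetical and do not describe Ant Financial's platform.

```python
import random

def sample_khop(adj, seed, fanouts, rng):
    """Sample a k-hop neighborhood around `seed`.

    adj:     dict node -> list of neighbors (stand-in for a graph store query)
    fanouts: max neighbors sampled per node at each hop, e.g. [2, 2]
    rng:     random.Random instance, for reproducible sampling
    Returns a list of layers; layer 0 is [seed].
    """
    layers = [[seed]]
    seen = {seed}
    for fanout in fanouts:
        frontier = []
        for node in layers[-1]:
            nbrs = adj.get(node, [])  # one random lookup per visited node
            picked = nbrs if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
            for n in picked:
                if n not in seen:
                    seen.add(n)
                    frontier.append(n)
        layers.append(frontier)
    return layers

# Small tree: node 0 has three children, each child has one leaf (assumed data).
adj = {0: [1, 2, 3], 1: [0, 4], 2: [0, 5], 3: [0, 6], 4: [1], 5: [2], 6: [3]}
layers = sample_khop(adj, 0, [2, 2], random.Random(0))
```

Every node visited costs one lookup against the store, and the number of lookups grows multiplicatively with each hop, which suggests why the storage interaction dominates training time on very large graphs.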

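Online prediction with large-scale GNNs is another difficulty. GNN embeddings are not real-time. Take financial transactions: every new transaction adds an edge, so the graph changes. To predict well in real time, the representation must be computed from the latest edges using the GNN's parameters. But in a production environment it is very hard to construct a graph and run the GNN within a very short response time, especially when transaction volume is high, so there is usually some lag. How to let a GNN perform this computation directly and efficiently online is a challenge that has not been fully solved, and it requires cooperation with the underlying system architecture.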
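The staleness problem can be illustrated with a cache of precomputed embeddings that must be refreshed when an edge arrives. The sketch below assumes a one-layer mean aggregator with no learned weights, so only the two endpoints of the new edge are invalidated; with k layers, nodes up to k-1 hops from the endpoints would also need recomputing. The function names and data are hypothetical.

```python
def mean_embedding(node, adj, features):
    """One-layer mean aggregator over the node and its current neighbors."""
    group = [node] + adj.get(node, [])
    dim = len(features[node])
    return [sum(features[n][i] for n in group) / len(group) for i in range(dim)]

def apply_new_edge(src, dst, adj, features, cache):
    """Insert an undirected edge, then refresh only the embeddings it invalidates."""
    adj.setdefault(src, []).append(dst)
    adj.setdefault(dst, []).append(src)
    for node in (src, dst):
        cache[node] = mean_embedding(node, adj, features)
    return cache

# Three accounts with 1-dimensional features; u and v are already linked (assumed data).
features = {"u": [0.0], "v": [2.0], "w": [4.0]}
adj = {"u": ["v"], "v": ["u"]}
cache = {n: mean_embedding(n, adj, features) for n in features}

# A new transaction between u and w arrives.
apply_new_edge("u", "w", adj, features, cache)
```

Even in this toy setting the update touches the graph store and recomputes aggregates on the request path, which gives a sense of why doing it within a tight latency budget is hard at high transaction volume.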

Song Le said frankly that even without GNNs, graph computation itself is a hard problem in the industry. Graphs differ from images and text: each node may have a different number of neighbors, the nodes it connects to may be of different types, and the edges may be of different types. This creates many irregular operations, and the amount of computation needed per node varies. Computers are particularly well suited to regular computation and inherently ill suited to irregular operations, and graph computation is irregular. Traditional graph algorithms have been studied extensively, yet they still have not solved this problem well. Adding a layer of deep learning on top, as GNNs do, sharply increases complexity and makes it harder still. So obtaining GNN training results and predictions in a very short time is a big challenge. If it can be solved, and GNN training and prediction become fast enough, algorithm engineers will be able to quickly try out GNNs and a variety of network structures when modeling, and then modify the GNN further to improve results.
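The irregularity is easy to see once neighbor lists are packed into flat arrays. The sketch below uses the compressed sparse row (CSR) layout, a standard choice for sparse graphs that the interview itself does not mention; the helper names and the star graph are made up for illustration.

```python
def to_csr(adj, nodes):
    """Pack variable-length neighbor lists into two flat arrays (CSR layout)."""
    index = {n: i for i, n in enumerate(nodes)}
    indptr, indices = [0], []
    for n in nodes:
        indices.extend(index[m] for m in adj.get(n, []))
        indptr.append(len(indices))
    return indptr, indices

def neighbor_sums(indptr, indices, values):
    """Sum neighbor values per node; each node's slice has its own length."""
    return [
        sum(values[j] for j in indices[indptr[i]:indptr[i + 1]])
        for i in range(len(indptr) - 1)
    ]

# Star graph: the hub has three neighbors, every leaf has one (assumed data).
adj = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
nodes = ["hub", "a", "b", "c"]
indptr, indices = to_csr(adj, nodes)
degrees = [indptr[i + 1] - indptr[i] for i in range(len(nodes))]
sums = neighbor_sums(indptr, indices, [10.0, 1.0, 2.0, 3.0])
```

Unlike an image, where every pixel has the same fixed neighborhood, here the work per node follows its degree. In real graphs a few hubs can have vastly more neighbors than most nodes, so the work is hard to balance evenly across hardware.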

In both academia and industry these remain frontier problems, and they are one of the current bottlenecks in the GNN field. Although Google, Facebook, and other large companies are pushing the development of GNN platforms, there is still no really good mainstream open-source platform for large-scale distributed graph network computation.

Origin blog.51cto.com/14164343/2429700