Artificial Intelligence | ShowMeAI Daily Digest #2022.06.09


The ShowMeAI Daily series has been fully upgraded! It covers AI topics across Tools & Frameworks | Projects & Code | Blogs & Shares | Data & Resources | Research & Papers. Click to view the article archive, and subscribe to the topic #ShowMeAI资讯日报 in the official account to receive the latest daily updates. Click Collections & Monthly Digest to quickly browse the complete topic collections.

1. Tools & Frameworks

Tool system: Energon-AI - an inference system for large models (BERT, GPT-2, ViT, etc.)

tags: [bert,gpt2,ViT]

'Energon-AI - Large-scale model inference.' by HPC-AI Tech

GitHub: github.com/hpcaitech/E…

Tool system: Eurybia - model monitoring system

tags: [model monitoring, data drift]

'Eurybia - monitor model drift over time and securize model deployment with data validation' by MAIF

GitHub: github.com/MAIF/eurybi…

Tool: book_writer - AI writing assistant

tags: [AI writing, writing, text generation]

'book_writer' by Sung Kim

GitHub: github.com/hunkim/book…

Toolkit: giotto-ai - a high-performance topological machine learning toolbox (Python) built on top of scikit-learn

tags: [topology, machine learning]

'giotto-ai - a high-performance topological machine learning toolbox in Python built on top of scikit-learn'

GitHub: github.com/giotto-ai/g…

Paper: "giotto-tda: A Topological Data Analysis Toolkit for Machine Learning and Data Exploration"

Toolkit: DeepVision3D - an open-source toolbox for point-cloud understanding

tags: [point cloud, data understanding]

'DV-Lab 3D Toolbox (DeepVision3D) - an open source toolbox for point-cloud understanding.' by DV Lab

GitHub: github.com/dvlab-resea…

Toolkit: Graphic - a data visualization and charting library

tags: [visualization, data presentation]

'Graphic - A grammar of data visualization and Flutter charting library.' by LIN Chen

GitHub: github.com/entronad/gr…

Toolkit: MedicalSeg - an easy-to-use, end-to-end 3D medical image segmentation toolkit

tags: [medical imaging, image segmentation]

The toolkit supports the full segmentation pipeline, from data preprocessing through training and evaluation to model deployment.

'MedicalSeg - MedicalSeg is an easy-to-use 3D medical image segmentation toolkit that supports the whole segmentation process' by PaddleCV-SIG

GitHub: github.com/PaddleCV-SI…

Toolkit: S3PRL - a self-supervised speech pre-training and representation learning toolkit

tags: [self-supervised, speech pre-training, representation learning]

'S3PRL - Self-Supervised Speech Pre-training and Representation Learning Toolkit.'

GitHub: github.com/s3prl/s3prl

2. Blogs & Shares

Tutorial: Blockchain in a Nutshell

tags: [blockchain]

"Blockchain in a nutshell" by D. A. Tran, B. Krishnamachari (2022)

Link: arxiv.org/abs/2205.01…

Tutorial: Algorithms for Front-End Engineers

tags: [algorithms, data structures, LeetCode]

'A one-stop solution to algorithm learning for front-end engineers' by course-dasheng

GitHub: github.com/course-dash…

Tutorial: Go Concurrency Guide

tags: [Go, development]

'Go Concurrency Guide - Practical concurrency guide in Go, communication by channels, patterns' by Lucas Alves

GitHub: github.com/luk4z7/go-c…

Tutorial: Practical Cheminformatics

tags: [cheminformatics, hands-on, tutorial]

'Practical Cheminformatics With Open Source Software' by Patrick Walters

GitHub: github.com/PatWalters/…

3. Data & Resources

Dataset: FairytaleQA - a narrative-comprehension question-answering dataset built from storybooks

tags: [question answering, storybooks, narrative comprehension, dataset]

'FairytaleQA: A Dataset for Question and Answer Generation - A dataset of over 10000 question and answer pairs written for storybooks.' by UC Irvine, School of Education

GitHub: github.com/uci-soe/Fai…

Resource list: a large curated list of roadmaps, covering AI / machine learning / data science topics

tags: [learning paths, knowledge maps, AI, curated resources]

'Awesome Roadmaps - A curated list of roadmaps.' by liuchong

GitHub: github.com/liuchong/aw…

Resource list: papers on pre-training on graphs

tags: [pre-training, graph models, curated resources]

'A Survey of Pretraining on Graphs: Taxonomy, Methods, and Applications - A curated list of resources for pre-training on graphs.' by Jun Xia

GitHub: github.com/junxia97/aw…

Resource list: Ultimate-Awesome-Transformer-Attention - papers and resources on Vision Transformers and attention

tags: [Transformer, attention, curated resources]

'Ultimate-Awesome-Transformer-Attention - An ultimately comprehensive paper list of Vision Transformer/Attention, including papers, codes, and related websites' by Min-Hung (Steve) Chen

GitHub: github.com/cmhungsteve…

Resource: a documentation-site guide for developers

tags: [site building, guide]

'How to Make a Docs Site: Shortcuts for Busy Devs' by jablonskidev

GitHub: github.com/jablonskide…

4. Research & Papers

Reply with the keyword 日报 in the official account to get the curated June paper collection for free.

Paper: GroupNet: Multiscale Hypergraph Neural Networks for Trajectory Prediction with Relational Reasoning

Title: GroupNet: Multiscale Hypergraph Neural Networks for Trajectory Prediction with Relational Reasoning

Date: 19 Apr 2022

Field: Computer Vision

Tasks: Relational Reasoning, Representation Learning, Trajectory Prediction

Paper link: arxiv.org/abs/2204.08…

Code: github.com/mediabrain-…

Authors: Chenxin Xu, Maosen Li, Zhenyang Ni, Ya Zhang, Siheng Chen

Summary: From the aspect of interaction capturing, we propose a trainable multiscale hypergraph to capture both pair-wise and group-wise interactions at multiple group sizes.

Abstract: Demystifying the interactions among multiple agents from their past trajectories is fundamental to precise and interpretable trajectory prediction. However, previous works only consider pair-wise interactions with limited relational reasoning. To promote more comprehensive interaction modeling for relational reasoning, we propose GroupNet, a multiscale hypergraph neural network, which is novel in terms of both interaction capturing and representation learning. From the aspect of interaction capturing, we propose a trainable multiscale hypergraph to capture both pair-wise and group-wise interactions at multiple group sizes. From the aspect of interaction representation learning, we propose a three-element format that can be learnt end-to-end and explicitly reason some relational factors including the interaction strength and category. We apply GroupNet into both CVAE-based prediction system and previous state-of-the-art prediction systems for predicting socially plausible trajectories with relational reasoning. To validate the ability of relational reasoning, we experiment with synthetic physics simulations to reflect the ability to capture group behaviors, reason interaction strength and interaction category. To validate the effectiveness of prediction, we conduct extensive experiments on three real-world trajectory prediction datasets, including NBA, SDD and ETH-UCY; and we show that with GroupNet, the CVAE-based prediction system outperforms state-of-the-art methods. We also show that adding GroupNet will further improve the performance of previous state-of-the-art prediction systems.
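
To make the pair-wise vs. group-wise distinction concrete, here is a minimal Python sketch (illustrative only, not the authors' architecture; agent states are made-up scalars) of message passing on a hypergraph, where a hyperedge may connect any number of agents:

```python
def hypergraph_aggregate(states, hyperedges):
    """Toy group-wise message passing on a hypergraph.

    states: dict mapping agent id -> scalar feature.
    hyperedges: list of agent-id tuples; each tuple is one group.
    Each hyperedge averages its members' states, and each agent then
    averages the messages of all groups it belongs to. A pair-wise graph
    is the special case where every hyperedge has exactly two members.
    """
    edge_msg = {e: sum(states[a] for a in e) / len(e) for e in hyperedges}
    updated = {}
    for a in states:
        msgs = [m for e, m in edge_msg.items() if a in e]
        updated[a] = sum(msgs) / len(msgs) if msgs else states[a]
    return updated

# Agents 0 and 1 interact pair-wise; agents 0, 1, 2 also form one group.
out = hypergraph_aggregate({0: 1.0, 1: 2.0, 2: 3.0}, [(0, 1), (0, 1, 2)])
```

In GroupNet the hyperedges are built at multiple scales and are trainable; the sketch only shows why a hyperedge can express group interactions that pair-wise edges cannot.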

Paper: Blueprint Separable Residual Network for Efficient Image Super-Resolution

Title: Blueprint Separable Residual Network for Efficient Image Super-Resolution

Date: 12 May 2022

Field: Computer Vision

Tasks: Image Super-Resolution, Super-Resolution

Paper link: arxiv.org/abs/2205.05…

Code: github.com/xiaom233/bs…

Authors: Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Jinjin Gu, Yu Qiao, Chao Dong

Summary: One is the usage of blueprint separable convolution (BSConv), which takes the place of the redundant convolution operation.

Abstract: Recent advances in single image super-resolution (SISR) have achieved extraordinary performance, but the computational cost is too heavy to apply in edge devices. To alleviate this problem, many novel and effective solutions have been proposed. Convolutional neural network (CNN) with the attention mechanism has attracted increasing attention due to its efficiency and effectiveness. However, there is still redundancy in the convolution operation. In this paper, we propose Blueprint Separable Residual Network (BSRN) containing two efficient designs. One is the usage of blueprint separable convolution (BSConv), which takes the place of the redundant convolution operation. The other is to enhance the model ability by introducing more effective attention modules. The experimental results show that BSRN achieves state-of-the-art performance among existing efficient SR methods. Moreover, a smaller variant of our model BSRN-S won the first place in model complexity track of NTIRE 2022 Efficient SR Challenge. The code is available at github.com/xiaom233/BS…
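
A quick parameter count shows why replacing a standard convolution with a blueprint separable convolution saves computation. This sketch assumes BSConv's unconstrained form (a 1x1 pointwise convolution followed by a k x k depthwise convolution) and ignores bias terms:

```python
def conv2d_params(c_in, c_out, k):
    # Standard convolution: one k x k kernel per (input, output) channel pair.
    return c_in * c_out * k * k

def bsconv_u_params(c_in, c_out, k):
    # BSConv-U: 1x1 pointwise (c_in -> c_out), then k x k depthwise on c_out channels.
    return c_in * c_out + c_out * k * k

standard = conv2d_params(64, 64, 3)   # 36864 parameters
bsconv = bsconv_u_params(64, 64, 3)   # 4672 parameters
```

For a typical 64-channel 3x3 layer the factorized form needs roughly an eighth of the parameters, which is the redundancy the abstract refers to.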

Paper: Divide to Adapt: Mitigating Confirmation Bias for Domain Adaptation of Black-Box Predictors

Title: Divide to Adapt: Mitigating Confirmation Bias for Domain Adaptation of Black-Box Predictors

Date: 28 May 2022

Field: Methodology

Tasks: Domain Adaptation, Knowledge Distillation

Paper link: arxiv.org/abs/2205.14…

Code: github.com/xyupeng/bet…

Authors: Jianfei Yang, Xiangyu Peng, Kai Wang, Zheng Zhu, Jiashi Feng, Lihua Xie, Yang You

Summary: Domain Adaptation of Black-box Predictors (DABP) aims to learn a model on an unlabeled target domain supervised by a black-box predictor trained on a source domain.

Abstract: Domain Adaptation of Black-box Predictors (DABP) aims to learn a model on an unlabeled target domain supervised by a black-box predictor trained on a source domain. It does not require access to both the source-domain data and the predictor parameters, thus addressing the data privacy and portability issues of standard domain adaptation. Existing DABP approaches mostly rely on model distillation from the black-box predictor, i.e., training the model with its noisy target-domain predictions, which however inevitably introduces the confirmation bias accumulated from the prediction noises. To mitigate such bias, we propose a new method, named BETA, to incorporate knowledge distillation and noisy label learning into one coherent framework. This is enabled by a new divide-to-adapt strategy. BETA divides the target domain into an easy-to-adapt subdomain with less noise and a hard-to-adapt subdomain. Then it deploys mutually-teaching twin networks to filter the predictor errors for each other and improve them progressively, from the easy to hard subdomains. As such, BETA effectively purifies the noisy labels and reduces error accumulation. We theoretically show that the target error of BETA is minimized by decreasing the noise ratio of the subdomains. Extensive experiments demonstrate BETA outperforms existing methods on all DABP benchmarks, and is even comparable with the standard domain adaptation methods that use the source-domain data.
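
The subdomain division can be caricatured as splitting target samples by the confidence of the black-box predictor's pseudo-labels. Note this fixed-threshold criterion is ours for illustration only; BETA learns its division rather than thresholding:

```python
def divide_target_domain(pseudo_probs, threshold=0.9):
    """Split unlabeled target samples into an easy-to-adapt subdomain
    (confident, hence presumably less noisy, black-box predictions) and a
    hard-to-adapt subdomain.

    pseudo_probs: list of per-sample class-probability lists from the
    black-box predictor. Returns (easy_indices, hard_indices).
    """
    easy, hard = [], []
    for i, probs in enumerate(pseudo_probs):
        (easy if max(probs) >= threshold else hard).append(i)
    return easy, hard

easy, hard = divide_target_domain([[0.95, 0.05], [0.55, 0.45], [0.05, 0.95]])
```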

Paper: Resolution-robust Large Mask Inpainting with Fourier Convolutions

Title: Resolution-robust Large Mask Inpainting with Fourier Convolutions

Date: 15 Sep 2021

Field: Computer Vision

Tasks: Image Inpainting

Paper link: arxiv.org/abs/2109.07…

Code: github.com/saic-mdal/l… , github.com/Moldoteck/l… , github.com/rawmean/lam…

Authors: Roman Suvorov, Elizaveta Logacheva, Anton Mashikhin, Anastasia Remizova, Arsenii Ashukha, Aleksei Silvestrov, Naejin Kong, Harshith Goka, Kiwoong Park, Victor Lempitsky

Summary: We find that one of the main reasons for that is the lack of an effective receptive field in both the inpainting network and the loss function.

Abstract: Modern image inpainting systems, despite the significant progress, often struggle with large missing areas, complex geometric structures, and high-resolution images. We find that one of the main reasons for that is the lack of an effective receptive field in both the inpainting network and the loss function. To alleviate this issue, we propose a new method called large mask inpainting (LaMa). LaMa is based on i) a new inpainting network architecture that uses fast Fourier convolutions (FFCs), which have the image-wide receptive field; ii) a high receptive field perceptual loss; iii) large training masks, which unlocks the potential of the first two components. Our inpainting network improves the state-of-the-art across a range of datasets and achieves excellent performance even in challenging scenarios, e.g. completion of periodic structures. Our model generalizes surprisingly well to resolutions that are higher than those seen at train time, and achieves this at lower parameter&time costs than the competitive baselines. The code is available at github.com/saic-mdal/l…
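
Why a Fourier convolution gives an image-wide receptive field can be seen in one dimension: transform the signal to the frequency domain, weight each frequency, and transform back; every output element then depends on every input element after a single layer. A toy sketch with a naive DFT (illustrative only, not the paper's FFC layer):

```python
import cmath

def spectral_conv1d(x, freq_weights):
    """Naive DFT -> per-frequency weighting -> inverse DFT."""
    n = len(x)
    X = [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
         for k in range(n)]
    Y = [w * v for w, v in zip(freq_weights, X)]
    return [sum(Y[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]

# Keeping only the DC component spreads every input into every output,
# demonstrating the global (sequence-wide) receptive field:
out = spectral_conv1d([1.0, 2.0, 3.0, 4.0], [1, 0, 0, 0])  # -> [2.5, 2.5, 2.5, 2.5]
```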

Paper: Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality

Title: Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality

Date: 20 May 2022

Field: Computer Vision

Tasks: Object Detection

Paper link: arxiv.org/abs/2205.10…

Code: github.com/implus/um-m…

Authors: Xiang Li, Wenhai Wang, Lingfeng Yang, Jian Yang

Summary: Masked AutoEncoder (MAE) has recently led the trends of visual self-supervision area by an elegant asymmetric encoder-decoder design, which significantly optimizes both the pre-training efficiency and fine-tuning accuracy.

Abstract: Masked AutoEncoder (MAE) has recently led the trends of visual self-supervision area by an elegant asymmetric encoder-decoder design, which significantly optimizes both the pre-training efficiency and fine-tuning accuracy. Notably, the success of the asymmetric structure relies on the "global" property of Vanilla Vision Transformer (ViT), whose self-attention mechanism reasons over arbitrary subset of discrete image patches. However, it is still unclear how the advanced Pyramid-based ViTs (e.g., PVT, Swin) can be adopted in MAE pre-training as they commonly introduce operators within "local" windows, making it difficult to handle the random sequence of partial vision tokens. In this paper, we propose Uniform Masking (UM), successfully enabling MAE pre-training for Pyramid-based ViTs with locality (termed "UM-MAE" for short). Specifically, UM includes a Uniform Sampling (US) that strictly samples 1 random patch from each 2×2 grid, and a Secondary Masking (SM) which randomly masks a portion of (usually 25%) the already sampled regions as learnable tokens. US preserves equivalent elements across multiple non-overlapped local windows, resulting in the smooth support for popular Pyramid-based ViTs; whilst SM is designed for better transferable visual representations since US reduces the difficulty of pixel recovery pre-task that hinders the semantic learning. We demonstrate that UM-MAE significantly improves the pre-training efficiency (e.g., it speeds up and reduces the GPU memory by ∼2×) of Pyramid-based ViTs, but maintains the competitive fine-tuning performance across downstream tasks. For example using HTC++ detector, the pre-trained Swin-Large backbone self-supervised under UM-MAE only in ImageNet-1K can even outperform the one supervised in ImageNet-22K. The codes are available at github.com/implus/UM-M…
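
The two masking stages described above can be sketched in a few lines. This is an illustrative toy over patch indices (the real implementation operates on patch embeddings, and the grid size here is just an example):

```python
import random

def uniform_masking(h, w, secondary_ratio=0.25, seed=0):
    """Sketch of Uniform Masking on an h x w grid of patches (h, w even).

    Uniform Sampling (US): keep exactly one random patch per 2x2 cell,
    so every non-overlapped local window retains the same token count.
    Secondary Masking (SM): re-mask a fraction of the kept patches as
    learnable tokens, making the pixel-recovery pre-task harder.
    Returns (visible, sm_masked) lists of (row, col) patch indices.
    """
    rng = random.Random(seed)
    kept = []
    for gy in range(0, h, 2):
        for gx in range(0, w, 2):
            dy, dx = rng.choice([(0, 0), (0, 1), (1, 0), (1, 1)])
            kept.append((gy + dy, gx + dx))
    sm_masked = rng.sample(kept, int(len(kept) * secondary_ratio))
    visible = [p for p in kept if p not in sm_masked]
    return visible, sm_masked

# A 14 x 14 patch grid has 49 2x2 cells: 49 kept patches, 12 of them re-masked.
visible, sm = uniform_masking(14, 14)
```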

Paper: Machine Learning Estimation of Heterogeneous Treatment Effects with Instruments

Title: Machine Learning Estimation of Heterogeneous Treatment Effects with Instruments

Date: NeurIPS 2019

Paper link: arxiv.org/abs/1905.10…

Code: github.com/Microsoft/E…

Authors: Vasilis Syrgkanis, Victor Lei, Miruna Oprescu, Maggie Hei, Keith Battocchi, Greg Lewis

Summary: We develop a statistical learning approach to the estimation of heterogeneous effects, reducing the problem to the minimization of an appropriate loss function that depends on a set of auxiliary models (each corresponding to a separate prediction task).

Abstract: We consider the estimation of heterogeneous treatment effects with arbitrary machine learning methods in the presence of unobserved confounders with the aid of a valid instrument. Such settings arise in A/B tests with an intent-to-treat structure, where the experimenter randomizes over which user will receive a recommendation to take an action, and we are interested in the effect of the downstream action. We develop a statistical learning approach to the estimation of heterogeneous effects, reducing the problem to the minimization of an appropriate loss function that depends on a set of auxiliary models (each corresponding to a separate prediction task). The reduction enables the use of all recent algorithmic advances (e.g. neural nets, forests). We show that the estimated effect model is robust to estimation errors in the auxiliary models, by showing that the loss satisfies a Neyman orthogonality criterion. Our approach can be used to estimate projections of the true effect model on simpler hypothesis spaces. When these spaces are parametric, then the parameter estimates are asymptotically normal, which enables construction of confidence sets. We applied our method to estimate the effect of membership on downstream webpage engagement on TripAdvisor, using as an instrument an intent-to-treat A/B test among 4 million TripAdvisor users, where some users received an easier membership sign-up process. We also validate our method on synthetic data and on public datasets for the effects of schooling on income.

Paper: Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval

Title: Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval

Date: 27 May 2022

Paper link: arxiv.org/abs/2205.13…

Code: github.com/oatml-marks…

Authors: Pascal Notin, Mafalda Dias, Jonathan Frazer, Javier Marchena-Hurtado, Aidan Gomez, Debora S. Marks, Yarin Gal

Summary: The ability to accurately model the fitness landscape of protein sequences is critical to a wide range of applications, from quantifying the effects of human variants on disease likelihood, to predicting immune-escape mutations in viruses and designing novel biotherapeutic proteins.

Abstract: The ability to accurately model the fitness landscape of protein sequences is critical to a wide range of applications, from quantifying the effects of human variants on disease likelihood, to predicting immune-escape mutations in viruses and designing novel biotherapeutic proteins. Deep generative models of protein sequences trained on multiple sequence alignments have been the most successful approaches so far to address these tasks. The performance of these methods is however contingent on the availability of sufficiently deep and diverse alignments for reliable training. Their potential scope is thus limited by the fact many protein families are hard, if not impossible, to align. Large language models trained on massive quantities of non-aligned protein sequences from diverse families address these problems and show potential to eventually bridge the performance gap. We introduce Tranception, a novel transformer architecture leveraging autoregressive predictions and retrieval of homologous sequences at inference to achieve state-of-the-art fitness prediction performance. Given its markedly higher performance on multiple mutants, robustness to shallow alignments and ability to score indels, our approach offers significant gain of scope over existing approaches. To enable more rigorous model testing across a broader range of protein families, we develop ProteinGym -- an extensive set of multiplexed assays of variant effects, substantially increasing both the number and diversity of assays compared to existing benchmarks.

Paper: GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval

Title: GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval

Date: 14 Dec 2021

Field: Methodology

Tasks: Domain Adaptation, Unsupervised Domain Adaptation

Paper link: arxiv.org/abs/2112.07…

Code: github.com/ukplab/gpl

Authors: Kexin Wang, Nandan Thakur, Nils Reimers, Iryna Gurevych

Summary: This limits the usage of dense retrieval approaches to only a few domains with large training datasets.

Abstract: Dense retrieval approaches can overcome the lexical gap and lead to significantly improved search results. However, they require large amounts of training data which is not available for most domains. As shown in previous work (Thakur et al., 2021b), the performance of dense retrievers severely degrades under a domain shift. This limits the usage of dense retrieval approaches to only a few domains with large training datasets. In this paper, we propose the novel unsupervised domain adaptation method Generative Pseudo Labeling (GPL), which combines a query generator with pseudo labeling from a cross-encoder. On six representative domain-specialized datasets, we find the proposed GPL can outperform an out-of-the-box state-of-the-art dense retrieval approach by up to 9.3 points nDCG@10. GPL requires less (unlabeled) data from the target domain and is more robust in its training than previous methods. We further investigate the role of six recent pre-training methods in the scenario of domain adaptation for retrieval tasks, where only three could yield improved results. The best approach, TSDAE (Wang et al., 2021) can be combined with GPL, yielding another average improvement of 1.4 points nDCG@10 across the six tasks. The code and the models are available at github.com/UKPLab/gpl
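
GPL's pseudo-labeling objective is a MarginMSE loss: the dense bi-encoder student is trained so that its score margin between a (generated query, positive passage) pair and a (query, hard negative) pair matches the margin assigned by the cross-encoder teacher. A toy version over precomputed scores (the scores themselves are made up):

```python
def margin_mse(student_pos, student_neg, teacher_pos, teacher_neg):
    """MarginMSE loss for one (query, positive, negative) triple:
    squared difference between the student's and teacher's score margins."""
    return ((student_pos - student_neg) - (teacher_pos - teacher_neg)) ** 2

# A student that reproduces the teacher's *margin* incurs zero loss,
# even though its absolute scores live on a different scale:
loss = margin_mse(student_pos=1.0, student_neg=0.5,
                  teacher_pos=8.0, teacher_neg=7.5)  # margins 0.5 vs 0.5 -> 0.0
```

Matching margins rather than absolute scores is what lets noisy cross-encoder outputs supervise a bi-encoder trained with dot-product similarities.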

Paper: Unsupervised Dense Information Retrieval with Contrastive Learning

Title: Unsupervised Dense Information Retrieval with Contrastive Learning

Date: 16 Dec 2021

Field: Natural Language Processing

Tasks: Contrastive Learning, Cross-Lingual Transfer, Fact Checking, Information Retrieval, Question Answering

Paper link: arxiv.org/abs/2112.09…

Code: github.com/facebookres…

Authors: Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, Edouard Grave

Summary: In this work, we explore the limits of contrastive learning as a way to train unsupervised dense retrievers and show that it leads to strong performance in various retrieval settings.

Abstract: Recently, information retrieval has seen the emergence of dense retrievers, based on neural networks, as an alternative to classical sparse methods based on term-frequency. These models have obtained state-of-the-art results on datasets and tasks where large training sets are available. However, they do not transfer well to new applications with no training data, and are outperformed by unsupervised term-frequency methods such as BM25. In this work, we explore the limits of contrastive learning as a way to train unsupervised dense retrievers and show that it leads to strong performance in various retrieval settings. On the BEIR benchmark our unsupervised model outperforms BM25 on 11 out of 15 datasets for the Recall@100 metric. When used as pre-training before fine-tuning, either on a few thousands in-domain examples or on the large MS MARCO dataset, our contrastive model leads to improvements on the BEIR benchmark. Finally, we evaluate our approach for multi-lingual retrieval, where training data is even scarcer than for English, and show that our approach leads to strong unsupervised performance. Our model also exhibits strong cross-lingual transfer when fine-tuned on supervised English data only and evaluated on low resources language such as Swahili. We show that our unsupervised models can perform cross-lingual retrieval between different scripts, such as retrieving English documents from Arabic queries, which would not be possible with term matching methods.
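
The contrastive objective behind such retrievers can be sketched as an InfoNCE loss over dot-product scores. The 2-D embeddings below are made up for illustration (the actual model contrasts two random crops of the same document against in-batch negatives):

```python
import math

def info_nce(query, positive, negatives, tau=0.05):
    """Toy InfoNCE contrastive loss with dot-product scores.

    Cross-entropy of picking the positive passage among
    {positive} + negatives, given the query embedding."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    scores = [dot(query, positive) / tau] + [dot(query, n) / tau for n in negatives]
    m = max(scores)  # numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[0]

q = [1.0, 0.0]
loss_good = info_nce(q, [0.9, 0.1], [[0.1, 0.9]])  # positive close to query
loss_bad = info_nce(q, [0.1, 0.9], [[0.9, 0.1]])   # positive far from query
```

Minimizing the loss pulls positives toward the query embedding and pushes negatives away, which is exactly the geometry a dot-product retriever needs.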

Paper: Focal Loss for Dense Object Detection

Title: Focal Loss for Dense Object Detection

Date: ICCV 2017

Field: Computer Vision

Tasks: Dense Object Detection, Long-tail Learning, Object Detection, Pedestrian Detection, Real-Time Object Detection, Region Proposal

Paper link: arxiv.org/abs/1708.02…

Code: github.com/facebookres… , github.com/tensorflow/… , github.com/facebookres… , github.com/open-mmlab/… , github.com/AlexeyAB/da…

Authors: Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár

Summary: Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.

Abstract: The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations. In contrast, one-stage detectors that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but have trailed the accuracy of two-stage detectors thus far. In this paper, we investigate why this is the case. We discover that the extreme foreground-background class imbalance encountered during training of dense detectors is the central cause. We propose to address this class imbalance by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples. Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training. To evaluate the effectiveness of our loss, we design and train a simple dense detector we call RetinaNet. Our results show that when trained with the focal loss, RetinaNet is able to match the speed of previous one-stage detectors while surpassing the accuracy of all existing state-of-the-art two-stage detectors. Code is at: github.com/facebookres…
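
The reshaped loss is compact enough to state directly. A minimal binary sketch of FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), following the paper's formulation (in the detector it is applied per anchor, summed over all anchors):

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one prediction.

    p: predicted probability of the positive class; y: label in {0, 1}.
    With gamma = 0 this reduces to alpha-weighted cross-entropy; larger
    gamma shrinks the loss of well-classified examples the most, so the
    flood of easy negatives no longer dominates training.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy positive (p = 0.9) is down-weighted far more than a hard one (p = 0.1):
easy, hard = focal_loss(0.9, 1), focal_loss(0.1, 1)
```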

We are ShowMeAI, dedicated to sharing quality AI content and industry solutions, using knowledge to accelerate every step of technical growth! Click to view the article archive, and subscribe to the topic #ShowMeAI资讯日报 in the official account to receive the latest daily updates. Click Collections & Monthly Digest to quickly browse the complete topic collections.


Reposted from juejin.im/post/7107151585179861028