Real-time tracking of research trends | Selected new papers for July 17, with ChatPaper summaries

As a researcher, you need to search and browse large amounts of academic literature every day to keep up with the latest scientific and technological progress and research results. Traditional retrieval and reading methods, however, can no longer keep pace with these needs.

ChatPaper is a document knowledge tool that integrates retrieval, reading, and knowledge-based question answering. It helps you search and read papers more efficiently, stay on top of the latest research trends in your field, and makes research work easier.

Combined with the frontier-trends subscription feature, we select the day's popular new arXiv papers and compile them into paper summaries, so that everyone can grasp cutting-edge developments more quickly.

If you want an in-depth dialogue about a particular paper, you can paste the paper's link directly into your browser or go straight to the ChatPaper page: https://www.aminer.cn/chat/g/

List of featured new papers for July 17, 2023:

1. NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis paper details page

Link: https://www.aminer.cn/pub/64b4bd0d3fda6d7f0654fbb3/?f=cs

ChatPaper review: The paper addresses the problem of generating realistic 3D human motions that interact with objects in a scene. The key idea is to attach a neural interaction field to a specific object: given an input human pose, the field outputs the distance to the manifold of valid interactions. This interaction field then guides the sampling of an object-conditioned human motion diffusion model to encourage plausible contact and affordance semantics. To cope with sparse interaction data, the authors build an automated synthetic data pipeline: they seed a pre-trained motion model, which carries basic human motion priors, with interaction-anchored poses extracted from limited motion-capture data. Using a guided diffusion model trained on this synthetic data, they synthesize realistic sitting and lifting motions for several objects, outperforming alternative methods in motion quality and the rate of successfully completed actions. The authors call their framework NIFTY: Neural Interaction Fields for Trajectory sYnthesis.
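To make the guidance idea more concrete, here is a minimal sketch of how an interaction field could steer a diffusion sampler: the field scores a pose by its distance to the interaction manifold, and each denoising step nudges the sample down the gradient of that distance. All module names, dimensions, and the guidance rule are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of interaction-field-guided diffusion sampling.
import torch
import torch.nn as nn

class InteractionField(nn.Module):
    """Maps a human pose (conditioned on an object code) to a scalar
    distance from the valid-interaction manifold (assumed architecture)."""
    def __init__(self, pose_dim=63, obj_dim=32, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(pose_dim + obj_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, pose, obj_code):
        return self.net(torch.cat([pose, obj_code], dim=-1))

def guided_denoise_step(x_t, denoiser, field, obj_code, guidance_scale=1.0):
    """One denoising step that nudges the sample down the gradient of the
    interaction field's distance output (illustrative guidance rule)."""
    x_t = x_t.detach().requires_grad_(True)
    dist = field(x_t, obj_code).sum()
    grad = torch.autograd.grad(dist, x_t)[0]
    x_denoised = denoiser(x_t)  # placeholder for the motion diffusion model
    return x_denoised - guidance_scale * grad
```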

2. DreamTeacher: Pretraining Image Backbones with Deep Generative Models paper details page

Link: https://www.aminer.cn/pub/64b4bd0d3fda6d7f0654fb9a/?f=cs

ChatPaper review: This work introduces DreamTeacher, a self-supervised feature-representation learning framework that leverages generative networks to pre-train downstream image backbones. The researchers propose distilling knowledge from a trained generative model into a standard image backbone for a specific perception task. They study two types of knowledge distillation: 1) distilling features learned by the generative model into the target image backbone, as an alternative to pre-training on large labeled datasets such as ImageNet; 2) distilling labels obtained from the generative network into the logit layer of the target backbone. The researchers perform a detailed analysis across multiple generative models, dense-prediction benchmarks, and various pre-training schemes. Experiments show that DreamTeacher significantly outperforms existing self-supervised representation-learning methods across the board. Unsupervised ImageNet pre-training with DreamTeacher also brings clear improvements over ImageNet classification pre-training on downstream datasets, indicating that generative models, and diffusion models in particular, are a promising way to learn representations from large-scale, diverse data without manual annotation.
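The first distillation variant can be pictured as a simple feature-regression objective: the backbone's features are projected and matched against features extracted from the frozen generative model. The layer choices, projection head, and loss below are assumptions for illustration only, not the paper's exact recipe.

```python
# Hypothetical sketch of feature distillation from a frozen generative model
# into an image backbone.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDistiller(nn.Module):
    def __init__(self, backbone, backbone_feat_dim, gen_feat_dim):
        super().__init__()
        self.backbone = backbone  # e.g. a CNN trunk returning a feature map
        self.proj = nn.Conv2d(backbone_feat_dim, gen_feat_dim, kernel_size=1)

    def forward(self, images, gen_features):
        """gen_features: features extracted offline from the (frozen)
        generative model for the same images."""
        student = self.proj(self.backbone(images))
        # resize in case the spatial resolutions differ
        student = F.interpolate(student, size=gen_features.shape[-2:],
                                mode="bilinear", align_corners=False)
        return F.mse_loss(student, gen_features)  # regression-style distillation loss
```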

3. Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts paper details page

Link: https://www.aminer.cn/pub/64b4bd093fda6d7f0654f518/?f=cs

ChatPaper review: Previous large-scale multi-speaker TTS models have achieved zero-shot speaker cloning, but only with enrollment recordings of about 10 seconds, and most can exploit only the limited information in such short speech prompts, which severely hurts fine-grained identity imitation. This paper introduces Mega-TTS 2, a general zero-shot multi-speaker TTS model that can synthesize speech for unseen speakers from prompts of arbitrary length. Specifically, the authors design a multi-reference timbre encoder to extract timbre information from multiple reference utterances, and train a prosody language model that can handle speech prompts of arbitrary length. With these designs, the model adapts to prompts of different lengths and raises the quality ceiling of zero-shot text-to-speech. In addition, the authors introduce arbitrary-source prompts, combining probabilities derived from multiple P-LLM outputs during generation to produce expressive and controllable prosody. They further propose a phoneme-level autoregressive duration model that brings in-context learning ability into duration modeling. Experimental results show that the approach not only preserves speaker identity when synthesizing from short prompts for unseen speakers, but also achieves better performance with longer prompts.
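One way to picture a multi-reference timbre encoder is as a per-clip encoder whose outputs are pooled over however many reference utterances are available, so the prompt length is unconstrained. The encoder architecture, pooling choice (mean), and shapes below are assumptions, not the paper's actual design.

```python
# Hypothetical sketch of a multi-reference timbre encoder.
import torch
import torch.nn as nn

class MultiReferenceTimbreEncoder(nn.Module):
    def __init__(self, mel_dim=80, embed_dim=256):
        super().__init__()
        self.clip_encoder = nn.GRU(mel_dim, embed_dim, batch_first=True)

    def forward(self, reference_mels):
        """reference_mels: list of tensors, each (frames, mel_dim), one per
        reference utterance; the list may be arbitrarily long."""
        embeddings = []
        for mel in reference_mels:
            _, h = self.clip_encoder(mel.unsqueeze(0))  # h: (1, 1, embed_dim)
            embeddings.append(h.squeeze(0).squeeze(0))
        return torch.stack(embeddings).mean(dim=0)      # single timbre vector
```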

4. Learning to Retrieve In-Context Examples for Large Language Models paper details page

Link: https://www.aminer.cn/pub/64b4bd093fda6d7f0654f4de/?f=cs

ChatPaper review: The paper notes that the effectiveness of in-context learning in large language models depends heavily on the quality of the selected examples, yet selecting high-quality in-context examples is challenging. This paper proposes a novel framework that addresses the problem by iteratively training a dense retriever. The framework first trains a reward model, based on feedback from a language model, to evaluate the quality of candidate examples, and then uses knowledge distillation to train a dual-encoder dense retriever. Experiments show that the framework significantly improves in-context learning performance and generalizes to tasks unseen during training. In-depth analysis shows that the model improves performance by retrieving examples with similar patterns, and that this improvement holds across language models of different sizes.
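The distillation step can be sketched as matching the retriever's score distribution over candidate examples to the reward model's distribution. The temperature, scoring, and loss shape below are illustrative assumptions rather than the paper's exact formulation.

```python
# Hypothetical sketch of distilling reward-model scores into a dual-encoder
# retriever via KL divergence.
import torch
import torch.nn.functional as F

def distillation_loss(query_emb, candidate_embs, reward_scores, temperature=1.0):
    """query_emb: (d,), candidate_embs: (n, d) from the dual encoder;
    reward_scores: (n,) quality scores from the LM-feedback reward model."""
    retriever_logits = candidate_embs @ query_emb / temperature
    teacher_probs = F.softmax(reward_scores / temperature, dim=-1)
    return F.kl_div(F.log_softmax(retriever_logits, dim=-1),
                    teacher_probs, reduction="batchmean")
```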

5. Copy Is All You Need paper details page

Link: https://www.aminer.cn/pub/63dcdb422c26941cf00b6339/?f=cs

ChatPaper review: The paper points out that traditional text-generation models produce output by selecting words from a fixed vocabulary, and instead proposes treating text generation as progressively copying text fragments (e.g. words or phrases) from an existing text collection. The method decomposes text generation into a series of copy-and-paste operations by computing contextualized representations of meaningful text fragments and indexing them with efficient vector-search tools: at each time step, the model looks up a suitable text fragment in the index rather than choosing from a standalone vocabulary. Experimental results show that, simply by copying from the original training data, the method achieves better generation quality on a standard language-modeling benchmark (WikiText-103, 0.758 vs. 0.691 MAUVE). Moreover, further gains can be obtained by enlarging the text collection without any additional training, and effective domain adaptation is possible by simply switching to a domain-specific text collection, again without further training. Finally, by reducing the number of decoding steps, the method achieves better inference efficiency than traditional token-level autoregressive models.
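The generation loop described above can be sketched in a few lines: encode the current context, run a nearest-neighbor search over a precomputed phrase index, and append the retrieved fragment. The embedding function, index, and end marker are placeholders, not the paper's actual components.

```python
# Hypothetical sketch of copy-based generation over a phrase index.
import numpy as np

def generate_by_copying(prefix, encode_context, phrase_embs, phrases,
                        max_steps=20, end_marker="<eos>"):
    """phrase_embs: (num_phrases, d) precomputed phrase embeddings;
    phrases: list of the corresponding text fragments."""
    text = prefix
    for _ in range(max_steps):
        ctx = encode_context(text)         # (d,) context representation
        scores = phrase_embs @ ctx         # dot-product search over the index
        best = phrases[int(np.argmax(scores))]
        if best == end_marker:
            break
        text += best                       # copy-and-paste the fragment
    return text
```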

6. DIALGEN: Collaborative Human-LM Generated Dialogues for Improved Understanding of Human-Human Conversations paper details page

Link: https://www.aminer.cn/pub/64b4bd093fda6d7f0654f463/?f=cs

ChatPaper review: The paper proposes DIALGEN to address the challenge of automatically understanding human-human conversations, where real-world data, such as call-center or clinical conversations, contains private information. Working with protected data also raises annotation costs, which limits progress. To address these challenges, the authors propose DIALGEN, a human-in-the-loop, semi-automatic dialogue generation framework. DIALGEN uses a language model (ChatGPT) that can produce fluent dialogue text, iteratively generating subdialogues and relying on human feedback to correct inconsistencies or redirect the flow of the conversation. In experiments on structured summarization of agent-client information-gathering calls, framed as dialogue state tracking, the authors show that DIALGEN data significantly improves model performance.
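The iterative loop can be sketched as follows: the LM drafts a subdialogue, a human reviewer either accepts it or supplies feedback that steers the next draft. The function names and verdict strings are placeholders, not the DIALGEN implementation.

```python
# Hypothetical sketch of a human-in-the-loop subdialogue generation loop.
def dialgen_loop(schema, generate_subdialogue, human_review, max_rounds=10):
    dialogue, feedback = [], None
    for _ in range(max_rounds):
        draft = generate_subdialogue(schema, dialogue, feedback)  # LM call (placeholder)
        verdict, feedback = human_review(draft, dialogue)         # "accept" / "revise" / "done"
        if verdict in ("accept", "done"):
            dialogue.extend(draft)
            feedback = None
        if verdict == "done":
            break
    return dialogue
```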

7. Exploiting Counter-Examples for Active Learning with Partial labels paper details page

Link: https://www.aminer.cn/pub/64b4bd093fda6d7f0654f5e7/?f=cs

ChatPaper review: This paper studies a new problem, Active Learning with Partial Labels (ALPL). In this setting, an oracle annotates queried samples with partial labels, relaxing the requirement that the oracle provide exact labels. To address ALPL, the authors first establish an intuitive baseline that can be seamlessly incorporated into existing AL frameworks. Although effective, this baseline is still prone to overfitting and does not guarantee representative partial-label-based samples during querying. Inspired by human reasoning in cognitive science, where accurate inferences can be explicitly derived from counter-examples (CEs), the authors exploit this human-like learning pattern to mitigate overfitting while improving the selection of representative samples in ALPL. Specifically, they construct CEs by inverting the partial labels of each instance and propose a simple but effective WorseNet that learns directly from this complementary pattern. By exploiting the distribution gap between WorseNet and the predictor, this adversarial evaluation scheme improves both the predictor itself and the sample-selection process, allowing the predictor to capture more accurate patterns in the data. Experiments on five real-world datasets and four benchmark datasets show consistent improvements over ten representative AL frameworks, highlighting the superiority of WorseNet.
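The counter-example construction can be pictured as flipping each instance's partial-label mask and training WorseNet on the resulting complementary labels. The loss below, a uniform target over complementary labels, is an illustrative assumption, not the paper's exact objective.

```python
# Hypothetical sketch of counter-example construction and a WorseNet loss.
import torch
import torch.nn.functional as F

def complementary_labels(partial_mask):
    """partial_mask: (batch, num_classes) with 1 for candidate labels.
    Returns the inverted mask marking labels the instance cannot have."""
    return 1 - partial_mask

def worsenet_loss(logits, partial_mask):
    """Push WorseNet's probability mass onto the complementary labels."""
    comp = complementary_labels(partial_mask).float()
    log_probs = F.log_softmax(logits, dim=-1)
    target = comp / comp.sum(dim=-1, keepdim=True)  # uniform over complementary labels
    return -(target * log_probs).sum(dim=-1).mean()
```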

8. Generating Efficient Training Data via LLM-based Attribute Manipulation paper details page

Link: https://www.aminer.cn/pub/64b4bd093fda6d7f0654f49a/?f=cs

ChatPaper review: The paper proposes a novel method, Chain-of-Thoughts Attribute Manipulation (CoTAM), which guides few-shot learning with carefully constructed data from large language models (LLMs). The main idea is to create data that changes only in the attribute targeted by the task. Inspired by facial attribute manipulation, the method uses LLMs to manipulate task-specific attributes and reconstruct new sentences in a controllable way, yielding label-switched data. Unlike conventional latent-representation control, CoTAM adapts the procedure to LLMs through a chain-of-thought decomposition-and-reconstruction scheme. Extensive experiments on text classification and other tasks show that CoTAM outperforms other LLM-based text-generation methods given the same number of training samples. Analysis visualizes CoTAM's attribute-manipulation effect and demonstrates the potential of LLM-guided learning even with less supervision.
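The decompose-then-reconstruct idea can be sketched as a three-step prompt: list the sentence's attributes, change only the target attribute, then rewrite the sentence. The prompt wording and the call_llm function are placeholders, not the authors' prompts or API.

```python
# Hypothetical sketch of a CoTAM-style attribute-manipulation prompt.
def cotam_prompt(sentence, target_attribute, new_value):
    return (
        f"Sentence: {sentence}\n"
        f"Step 1: List the attributes of this sentence (topic, style, and {target_attribute}).\n"
        f"Step 2: Change only the {target_attribute} to '{new_value}', keeping every other attribute fixed.\n"
        "Step 3: Rewrite the sentence with the modified attributes."
    )

def manipulate(sentence, target_attribute, new_value, call_llm):
    """call_llm: any text-completion function (hypothetical placeholder)."""
    return call_llm(cotam_prompt(sentence, target_attribute, new_value))
```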


How to use ChatPaper?

Using ChatPaper is very simple. Open the AMiner homepage and enter the ChatPaper page from the navigation bar at the top of the page or from the lower-right corner.

On the ChatPaper page, you can choose to chat about a single document or about your entire personal library, and you can either upload a local PDF or search for papers directly on AMiner.
