Real-time tracking of research trends | New papers selected on September 19 from Microsoft, Meta AI, CMU, and other institutions

As a researcher, you need to search and read a large amount of academic literature every day to keep up with the latest scientific and technological progress and research results.
However, traditional retrieval and reading workflows can no longer keep pace with these needs.
AMiner AI is a literature knowledge tool that integrates retrieval, reading, and knowledge Q&A. It helps you retrieve and read papers more efficiently and stay on top of the latest research trends in your field, making research work easier.
If you want to discuss a particular paper in depth, you can copy its link into your browser or go directly to the AMiner AI page: https://www.aminer.cn/chat/g/explain

List of selected new papers on September 19, 2023:

1.An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models

The paper addresses open questions about scaling up instruction-tuned large multimodal models (LMMs). Previous studies were conducted with models of 13B parameters or smaller; this paper scales the LLaVA model to 33B and 65B/70B and presents an empirical study of image resolution, data mixing, and parameter-efficient training methods such as LoRA/QLoRA, sharing findings on multimodal and language capabilities in real-world tasks. The study finds that scaling up the model consistently improves performance and language capability, and that LoRA/QLoRA-tuned LMMs perform comparably to full-model fine-tuning. It also highlights the importance of higher image resolution and of mixing in multimodal-language data, and shows that visual instruction tuning can sometimes even improve the pure language capabilities of an LMM. The authors hope this study makes state-of-the-art LMM research at larger scales more accessible and establishes stronger baselines for future work. Code and checkpoints will be released publicly.
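
For readers unfamiliar with parameter-efficient tuning, the sketch below shows what LoRA adaptation of a causal language model looks like with the Hugging Face peft library. The base checkpoint, rank, and target modules are illustrative assumptions, not the configuration used in the paper.

```python
# Minimal LoRA sketch with Hugging Face peft; the base checkpoint and all
# hyperparameters are illustrative assumptions, not the paper's setup.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")  # assumed checkpoint

lora_cfg = LoraConfig(
    r=16,                                  # low-rank dimension
    lora_alpha=32,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()         # only the LoRA adapters are trainable
```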

https://www.aminer.cn/pub/650905523fda6d7f06cd71ac/?f=cs

2.LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models

The paper points out a limitation of existing layout-generation methods: they mainly treat layout generation as a numerical optimization task and ignore the semantic information of a layout, such as the relationships between its elements. To address this, the authors propose LayoutNUWA, a model that treats layout generation as a code generation task in order to enhance semantic information and leverage the hidden layout expertise of large language models. Through three interconnected modules (code initialization, code completion, and code rendering), the authors propose Code Instruct Tuning (CIT), which yields a highly interpretable and transparent layout generation process that maps code directly to the visualized layout. The method achieves significant state-of-the-art performance (in some cases over 50% improvement) on multiple datasets.
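
To make the "layout as code" idea concrete, here is a hypothetical sketch that serializes layout elements into HTML-like code with masked coordinates for an LLM to complete. The tag format and mask token are invented for illustration and are not LayoutNUWA's actual template.

```python
# Hypothetical illustration of layout generation as code completion: known
# elements become HTML-like tags, missing coordinates become masks for an LLM
# to fill in. This is NOT LayoutNUWA's actual serialization format.
def layout_to_code(elements, canvas=(1920, 1080)):
    lines = [f'<html><body style="width:{canvas[0]}px;height:{canvas[1]}px">']
    for e in elements:
        x = e.get("x", "<MASK>")
        y = e.get("y", "<MASK>")
        w = e.get("w", "<MASK>")
        h = e.get("h", "<MASK>")
        lines.append(
            f'  <div class="{e["type"]}" style="left:{x}px;top:{y}px;'
            f'width:{w}px;height:{h}px"></div>'
        )
    lines.append("</body></html>")
    return "\n".join(lines)

prompt = layout_to_code([
    {"type": "title", "x": 100, "y": 40, "w": 600, "h": 80},
    {"type": "image"},   # coordinates left for the LLM to complete
    {"type": "text"},
])
print(prompt)            # fed to an instruction-tuned LLM as a code-completion task
```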

https://www.aminer.cn/pub/650904f23fda6d7f06cd525e/?f=cs

3.Contrastive Decoding Improves Reasoning in Large Language Models

This paper studies Contrastive Decoding, a text generation method that delivers significant improvements over greedy decoding on a variety of reasoning tasks. Contrastive Decoding improves long-form generation quality by searching for strings that maximize a weighted difference in likelihood between a strong and a weak model. The study shows that Contrastive Decoding enables LLaMA-65B to outperform LLaMA 2, GPT-3.5, and PaLM 2-L on the HellaSwag commonsense reasoning benchmark and to outperform LLaMA 2, GPT-3.5, and PaLM-540B on the GSM8K math word problem benchmark, with improvements on other tasks as well. Analysis suggests that Contrastive Decoding improves over existing methods by preventing some abstract reasoning errors and by avoiding simpler patterns such as copying parts of the input during chain-of-thought reasoning. Overall, Contrastive Decoding outperforms nucleus sampling on long-form generation and greedy decoding on reasoning tasks, making it a powerful general-purpose method for generating text from language models.
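
The core of contrastive decoding can be sketched in a few lines: score candidate tokens by the gap between a strong ("expert") model and a weak ("amateur") model, restricted to tokens the expert itself finds plausible. The sketch below is a generic greedy variant; the exact parameterization and hyperparameters in the paper may differ.

```python
import torch

def contrastive_decoding_step(expert_logits, amateur_logits, alpha=0.1, beta=0.5):
    """One greedy step of contrastive decoding (illustrative sketch).

    expert_logits / amateur_logits: [vocab]-shaped logits from a strong and a weak LM.
    alpha: plausibility cutoff relative to the expert's most likely token.
    beta: weight of the amateur penalty.
    """
    expert_logprobs = torch.log_softmax(expert_logits, dim=-1)
    amateur_logprobs = torch.log_softmax(amateur_logits, dim=-1)

    # Keep only tokens the expert itself finds plausible.
    cutoff = expert_logprobs.max() + torch.log(torch.tensor(alpha))
    mask = expert_logprobs >= cutoff

    # Reward expert likelihood, penalize tokens the amateur also likes.
    scores = expert_logprobs - beta * amateur_logprobs
    scores[~mask] = float("-inf")
    return int(scores.argmax())
```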

https://www.aminer.cn/pub/650904db3fda6d7f06cd48d1/?f=cs

4.MindAgent: Emergent Gaming Interaction

The authors propose MindAgent, a novel infrastructure for evaluating emergent planning and coordination capabilities in gaming interaction. The infrastructure leverages existing game frameworks and (i) requires understanding of the coordinator of a multi-agent system, (ii) collaborates with human players via un-finetuned but properly formed instructions, and (iii) builds on in-context learning from small amounts of input and feedback. The authors also introduce CUISINEWORLD, a new game scenario and accompanying benchmark that evaluates multi-agent collaboration efficiency and supervises multiple agents playing the game simultaneously. They conduct a comprehensive evaluation using a new automatic metric called CoS. Finally, the infrastructure can be deployed in real-world gaming scenarios and adapted to the broader existing Minecraft gaming domain. The authors hope that these findings on LLMs and the new infrastructure for general-purpose scheduling and coordination shed light on how such skills can be learned from large corpora.

https://www.aminer.cn/pub/650904f23fda6d7f06cd5432/?f=cs

5.CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages

This paper highlights the general lack of transparency around the training data of large language models (LLMs), especially the latest state-of-the-art models. This opacity makes it difficult for researchers to understand and address hallucination and bias in LLMs, and hinders replication efforts and further progress in the community. The challenge is particularly salient in multilingual settings, where existing multilingual text datasets are often insufficiently collected and cleaned, leaving a shortage of open-source, ready-to-use datasets for effectively training multilingual LLMs. To address this, the researchers release CulturaX, a large multilingual dataset of 6.3 trillion tokens in 167 languages designed specifically for LLM development. The dataset undergoes multiple rigorous stages of cleaning and deduplication to ensure the best quality for model training. CulturaX is fully released to the public for research on and advancement of multilingual LLMs.
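
As a rough illustration of the kind of processing such a corpus requires, the toy function below normalizes whitespace, drops very short documents, and removes exact duplicates by hashing. The real CulturaX pipeline involves many more stages (language identification, URL filtering, fuzzy MinHash deduplication, and so on); this is only a sketch.

```python
import hashlib
import re

def clean_and_deduplicate(docs, min_words=20):
    """Toy cleaning/deduplication pass, only to illustrate the kind of filtering
    a corpus like CulturaX goes through; the actual pipeline is far more extensive.
    """
    seen = set()
    for text in docs:
        text = re.sub(r"\s+", " ", text).strip()        # normalize whitespace
        if len(text.split()) < min_words:               # drop very short documents
            continue
        digest = hashlib.md5(text.lower().encode("utf-8")).hexdigest()
        if digest in seen:                              # exact-duplicate removal
            continue
        seen.add(digest)
        yield text
```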

https://www.aminer.cn/pub/650904db3fda6d7f06cd49f3/?f=cs

6.A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale

The article describes Shampoo, an online stochastic optimization algorithm belonging to the AdaGrad family of methods. It constructs a block-diagonal preconditioner in which each block is a coarse Kronecker-product approximation to full-matrix AdaGrad for each parameter of the neural network. The authors give a complete description of the algorithm and of the performance optimizations in their PyTorch implementation for fast multi-GPU distributed data-parallel training of large-scale networks. The implementation distributes the memory and computation associated with the blocks of each parameter via PyTorch's DTensor data structure and performs an AllGather primitive on the computed search directions at each iteration; this major performance enhancement keeps the per-step wall-clock time within at most 10% of standard diagonal-scaling-based adaptive gradient methods. The authors validate the implementation with ablation studies on ImageNet ResNet50 training, demonstrating Shampoo's superiority over standard training recipes with minimal hyperparameter tuning. In short, the paper addresses the performance of the Shampoo optimizer for large-scale deep network training and demonstrates its advantages over standard training methods.
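
For intuition, the sketch below shows a single-parameter, single-device version of the Shampoo preconditioner for a 2-D weight matrix: accumulate row and column statistics and precondition the gradient with their inverse fourth roots. The function names and the eigendecomposition-based matrix power are illustrative; the distributed DTensor/AllGather implementation described above is considerably more involved.

```python
import torch

def _matrix_power(mat, power, eps):
    # Inverse fractional power of a symmetric PSD matrix via eigendecomposition.
    eigvals, eigvecs = torch.linalg.eigh(mat)
    eigvals = torch.clamp(eigvals, min=eps) ** power
    return eigvecs @ torch.diag(eigvals) @ eigvecs.T

def shampoo_step(param, grad, L, R, lr=0.01, eps=1e-12):
    """Single full-matrix Shampoo update for one 2-D parameter (illustrative
    sketch, not the distributed PyTorch implementation from the paper).
    L and R accumulate left/right statistics; the preconditioned gradient is
    L^{-1/4} @ grad @ R^{-1/4}.
    """
    L += grad @ grad.T    # left Kronecker factor statistics
    R += grad.T @ grad    # right Kronecker factor statistics
    pre_grad = _matrix_power(L, -0.25, eps) @ grad @ _matrix_power(R, -0.25, eps)
    param -= lr * pre_grad
    return param, L, R
```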

https://www.aminer.cn/pub/65026d513fda6d7f06474a5c/?f=cs

7.Stack-and-Delay: a new codebook pattern for music generation

In language-model-based music generation, a generated waveform is represented by a sequence of hierarchical token stacks that can be decoded either autoregressively or in parallel, depending on the codebook pattern. Flattening the codebooks is the highest-quality decoding strategy but is very slow. To address this, the authors propose a novel stack-and-delay decoding strategy that improves on flat-pattern decoding, generating four times faster than plain flat decoding. This brings inference time close to that of the delay decoding strategy and enables faster inference on GPUs at small batch sizes. For the same inference-efficiency budget as the delay pattern, the proposed method performs better in objective evaluations and reaches nearly the same quality level as the flat pattern. Subjective evaluations confirm that, given the same text prompt, samples generated by the new model are slightly more often preferred.
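
For context, the snippet below builds the standard "delay" codebook pattern that the paper compares against, where codebook k is shifted right by k steps so a frame's codebooks are predicted over successive decoding steps. The stack-and-delay pattern proposed in the paper rearranges this further; this sketch only illustrates the baseline idea.

```python
def delay_pattern(tokens, pad=-1):
    """Arrange K codebook streams in the standard 'delay' pattern: codebook k
    is shifted right by k steps. Illustrative sketch of the baseline pattern,
    not the stack-and-delay pattern introduced in the paper.

    tokens: list of K lists, each of length T (token ids per codebook).
    Returns a K x (T + K - 1) grid with `pad` in the shifted positions.
    """
    K, T = len(tokens), len(tokens[0])
    width = T + K - 1
    grid = [[pad] * width for _ in range(K)]
    for k in range(K):
        for t in range(T):
            grid[k][t + k] = tokens[k][t]
    return grid

# Example: 3 codebooks, 4 frames.
for row in delay_pattern([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]):
    print(row)
```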

https://www.aminer.cn/pub/650904db3fda6d7f06cd4795/?f=cs

8.Adapting Large Language Models via Reading Comprehension

The article discusses adapting large language models via reading comprehension, examining how continued pre-training on domain-specific corpora affects them. The study finds that training on raw corpora equips the model with domain knowledge but severely hurts its ability to answer questions. Inspired by how reading comprehension practice improves people's ability to answer questions, the authors propose a simple method that converts a raw corpus into reading-comprehension text: each raw passage is enriched with a series of tasks related to its content. The approach works on any pre-training corpus, is highly scalable, and consistently improves performance across a variety of tasks in three distinct domains: biomedicine, finance, and law. Notably, their 7B language model achieves performance competitive with much larger domain-specific models such as BloombergGPT-50B. Furthermore, domain-specific reading-comprehension text can even improve performance on general benchmarks, showing the potential of developing general-purpose models across more domains. The paper provides links to the model, code, and data.
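
A minimal, hypothetical sketch of the conversion idea is shown below: a raw domain passage is turned into training text by appending comprehension-style tasks. The paper mines such tasks automatically from the text itself; these hard-coded templates are only stand-ins.

```python
# Hypothetical sketch of converting raw domain text into reading-comprehension
# style training text by appending task templates. The paper derives tasks
# from the passage content itself; these fixed templates are illustrative only.
TASK_TEMPLATES = [
    "Question: What is the main topic of the passage above?\nAnswer:",
    "Summarize the passage above in one sentence.",
    "List the key terms mentioned in the passage above.",
]

def to_reading_comprehension(raw_text):
    tasks = "\n\n".join(TASK_TEMPLATES)
    return f"{raw_text}\n\n{tasks}"

print(to_reading_comprehension(
    "Heparin is an anticoagulant used to prevent the formation of blood clots."
))
```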

https://www.aminer.cn/pub/650904f23fda6d7f06cd5276/?f=cs

9.Augmenting text for spoken language understanding with Large Language Models

The paper notes that in spoken language understanding, training strong models requires expensive speech-transcript-semantic-parse data, and that exploiting unpaired text data is a challenge. The paper addresses this by comparing two ways of using unpaired text: generating speech representations for unpaired text drawn from existing text corpora, and generating unpaired text with large language models. Experiments show that unpaired text from existing and new domains significantly improves performance, and that using LLM-generated text for spoken semantic parsing improves it further.
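
One way to picture the LLM-based augmentation path is the hypothetical sketch below, which prompts a text generator for in-domain utterances that can later be paired with synthetic speech or joint audio-text representations. The `llm_generate` callable is a placeholder, not an API from the paper.

```python
# Hypothetical sketch: prompting an LLM for unpaired in-domain utterances for
# spoken semantic parsing. `llm_generate` is a placeholder for whichever
# text-generation API is available; it is not from the paper.
def make_augmentation_prompt(intent, slots, n=5):
    slot_desc = ", ".join(f"{k}={v}" for k, v in slots.items())
    return (
        f"Write {n} different ways a user might ask a voice assistant to "
        f"'{intent}' with {slot_desc}. One utterance per line."
    )

def augment(intent, slots, llm_generate):
    prompt = make_augmentation_prompt(intent, slots)
    utterances = llm_generate(prompt).splitlines()
    # Each utterance can then be paired with synthetic speech (TTS) or a joint
    # audio-text representation before training the SLU model.
    return [(u.strip(), intent, slots) for u in utterances if u.strip()]
```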

https://www.aminer.cn/pub/650904db3fda6d7f06cd49e9/?f=cs

10.S3-DST: Structured Open-Domain Dialogue Segmentation and State Tracking in the Era of LLMs

In LLM (large language model)-based chat systems, the traditional dialogue state tracking (DST) problem faces new complexities in open-domain dialogue: more intricate contextual interactions, extended conversation sessions covering a variety of topics, and more frequent context switches. To handle these complexities, the authors propose a method for joint dialogue segmentation and state tracking in open-domain dialogue systems. Arguing that the zero-shot setting is the right fit for true open-domain dialogue systems, they propose S3-DST, a structured prompting technique that uses Pre-Analytical Recollection, a novel grounding mechanism the authors design to improve long-context tracking. To demonstrate the effectiveness of the joint segmentation and state-tracking approach, they evaluate S3-DST on a proprietary anonymized open-domain dialogue dataset as well as publicly available DST and segmentation datasets. Across all datasets and settings, S3-DST consistently outperforms existing techniques, demonstrating its potential and robustness for next-generation LLM-based chat systems.
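
To illustrate what a structured prompt with a per-turn recollection step might look like, here is a hypothetical sketch in the spirit of S3-DST; the field names and layout are invented for illustration and are not the paper's actual prompt.

```python
# Hypothetical structured prompt for joint segmentation and state tracking:
# for every turn the model first restates the turn (in the spirit of
# Pre-Analytical Recollection) before labeling it. Not the actual S3-DST prompt.
def build_s3dst_prompt(dialogue_turns):
    lines = [
        "For each turn: (1) briefly restate what the turn says, "
        "(2) give its segment id, (3) update the dialogue state as slot=value pairs.",
        "",
    ]
    for i, turn in enumerate(dialogue_turns, start=1):
        lines.append(f"Turn {i}: {turn}")
        lines.append(f"  recollection_{i}: <fill in>")
        lines.append(f"  segment_{i}: <fill in>")
        lines.append(f"  state_{i}: <fill in>")
    return "\n".join(lines)

print(build_s3dst_prompt([
    "I'm planning a trip to Tokyo next month.",
    "Also, can you recommend a good laptop for travel?",
]))
```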

https://www.aminer.cn/pub/650904db3fda6d7f06cd47ad/?f=cs

11.Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT)

This study addresses the high cost of deploying large language models (LLMs) widely in natural language processing (NLP): although these models excel at understanding and generating human-like text, they are expensive to use at scale. The researchers propose Sorted Fine-Tuning (SoFT), which turns large language models into dynamic inference models via sorted fine-tuning, with no pre-training and at the same cost as the standard supervised fine-tuning (SFT) it replaces. The approach improves model efficiency and removes the need to keep multiple models for different scenarios at inference time. The researchers also show that this method unlocks the intermediate layers of the Transformer to generate the target output; these sub-models remain integral parts of the original model, reducing storage requirements and the cost of switching between different compute/latency budgets. By applying SoFT to LLaMA 2 13B on the Stanford Alpaca dataset and comparing against conventional fine-tuning and early exit on the PandaLM benchmark, the study shows that Sorted Fine-Tuning can make models up to twice as fast while maintaining or exceeding performance. The research thus addresses the efficiency and cost issues of widespread large-language-model deployment.
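
The core training idea can be sketched as computing the language-modeling loss from several intermediate depths with the shared output head and averaging them, so that each prefix of layers becomes a usable sub-model. The sketch below assumes a LLaMA-style Hugging Face checkpoint (with `lm_head` and `model.norm`) and is not the authors' implementation.

```python
import torch

def sorted_fine_tuning_loss(model, input_ids, labels, exit_layers=(12, 24, 40)):
    """Illustrative sketch of the idea behind Sorted Fine-Tuning: apply the
    shared LM head to several intermediate depths and combine the losses, so
    each prefix of layers becomes a usable sub-model. Assumes a LLaMA-style
    Hugging Face model; the exit layers are arbitrary examples.
    """
    outputs = model(input_ids, output_hidden_states=True)
    hidden_states = outputs.hidden_states          # tuple: one tensor per layer
    loss_fn = torch.nn.CrossEntropyLoss()
    losses = []
    for layer in exit_layers:
        # Shared final norm and LM head applied to an intermediate depth.
        logits = model.lm_head(model.model.norm(hidden_states[layer]))
        shift_logits = logits[:, :-1, :].reshape(-1, logits.size(-1))
        shift_labels = labels[:, 1:].reshape(-1)
        losses.append(loss_fn(shift_logits, shift_labels))
    return torch.stack(losses).mean()
```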

https://www.aminer.cn/pub/650904db3fda6d7f06cd4839/?f=cs


We have added a "Daily Selected New Papers" topic to the AMiner homepage. Click "Subscribe" or "Add to Knowledge Base" to get all the paper information!


View all featured new papers: https://www.aminer.cn
