Real-time tracking of scientific research trends | New papers selected on September 22 from MIT, Peking University, Stanford and other institutions

As a researcher, you need to search and read a large amount of academic literature every day to keep up with the latest scientific and technological progress and research results.

However, traditional ways of retrieving and reading papers can no longer keep pace with researchers' needs.

AMiner AI is a literature knowledge tool that integrates retrieval, reading, and knowledge Q&A. It helps you search and read papers more efficiently, stay on top of the latest research trends in your field, and makes research work more comfortable.
If you want to have an in-depth conversation about a particular paper, you can paste the paper's link into your browser or go directly to the AMiner AI page: https://www.aminer.cn/chat/g/explain

List of selected new papers on September 22, 2023:

1.LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

The paper introduces LongLoRA, a method for efficiently fine-tuning large language models to extended context sizes at limited computational cost. Training language models with long contexts normally requires substantial compute and time; for example, a context length of 8192 costs about 16 times as much as a context length of 2048. The paper proposes two ideas to speed up context extension. First, although dense global attention is needed at inference time, fine-tuning can be done efficiently with sparse local attention: the proposed shifted sparse attention effectively extends the context while saving considerable compute compared with fine-tuning using standard attention, at similar performance. Notably, it can be implemented with only two lines of code during training and is optional at inference (a minimal sketch of the shifting idea follows the link below). Second, the authors revisit parameter-efficient fine-tuning for context extension and find that LoRA works well for this purpose provided the embedding and normalization layers are also made trainable. LongLoRA shows strong empirical results on LLaMA2 models from 7B/13B up to 70B, retains the models' original architecture, and is compatible with most existing techniques such as FlashAttention-2. To make LongLoRA practical, the authors also collected LongQA, a supervised fine-tuning dataset with more than 3k long-context question-answer pairs.

https://www.aminer.cn/pub/650cf92d3fda6d7f06d445d9/?f=cs
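
The "two lines of code" mentioned above refer to shifting the tokens in half of the attention heads before grouping them for local attention. Below is a minimal, hedged sketch of that idea on dummy tensors; the shapes, the group size `G`, and the helper names are illustrative assumptions rather than the paper's exact code (which also handles causal masking).

```python
# Toy illustration of shifted local (grouped) attention, assuming PyTorch 2.x.
import torch
import torch.nn.functional as F

B, N, H, D = 2, 1024, 8, 64   # batch, sequence length, heads, head dim (toy values)
G = 256                        # local group size; attention is computed within groups

q = torch.randn(B, N, H, D)
k = torch.randn(B, N, H, D)
v = torch.randn(B, N, H, D)

def shift_half_heads(x: torch.Tensor, shift: int) -> torch.Tensor:
    # Roll the token axis for the second half of the heads so that their local
    # groups straddle the group boundaries used by the first half.
    x = x.clone()
    x[:, :, H // 2:, :] = x[:, :, H // 2:, :].roll(shift, dims=1)
    return x

def grouped_attention(q, k, v):
    # Reshape (B, N, H, D) -> (B*N/G, H, G, D) and attend within each group only.
    def to_groups(t):
        return t.view(B, N // G, G, H, D).permute(0, 1, 3, 2, 4).reshape(-1, H, G, D)
    out = F.scaled_dot_product_attention(to_groups(q), to_groups(k), to_groups(v))
    return out.reshape(B, N // G, H, G, D).permute(0, 1, 3, 2, 4).reshape(B, N, H, D)

# Shift by half a group before attention, attend locally, then shift back.
q_s, k_s, v_s = (shift_half_heads(t, -G // 2) for t in (q, k, v))
out = grouped_attention(q_s, k_s, v_s)
out = shift_half_heads(out, G // 2)
print(out.shape)  # torch.Size([2, 1024, 8, 64])
```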

2.A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models

This paper points out that in machine translation, generative large language models (LLMs) of moderate size (e.g., 7B or 13B parameters) still lag behind conventional supervised encoder-decoder translation models, and that previous attempts to improve the translation ability of such models have yielded only limited gains. To address this, the authors propose an LLM fine-tuning approach designed specifically for translation that removes the need for the large amounts of parallel data traditional translation models rely on. It consists of two fine-tuning stages: an initial stage on monolingual data, followed by a second stage on a small set of high-quality parallel data (a formatting sketch of the two stages follows the link below). The resulting model, built on LLaMA-2, is named ALMA (Advanced Language Model-based trAnslator). Experimental results show an average improvement of more than 12 BLEU and 12 COMET points over the zero-shot performance of the base model on the WMT'21 (2 directions) and WMT'22 (8 directions) test sets. With only 7B or 13B parameters, ALMA significantly outperforms all previous work and is even better than the NLLB-54B model and GPT-3.5-text-davinci-003. The approach lays the foundation for a new training paradigm in machine translation.

https://www.aminer.cn/pub/650cf9223fda6d7f06d42a80/?f=cs
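
A hedged sketch of how training examples for the two stages might be formatted: stage 1 uses plain monolingual text with the ordinary causal-LM objective, while stage 2 uses a small number of high-quality parallel pairs rendered as a translation prompt plus target. The prompt template here is an illustrative assumption, not necessarily ALMA's exact recipe.

```python
# Stage 1: plain monolingual text, trained with the standard causal-LM loss.
stage1_example = "Der schnelle braune Fuchs springt über den faulen Hund."

# Stage 2: a small set of high-quality parallel pairs, formatted as prompt -> target.
def format_parallel(src: str, tgt: str,
                    src_lang: str = "German", tgt_lang: str = "English") -> str:
    prompt = f"Translate this from {src_lang} to {tgt_lang}:\n{src_lang}: {src}\n{tgt_lang}:"
    return prompt + " " + tgt

print(format_parallel("Der schnelle braune Fuchs springt über den faulen Hund.",
                      "The quick brown fox jumps over the lazy dog."))
```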

3.LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

This paper introduces LMSYS-Chat-1M, a large-scale dataset containing one million real-world conversations with 25 state-of-the-art large language models (LLMs). The data was collected in the wild from 210,000 unique IP addresses via the authors' Vicuna demo and Chatbot Arena website. The paper gives an overview of the dataset's contents, including its curation process, basic statistics, and topic distribution, highlighting its diversity, originality, and scale. The dataset's versatility is demonstrated through four use cases: developing a content moderation model that performs similarly to GPT-4, building a safety benchmark, training an instruction-following model that performs similarly to Vicuna, and creating challenging benchmark questions. The authors believe the dataset will be a valuable resource for understanding and advancing LLM capabilities (a loading snippet follows the link below).

https://www.aminer.cn/pub/650cf92d3fda6d7f06d4447f/?f=cs
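
A small usage sketch for loading the dataset with the Hugging Face `datasets` library. The dataset ID "lmsys/lmsys-chat-1m" and the column names shown are assumptions from memory; access may be gated and require accepting the dataset's terms on Hugging Face.

```python
from datasets import load_dataset

# Download (or stream) the one million conversations.
ds = load_dataset("lmsys/lmsys-chat-1m", split="train")
print(ds)                       # number of rows and column names

sample = ds[0]
print(sample["model"])          # which of the 25 LLMs produced this conversation
print(sample["conversation"])   # list of {"role": ..., "content": ...} turns
```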

4.RMT: Retentive Networks Meet Vision Transformers

This paper asks whether transferring the ideas behind RetNet to the vision domain can deliver strong performance on visual tasks. The authors propose RMT, which combines RetNet with the Transformer, and demonstrate its strong performance across a range of computer vision tasks. They further show that, compared with existing vision backbones, RMT significantly outperforms them on downstream tasks such as object detection, instance segmentation, and semantic segmentation (a sketch of the distance-decay idea follows the link below).

https://www.aminer.cn/pub/650cf9223fda6d7f06d429e6/?f=cs
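
RetNet's central ingredient is an explicit, distance-dependent decay on token interactions. The sketch below shows one way such a decay could be applied to 2D image patches (exponential decay in the Manhattan distance between patch coordinates); this is my illustration of the general idea, not RMT's exact formulation.

```python
import torch

H = W = 14                      # 14x14 grid of patches (toy value)
gamma = 0.9                     # decay rate per unit of distance

ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
coords = torch.stack([ys.flatten(), xs.flatten()], dim=-1).float()   # (N, 2)

# Pairwise Manhattan distance between patches, then an exponential decay mask.
dist = (coords[:, None, :] - coords[None, :, :]).abs().sum(-1)        # (N, N)
decay = gamma ** dist                                                 # (N, N)

# Applied to attention-like scores: nearby patches keep full weight,
# distant patches are progressively down-weighted.
N, D = H * W, 64
q, k = torch.randn(N, D), torch.randn(N, D)
scores = (q @ k.T) / D ** 0.5
weights = torch.softmax(scores, dim=-1) * decay
weights = weights / weights.sum(-1, keepdim=True)                     # renormalize
print(weights.shape)  # torch.Size([196, 196])
```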

5.LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent

This paper introduces LLM-Grounder to address open-vocabulary 3D visual grounding. The authors point out that existing methods often rely on large amounts of annotated data or struggle with complex language queries. LLM-Grounder uses a large language model (LLM) to decompose complex natural-language queries into semantic parts, and uses visual grounding tools such as OpenScene or LERF to identify objects in the 3D scene; the LLM then evaluates the spatial and commonsense relationships among the proposed objects to make the final grounding decision (a sketch of this agent loop follows the link below). The method requires no labeled training data and generalizes to new 3D scenes and arbitrary text queries. Evaluated on the ScanRefer benchmark, LLM-Grounder achieves state-of-the-art zero-shot grounding accuracy. The results show that LLMs substantially improve grounding ability, especially for complex language queries, making LLM-Grounder an effective approach for 3D vision-language tasks in robotics.

https://www.aminer.cn/pub/650cf92d3fda6d7f06d445de/?f=cs
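
A hedged sketch of the agent-style pipeline described above: an LLM decomposes the query, a visual grounder proposes candidate objects, and the LLM reasons over them to pick one. Both `call_llm` and `ground` are toy stand-ins that return canned data for the demo; they are not the authors' actual tools (OpenScene/LERF) or prompts, and the scene ID is just an example string.

```python
def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM call; returns canned output so the demo runs.
    if "Decompose" in prompt:
        return "target: chair | landmark: window"
    return "candidate 1"   # pretend the LLM chose the first candidate

def ground(phrase: str, scene_id: str) -> list[dict]:
    # Stand-in for an open-vocabulary 3D grounder returning candidate boxes.
    return [{"id": 1, "center": (1.0, 2.0, 0.5)}, {"id": 2, "center": (4.0, 0.5, 0.5)}]

def locate(query: str, scene_id: str) -> dict:
    # 1) Decompose the query into semantic parts, 2) ground the target phrase,
    # 3) let the LLM reason over the candidates to make the final decision.
    parts = call_llm(f"Decompose this 3D grounding query into target and landmark: {query}")
    target_phrase = parts.split("|")[0].split(":")[1].strip()
    candidates = ground(target_phrase, scene_id)
    choice = call_llm(f"Query: {query}\nCandidates: {candidates}\nWhich candidate best matches?")
    return candidates[0] if "1" in choice else candidates[1]

print(locate("the chair closest to the window", scene_id="scene0000_00"))
```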

6.MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

The paper observes that existing open-source large language models still fall short on mathematical problem solving because of the complexity of mathematical reasoning. To address this, the authors propose MetaMath, a language model fine-tuned specifically for mathematical reasoning. They first bootstrap the mathematical questions by rewriting them from multiple perspectives, producing a new dataset called MetaMathQA (a sketch of this style of rewriting follows the link below), and then fine-tune the LLaMA-2 models on MetaMathQA. Experimental results show that MetaMath outperforms a set of open-source LLMs on two widely used mathematical reasoning benchmarks: MetaMath-7B reaches 66.4% on GSM8K and 19.4% on MATH, exceeding the best open-source models of the same size by 11.5% and 8.7% respectively, and MetaMath-70B reaches 82.3% on GSM8K, slightly better than GPT-3.5-Turbo. The authors publicly release the MetaMathQA dataset, MetaMath models of different sizes, and the training code.

https://www.aminer.cn/pub/650cf92d3fda6d7f06d445be/?f=cs
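
A hedged sketch of what "rewriting a question from multiple perspectives" could look like in practice. The two templates below (a rephrasing prompt and a backward-style variant that masks a known quantity) are illustrative guesses at this kind of augmentation, not MetaMathQA's exact prompts, and the seed question is a made-up toy.

```python
seed_question = "A farmer has 5 crates with 4 apples in each crate. How many apples does he have?"

def rephrase_prompt(q: str) -> str:
    # Ask an LLM to restate the question in different words without changing its answer.
    return f"Rewrite the following math question in different words, keeping the same answer:\n{q}"

def backward_prompt(q: str, known_value: str, answer: str) -> str:
    # Replace a known quantity with x and ask to recover it given the final answer.
    masked = q.replace(known_value, "x", 1)
    return f"{masked}\nIf the answer to the above question is {answer}, what is the value of x?"

print(rephrase_prompt(seed_question))
print(backward_prompt(seed_question, "5", "20 apples"))
```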

7.BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model

This paper introduces BTLM-3B-8K, an open-source language model with 3 billion parameters. It is trained on 627B tokens from the cleaned and deduplicated SlimPajama dataset using a mixture of 2,048 and 8,192 context lengths, with tuned hyperparameters and schedule, ALiBi position embeddings, and the SwiGLU nonlinear activation (minimal sketches of both components follow the link below). BTLM-3B-8K improves downstream-task performance by 2 to 5.5% over all existing 3-billion-parameter models and even competes with some 7-billion-parameter models; on long-context tasks it outperforms MPT-7B-8K and XGen-7B-8K at context lengths up to 8,192. The authors note that the most popular models on Hugging Face have around 7 billion parameters, suggesting users favor that quality-to-size ratio, so compressing 7B-level performance into a 3B-parameter model with almost no loss is an important milestone. At 4-bit precision, BTLM-3B-8K needs only about 3GB of memory and requires 2.5 times less inference compute than 7B models, which helps bring capable language models to mobile and edge devices. BTLM-3B-8K is available on Hugging Face under an Apache 2.0 license.

https://www.aminer.cn/pub/650cf9223fda6d7f06d42a14/?f=cs
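
The summary above mentions ALiBi position biases and the SwiGLU activation. Below is a minimal sketch of both building blocks in their generic textbook form (assuming a power-of-two head count for the ALiBi slopes); this is not BTLM-3B-8K's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # ALiBi adds a linear penalty proportional to the key-query distance to the
    # attention scores; each head gets its own slope from a geometric sequence.
    slopes = torch.tensor([2 ** (-8 * (h + 1) / num_heads) for h in range(num_heads)])
    distance = torch.arange(seq_len)[None, :] - torch.arange(seq_len)[:, None]  # j - i
    return slopes[:, None, None] * distance[None, :, :]   # (heads, seq, seq)

class SwiGLU(nn.Module):
    # SwiGLU feed-forward: (SiLU(x W1) * x W2) W3, a gated variant of the MLP block.
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)   # gate projection
        self.w2 = nn.Linear(dim, hidden, bias=False)   # value projection
        self.w3 = nn.Linear(hidden, dim, bias=False)   # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w3(F.silu(self.w1(x)) * self.w2(x))

print(alibi_bias(num_heads=4, seq_len=6)[0])
print(SwiGLU(dim=64, hidden=256)(torch.randn(2, 6, 64)).shape)
```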

8.Boolformer: Symbolic Regression of Logic Functions with Transformers

This paper introduces Boolformer, the first Transformer architecture trained to perform symbolic regression of Boolean functions. The authors first show that, given a clean truth table, Boolformer can predict concise formulas for complex functions that were not seen during training (the toy example after the link below makes the task concrete). They then demonstrate its ability to find approximate expressions when given incomplete and noisy observations. Boolformer is evaluated on a broad set of real-world binary classification datasets, demonstrating its potential as an interpretable alternative to classic machine learning methods. Finally, the authors apply it to the widely studied task of modelling gene regulatory network dynamics and, using state-of-the-art benchmarks, show that Boolformer is competitive with state-of-the-art genetic algorithms while being orders of magnitude faster. The code and models are publicly available.

https://www.aminer.cn/pub/650cf92d3fda6d7f06d44568/?f=cs
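
To make the task concrete: symbolic regression of a Boolean function means recovering a formula from its truth table. The brute-force toy below searches tiny formulas over three variables; it only illustrates the problem setup and has nothing to do with Boolformer's Transformer-based approach.

```python
from itertools import product

VARS = ["a", "b", "c"]

def truth_table(expr: str) -> tuple:
    # Evaluate a Boolean expression on every assignment of the variables.
    return tuple(eval(expr, {}, dict(zip(VARS, bits)))
                 for bits in product([False, True], repeat=len(VARS)))

def candidate_formulas():
    # Variables, their negations, and two-literal AND/OR combinations.
    atoms = VARS + [f"not {v}" for v in VARS]
    yield from atoms
    for x, y in product(atoms, repeat=2):
        yield f"({x}) and ({y})"
        yield f"({x}) or ({y})"

# Target function: "a implies b" (c is irrelevant), given only as a truth table.
target = tuple((not bits[0]) or bits[1] for bits in product([False, True], repeat=3))
matches = [f for f in candidate_formulas() if truth_table(f) == target]
print(matches or "no formula of this size matches")
```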


END

We have added a "Daily Selected New Papers" topic on the homepage of the AMiner website. You can click "Subscribe" and "Add to the Knowledge Base" to receive all the paper information!

View all featured new papers: https://www.aminer.cn


Origin blog.csdn.net/AI_Conf/article/details/133268600