LLM Paper Weekly | Cutting-edge research from Meta AI, Zhejiang University, Tsinghua University, ETH Zurich, and other institutions

A large language model (LLM) is an artificial intelligence model designed to understand and generate human language. LLMs are trained on large amounts of text data and can perform a wide range of tasks, including text summarization, translation, sentiment analysis, and more. They are characterized by their large scale, often containing billions of parameters, which helps them learn complex patterns in linguistic data. These models are typically based on deep learning architectures such as the Transformer, which enables them to achieve impressive performance on a variety of NLP tasks.

At the end of 2022, OpenAI launched ChatGPT, a large language model based on GPT-3.5. Thanks to its excellent performance, ChatGPT and the large language models behind it quickly became a hot topic in artificial intelligence, attracting the attention and participation of a large number of researchers and developers.

This week, we have selected 10 outstanding papers in the field of LLMs from institutions including Meta AI, Zhejiang University, Tsinghua University, and ETH Zurich.

For ease of reading, only the paper title, authors, ChatPaper review, and other basic information are listed; if you are interested, you can click the link to view the original text. The data is synchronized with the desktop site (bookmark the page to view it there), and each day's new papers can also be viewed by logging in to the mini program.

1. SeamlessM4T-Massively Multilingual & Multimodal Machine Translation

This paper introduces SeamlessM4T, a massively multilingual and multimodal machine translation model that helps individuals translate speech across up to 100 languages. While text-to-text models have recently pushed translation coverage beyond 200 languages, unified speech-to-speech translation models have yet to make similar progress. To close this gap, the authors propose a single model that supports speech-to-speech translation, speech-to-text translation, text-to-speech translation, text-to-text translation, and automatic speech recognition. They used 1 million hours of open speech audio data to learn self-supervised speech representations and created a multimodal corpus of automatically aligned speech translations. Combining filtered data with human-labeled and pseudo-labeled data, they developed the first multilingual system that can translate into and out of English for both speech and text. On the FLEURS benchmark, SeamlessM4T improves over the previous state of the art in direct speech-to-text translation by 20% BLEU. Compared with strong cascaded models, SeamlessM4T gains 1.3 BLEU points in speech-to-text translation and 2.6 ASR-BLEU points in speech-to-speech translation. Robustness tests show that the system handles background noise and speaker variation in speech-to-text tasks better. The authors also evaluated the translation safety of SeamlessM4T with respect to gender bias and added toxicity. Finally, they open-sourced all of their contributions on GitHub.

Link:
https://www.aminer.cn/pub/64e5849c3fda6d7f063af4d6

2. ChatHaruhi: Reviving Anime Character in Reality via Large Language Model

This paper introduces a method for reviving anime characters through large language models. Although role-playing chatbots based on large language models have attracted attention, better techniques are needed to faithfully imitate specific fictional characters. The paper proposes an algorithm that controls the language model through improved prompts and character memories extracted from scripts. The authors constructed a dataset called ChatHaruhi covering 32 characters from Chinese and English TV series and anime, with a total of more than 54,000 simulated dialogues. Both automatic and human evaluations show that the method achieves significant improvements in role-playing capability over baseline methods.
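The core of the approach is retrieving character memories from the script and folding them into the prompt. Below is a minimal, illustrative sketch of that idea, assuming a toy bag-of-words similarity in place of a real sentence-embedding model and invented example lines; it is not the ChatHaruhi codebase.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words vector; a real system would use a sentence-embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Character memories: dialogue snippets extracted from the script (invented examples).
memories = [
    "Haruhi: Ordinary humans don't interest me. If there are any aliens here, come join me!",
    "Haruhi: The SOS Brigade exists to find mysterious things and have fun with them.",
]

def build_roleplay_prompt(character, user_msg, k=2):
    """Retrieve the k script memories most similar to the user message
    and prepend them to a role-play prompt, in the spirit of the paper."""
    q = embed(user_msg)
    ranked = sorted(memories, key=lambda m: cosine(embed(m), q), reverse=True)[:k]
    context = "\n".join(ranked)
    return (
        f"You are {character}. Stay in character and imitate the tone of the lines below.\n"
        f"Relevant script memories:\n{context}\n\n"
        f"User: {user_msg}\n{character}:"
    )

print(build_roleplay_prompt("Haruhi Suzumiya", "What is the SOS Brigade for?"))
```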

Link:
https://www.aminer.cn/pub/64e2e15a3fda6d7f06466a72

3. Instruction Tuning for Large Language Models: A Survey

This paper reviews research in the rapidly growing field of instruction tuning (IT), a key technique for improving the capabilities and controllability of large language models (LLMs). Instruction tuning refers to further training an LLM in a supervised fashion on a dataset of (instruction, output) pairs, thereby bridging the gap between the LLM's next-word-prediction objective and the user's goal of having the LLM follow human instructions. In this paper, we conduct a systematic review of the literature, including the general methodology of IT, the construction of IT datasets, the training of IT models, and applications across different modalities, domains, and tasks, and we analyze the factors that affect IT outcomes (e.g., the generation of instruction outputs and the size of the instruction dataset). We also review potential pitfalls of IT and criticisms of it, point out current shortcomings of existing strategies, and suggest some promising research directions.
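To make the (instruction, output) training setup concrete, here is a minimal sketch of a single supervised fine-tuning step using PyTorch and Hugging Face Transformers; the checkpoint name, prompt template, and hyperparameters are illustrative assumptions, not prescriptions from the survey.

```python
# A minimal sketch of one instruction-tuning step on a single (instruction, output) pair.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative; any causal LM checkpoint would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

pair = {"instruction": "Translate to French: Good morning.", "output": "Bonjour."}

# Concatenate prompt and target; mask prompt tokens with -100 so the loss is only
# computed on the output tokens, which is the usual instruction-tuning recipe.
prompt = f"### Instruction:\n{pair['instruction']}\n### Response:\n"
prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
full_ids = tokenizer(prompt + pair["output"] + tokenizer.eos_token,
                     return_tensors="pt").input_ids
labels = full_ids.clone()
labels[:, : prompt_ids.shape[1]] = -100  # ignore prompt positions in the loss

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
loss = model(input_ids=full_ids, labels=labels).loss
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.3f}")
```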

Link:
https://www.aminer.cn/pub/64e432c73fda6d7f0600b894

4. Code Llama: Open Foundation Models for Code

We release Code Llama, a family of large language models for code based on Llama 2, providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction-following ability for programming tasks. We provide multiple variants to cover a wide range of applications: a foundation model (Code Llama), a Python specialization (Code Llama - Python), and an instruction-following model (Code Llama - Instruct), each with 7 billion, 13 billion, and 34 billion parameters. All models are trained on sequences of 16k tokens and show improvements on inputs of up to 100k tokens. The 7B and 13B Code Llama and Code Llama - Instruct variants support infilling based on surrounding content. Code Llama achieves state-of-the-art performance among open models on multiple code benchmarks, with scores of up to 53% and 55% on HumanEval and MBPP, respectively. Notably, Code Llama - Python 7B outperforms Llama 2 70B on HumanEval and MBPP, and all of our models outperform every other publicly available model on MultiPL-E. We release Code Llama under a permissive license that allows both research and commercial use.
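To make the infilling capability concrete, the sketch below assembles a prefix/suffix prompt in the prefix-suffix-middle style described in the paper; the sentinel strings and the generate() stub are assumptions for illustration, not the official Code Llama API.

```python
# A minimal sketch of assembling an infilling prompt for a code model.
# The sentinel strings follow the prefix-suffix-middle format described in the
# Code Llama paper, but the exact token strings and generate() are placeholders.

def build_infill_prompt(prefix: str, suffix: str) -> str:
    # The model is asked to generate the "middle" span between prefix and suffix.
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"

def generate(prompt: str) -> str:
    # Placeholder for a real model call (e.g. via transformers or an inference server).
    return "    return [x for x in items if x % 2 == 0]"

prefix = "def keep_even(items):\n"
suffix = "\n\nprint(keep_even([1, 2, 3, 4]))"
completed = prefix + generate(build_infill_prompt(prefix, suffix)) + suffix
print(completed)
```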

Link:
https://www.aminer.cn/pub/64e82e45d1d14e646633f5aa

5. ProAgent: Building Proactive Cooperative AI with Large Language Models

This paper introduces a new framework called ProAgent, which leverages large language models to make agents more forward-looking and proactive when cooperating with humans or other agents. Traditional cooperative agent methods rely mainly on learning, and their policy generalization depends heavily on past interactions with specific teammates, which limits an agent's ability to readjust its strategy when facing new teammates. ProAgent can anticipate teammates' future decisions and formulate enhanced plans for itself, exhibiting strong cooperative reasoning and dynamically adapting to improve the effectiveness of cooperation. Furthermore, the ProAgent framework is highly modular and interpretable and can be seamlessly integrated into various coordination scenarios. Experimental results show that ProAgent outperforms five self-play-based and population-based training methods in the Overcooked-AI environment. When cooperating with human proxy models, its performance improves by more than 10% on average, surpassing COLE, the current state-of-the-art method. This advantage holds across diverse scenarios involving interactions with AI agents and human partners with different characteristics. These findings motivate future research on human-AI cooperation. Hands-on demonstrations are available at https://pku-proagent.github.io.
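The proactive loop can be pictured as: observe the environment, ask the LLM to predict the teammate's next move, then plan a complementary action. The sketch below is a hypothetical illustration with a stubbed llm() call and invented prompt wording; it is not ProAgent's actual implementation.

```python
# Illustrative predict-then-plan loop; llm() stands in for a real chat-completion call.

def llm(prompt: str) -> str:
    return "Teammate will fetch an onion; I should start the soup pot."

def proactive_step(observation: str, teammate_history: list[str]) -> str:
    intent_prompt = (
        f"Observation: {observation}\n"
        f"Teammate's recent actions: {teammate_history}\n"
        "Predict the teammate's next action, then propose my best complementary action."
    )
    return llm(intent_prompt)

history = ["moved to onion station", "picked up onion"]
print(proactive_step("Kitchen: pot empty, onions available", history))
```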

Link:
https://www.aminer.cn/pub/64e5849c3fda6d7f063af3cd/

6. A Survey on Large Language Model based Autonomous Agents

This paper is a survey of research on autonomous agents based on large language models. Previous research often focused on training agents in isolated environments with limited knowledge, which differs greatly from the human learning process and makes it difficult for agents to achieve human-like decision-making. In recent years, large language models (LLMs) have shown great potential for achieving human-level intelligence by acquiring large amounts of web knowledge, triggering a surge of research on LLM-based autonomous agents. To fully exploit the potential of LLMs, researchers have designed various agent architectures for different applications. In this paper, we conduct a systematic, holistic review of these studies. Specifically, we focus on the construction of LLM-based agents, for which we propose a unified framework that covers most previous work. In addition, we provide an overview of the various applications of LLM-based agents in social science, natural science, and engineering. Finally, we discuss common strategies for evaluating LLM-based agents. Building on previous research, we also identify several challenges and future directions in this field.

Link:
https://www.aminer.cn/pub/64e5849c3fda6d7f063af42e

7. AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents

This paper studies how to achieve multi-agent collaboration with autonomous agents powered by large language models (LLMs), and explores the emergent behaviors that arise in such collaboration. The authors propose a multi-agent framework called AgentVerse that can mimic human group dynamics and dynamically adjust the group's composition so that the whole achieves more than the sum of its parts. Experimental results show that the framework can effectively deploy multi-agent groups that outperform a single agent. The authors also examine in depth the social behaviors that emerge among individual agents within a group during collaborative task execution. In response to these behaviors, they discuss strategies to exploit positive behaviors and mitigate negative ones, thereby increasing the collaborative potential of multi-agent groups. The code will be released at https://github.com/OpenBMB/AgentVerse.

Link:
https://www.aminer.cn/pub/64e432c73fda6d7f0600b8cd

8. Giraffe: Adventures in Expanding Context Lengths in LLMs

This paper studies the problem of extending the context length of large language models (LLMs). Existing LLMs rely on attention mechanisms trained with a fixed context length, which limits the length of the input sequences they can process at evaluation time. To address this issue, the authors conduct an extensive survey of context length extension methods and test them on base LLaMA and LLaMA 2 models. They also introduce designs of their own, notably a new truncation strategy that modifies the basis of the positional encoding. The authors evaluate using three new tasks (FreeFormQA, AlteredNumericQA, and LongChat-Lines) as well as perplexity, and find that linear scaling is the best way to extend context length. They also find that performance can be further improved by using a larger scaling factor at evaluation time, and observe promising extrapolation behavior from the truncated basis. To support further research in this area, the authors release three new 13-billion-parameter long-context models called Giraffe: 4k and 16k context models trained from base LLaMA-13B, and a 32k context model trained from base LLaMA2-13B. They also release code to reproduce their results.
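For readers unfamiliar with linear scaling, the sketch below shows how linear position interpolation compresses evaluation-time positions into a RoPE model's original training range; it is a generic illustration with arbitrary dimensions and scale factor, not the Giraffe code.

```python
# A minimal sketch of linear position interpolation for rotary position embeddings (RoPE).
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0,
                scale: float = 1.0) -> torch.Tensor:
    """Return the rotation angle for each position and frequency pair.
    With scale > 1, positions are compressed into the original training range,
    so a model trained on 4k tokens can address, e.g., 16k tokens (scale=4)."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    scaled_pos = positions.float() / scale          # the linear interpolation step
    return torch.outer(scaled_pos, inv_freq)        # shape: (seq_len, dim // 2)

positions = torch.arange(16384)                     # evaluation-time sequence length
angles = rope_angles(positions, dim=128, scale=4.0) # 4x context extension
cos, sin = angles.cos(), angles.sin()               # used to rotate query/key pairs
print(cos.shape)                                    # torch.Size([16384, 64])
```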

Link:
https://www.aminer.cn/pub/64e432c73fda6d7f0600b8ef

9. Graph of Thoughts: Solving Elaborate Problems with Large Language Models

We introduce Graph of Thoughts (GoT), a framework that advances prompting capabilities in large language models (LLMs) beyond paradigms such as Chain-of-Thought and Tree of Thoughts (ToT). The key advantage of GoT is that it models the information generated by an LLM as an arbitrary graph, where units of information ("LLM thoughts") are vertices and edges represent the dependencies between them. This approach makes it possible to combine arbitrary LLM thoughts into synergistic results, distill the essence of an entire network of thoughts, or enhance thoughts through feedback loops. We demonstrate GoT's advantages over existing techniques on different tasks, for example improving sorting quality by 62% over ToT while reducing cost by more than 31%. We ensure that GoT is extensible with new thought transformations and can therefore be used to drive new prompting schemes. This work brings LLM reasoning closer to human thinking and brain mechanisms such as recurrence, both of which form complex networks.
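A thought graph of this kind can be represented with a small data structure: vertices hold thoughts, edges record dependencies, and an aggregation transformation merges several thoughts into one. The sketch below is a simplified illustration with a stubbed llm() call, not the released GoT framework.

```python
# Minimal thought-graph sketch: vertices are thoughts, edges are dependencies,
# and aggregate() is the graph-specific transformation that chains/trees cannot express.
from dataclasses import dataclass, field

def llm(prompt: str) -> str:
    return f"(model output for: {prompt[:40]}...)"  # stand-in for a real API call

@dataclass
class ThoughtGraph:
    thoughts: dict[int, str] = field(default_factory=dict)
    edges: list[tuple[int, int]] = field(default_factory=list)  # (parent, child)
    _next_id: int = 0

    def add_thought(self, text: str, parents=()) -> int:
        tid = self._next_id
        self._next_id += 1
        self.thoughts[tid] = text
        self.edges.extend((p, tid) for p in parents)
        return tid

    def aggregate(self, parent_ids, instruction: str) -> int:
        # Merge several thoughts into a new one that depends on all of them.
        merged_input = "\n".join(self.thoughts[p] for p in parent_ids)
        return self.add_thought(llm(f"{instruction}\n{merged_input}"), parents=parent_ids)

g = ThoughtGraph()
a = g.add_thought("Sort the first half of the list: [9, 3, 7]")
b = g.add_thought("Sort the second half of the list: [4, 8, 1]")
merged = g.aggregate([a, b], "Merge the two sorted halves into one sorted list.")
print(g.thoughts[merged], g.edges)
```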

Link:
https://www.aminer.cn/pub/64e2e15a3fda6d7f06466ace

10. Federated Learning in Big Model Era: Domain-Specific Multimodal Large Models

This paper discusses the development of federated learning in the era of large models and proposes a domain-specific federated learning framework for multimodal large models. The framework allows multiple enterprises to jointly train large models for vertical domains using their private data in order to deliver intelligent services. The authors discuss in depth how the foundations and goals of federated learning shift in the era of large models, as well as the new challenges it faces, including heterogeneous data, model aggregation, performance and cost trade-offs, data privacy, and incentive mechanisms. Through a case study of urban safety operations management, the paper also describes how leading enterprises can use multimodal data and expert knowledge to achieve distributed deployment and effective coordination, and how large-model capabilities can drive data quality improvements and technological innovation. Preliminary experimental results show that enterprises can enhance and accumulate intelligent capabilities through multimodal federated learning, jointly build a smart-city model, and provide high-quality smart services covering energy infrastructure safety, residential community security, and urban operations management. The resulting federated learning cooperation ecosystem is expected to further integrate industrial, academic, and research resources, realize large models in multiple vertical domains, and promote both large-scale industrial application and cutting-edge research on artificial intelligence and multimodal federated learning.
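As background on the model-aggregation step such a framework must handle, here is a minimal sketch of federated averaging (FedAvg), the standard weighted parameter-averaging rule; it is a generic illustration, not the paper's actual protocol.

```python
# Minimal FedAvg sketch: average client parameters, weighted by each client's dataset size.
import torch

def fedavg(client_states: list, client_sizes: list) -> dict:
    """Weighted average of client model state dicts (parameter name -> tensor)."""
    total = sum(client_sizes)
    global_state = {}
    for name in client_states[0]:
        global_state[name] = sum(
            (size / total) * state[name]
            for state, size in zip(client_states, client_sizes)
        )
    return global_state

# Two "enterprises" holding private data of different sizes (toy parameters).
client_a = {"w": torch.tensor([1.0, 2.0]), "b": torch.tensor([0.5])}
client_b = {"w": torch.tensor([3.0, 4.0]), "b": torch.tensor([1.5])}
print(fedavg([client_a, client_b], client_sizes=[100, 300]))
```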

Link:
https://www.aminer.cn/pub/64e5846c3fda6d7f063ac938/
