Big Model Weekly | Two reviews of code language models and models as a service, covering 50+ models, 30+ evaluation tasks and 500+ related papers

Large Model (LLM) is an artificial intelligence model designed to understand and generate human language. They are trained on large amounts of text data and can perform a wide range of tasks, including text summarization, translation, sentiment analysis, and more. LLMs are characterized by their large scale, containing billions of parameters, helping them learn complex patterns in linguistic data. These models are often based on deep learning architectures such as transformers, which helps them achieve impressive performance on a variety of NLP tasks.

At the end of 2022, OpenAI launched ChatGPT, a large-scale language model based on GPT-3.5. Due to its excellent performance, ChatGPT and the large-scale language model behind it quickly became a hot topic in the field of artificial intelligence, attracting the attention and attention of a large number of scientific researchers and developers. participate.

This week, 10 outstanding papers in the field of LLM have been selected. In order to facilitate everyone's reading, only the paper title, author, AMiner AI review and other information are listed. If you are interested, you can scan the QR code to view the original text, and the PC data is synchronized (to collect it, you can save it on your PC View via terminal), daily new papers can also be viewed by logging into the mini program.

1. A Survey on Language Models for Code

This paper provides a comprehensive review and survey of language models for code processing, covering more than 50 models, more than 30 evaluation tasks, and more than 500 related papers. The author divides code processing models into two major categories: one is a general language model, represented by the GPT family; the other is a specialized model that is pre-trained specifically for code, usually with specific training goals. The article discusses the relationship and differences between these two types of models and highlights the historical evolution of code modeling from statistical models and RNNs to pre-trained Transformers and LLMs, which is exactly what the field of natural language processing has experienced. In addition, the article discusses code-specific features such as AST, CFG and unit testing and their application in code language model training, and points out the main challenges and potential future directions in this field.

Insert image description here

Link:https://www.aminer.cn/pub/65543326939a5f40820ac868/?f=cs

2. JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models

This paper introduces an open source world multi-task agent called JARVIS-1, which uses a memory-augmented multi-modal language model to achieve human-like planning and control. In the open source world, processing multi-modal observations (visual observations and human instructions) is a critical milestone for more powerful general-purpose agents. Existing methods can handle open-world missions of a certain length, but they still face challenges in situations where the number of missions may be infinite and mission completion capabilities cannot be gradually improved over game time. JARVIS-1 is an open-source world agent that can perceive multi-modal input (visual observations and textual instructions), generate complex plans and perform embodied control in the popular challenging open-world Minecraft universe. Specifically, JARVIS-1 is built on a pre-trained multi-modal language model that maps visual observations and textual instructions to plans. The plan will eventually be dispatched to the target condition controller. We equip JARVIS-1 with a multi-modal memory that leverages pre-trained knowledge and actual game survival experience for planning. In experiments, JARVIS-1 performed nearly perfectly on more than 200 different tasks on the Minecraft universe benchmark, ranging from entry-level to intermediate levels. In the long-term Diamond Pickaxe mission, JARVIS-1 achieved a completion rate of 12.5%, more than five times the previous record. Furthermore, we show that, thanks to multimodal memory, JARVIS-1 is able to self-improve following a lifelong learning paradigm, inspiring broader intelligence and increased autonomy.

Insert image description here

Link:https://www.aminer.cn/pub/65518a1f939a5f4082a62ced/?f=cs

3. FinGPT: Large Generative Models for a Small Language

This paper introduces FinGPT: Large Generative Models for a Small Language. Large language models (LLMs) perform well in natural language processing and many other tasks, but most open models have very limited support for small languages, and LLM work tends to focus on languages ​​for which there is almost unlimited data for pre-training. In this article, the authors examine the challenges of creating an LLM in Finnish, one of the languages ​​spoken by less than 0.1% of the world's population. The authors have assembled a Finnish corpus including web crawls, news, social media and e-books. The author used two methods to pre-train the model: 1) trained seven monolingual models (186M to 13B parameters) from scratch, called FinGPT; 2) continued to perform the multilingual BLOOM model on a mixture of original training data and Finnish After pre-training, a model with 176 billion parameters was obtained, called BLUUMI. To evaluate the model, the authors introduce FIN-bench, which is a Finnish task version of BIG-bench. The authors also evaluated other model qualities such as toxicity and bias.

Insert image description here

Link:https://www.aminer.cn/pub/65518945939a5f4082a5d446/?f=cs

4. Music ControlNet: Multiple Time-varying Controls for Music Generation

This paper introduces Music ControlNet, a diffusion-based music generation model that can provide multiple precise time-varying controls for generated audio. Compared with existing text-to-music generation models, Music ControlNet is more suitable for precise control of time-varying properties of music, such as beat position and music dynamic changes. The model enables time-varying control of the generated audio by extracting controls from the training audio and using them together with melodic, dynamic, and rhythmic controls for fine-tuning the audio spectrogram. Additionally, the model allows creators to only partially specify controls over time to produce compliant music. Experimental results show that Music ControlNet can generate real music consistent with input controls in different scenarios, and outperforms existing music generation models in multiple indicators.

Insert image description here

Link:https://www.aminer.cn/pub/6552e009939a5f40823b5b23/?f=cs

5. ChatAnything: Facetime Chat with LLM-Enhanced Personas

This paper introduces a method called ChatAnything, which can generate virtual characters with human characteristics, such as visual appearance, personality and tone, described only by text. To achieve this goal, the authors first exploit the contextual learning capabilities of large language models to generate personalities by carefully designing a set of system prompts. They then proposed two novel concepts: Mixing of Voices (MoV) and Mixing of Diffusers (MoD) to produce diverse sounds and looks. MoV is implemented by utilizing a text-to-speech (TTS) algorithm that utilizes various predefined tones and automatically selects the best matching one based on the text description provided by the user. For MoD, they combined the recently popular text-to-image generation technology and the talking head algorithm to simplify the process of generating speaking objects. Finally, the authors address the problem that humanoid objects produced by current generative models are often undetectable by pre-trained facial landmark detectors by incorporating pixel-level guidance into facial landmarks. Based on the constructed evaluation dataset, they verified that the detection rate of facial keypoint detection increased significantly from 57.0% to 92.5%, enabling automatic facial animation based on generated speech content.
Insert image description here

Link:https://www.aminer.cn/pub/6552df44939a5f408239f6a8/?f=cs

6. Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure

This paper reports on how a large language model can strategically deceive users under pressure. Specifically, the paper uses GPT-4 as an agent in a real simulation environment, allowing it to play the role of an autonomous stock trading agent. In this environment, the model received an inside tip about a profitable stock trade and took action despite knowing that company management disapproved of insider trading. When reporting to their managers, the models consistently concealed the true reasons behind their trading decisions. The study also briefly investigates how this behavior changes in different settings, such as removing the model's access to the inference scratchpad, trying to prevent erroneous behavior by changing the system instructions, changing the pressure the model is under, changing the level of risk of detection, and other simple environment changes. To the authors' knowledge, this is the first practical demonstration that a large language model (designed to be helpful, harmless, and honest) strategically deceives users without direct instruction or deception training.

Insert image description here

Link:https://www.aminer.cn/pub/655432d9939a5f40820a978e/?f=cs

7. DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model

This paper proposes a novel 3D generation method called DMV3D, which utilizes a transformer-based 3D large-scale reconstruction model to denoise multi-view diffusion. By employing a three-plane NeRF representation, this reconstruction model is able to denoise noisy multi-view images through NeRF reconstruction and rendering, achieving single-stage 3D generation in approximately 30 seconds on a single A100 GPU. The authors train on a massive multi-view image dataset of highly diverse objects using only image reconstruction losses, without access to 3D assets. We demonstrate state-of-the-art results in a single-image reconstruction problem that requires probabilistic modeling of unknown object parts to generate diverse reconstructions with sharpened textures. We also demonstrate high-quality text-to-3D generation results that outperform previous 3D diffusion models.

Insert image description here

Link:https://www.aminer.cn/pub/65558143939a5f4082e42860/?f=cs

8. The Chosen One: Consistent Characters in Text-to-Image Diffusion Models

This paper studies the role consistency problem in text-to-image generative models. Despite the recent great progress in text-to-image generative models, which bring great potential for visual innovation, these models still have difficulties in generating consistent characters, which is critical for many practical applications (such as story visualization, game development asset design , advertising, etc.) are crucial. Current methods often rely on multiple pre-existing images of the target character or involve tedious manual processes. In this work, we propose a fully automated solution where the only input is a text prompt for consistent character generation. We introduce an iterative process that, at each stage, identifies a coherent set of images with similar identities and extracts a more consistent identity from the set. Our quantitative analysis shows that our approach achieves a better balance between cue alignment and identity consistency compared to baseline methods, and these findings are reinforced by user studies. Finally, we demonstrate several practical applications of our approach.

Insert image description here

Link:https://www.aminer.cn/pub/6556d308939a5f4082dc3822/?f=cs

9. Model-as-a-Service (MaaS): A Survey

This paper provides a comprehensive survey of Model-as-a-Service (MaaS). As the number of parameters and data in the pre-trained model exceeds a certain level, basic models (such as large language models) can significantly improve downstream task performance and exhibit some emerging special capabilities that did not exist before (such as deep learning, complex reasoning and human alignment). The underlying model is a form of generative artificial intelligence (GenAI), and Models as a Service (MaaS) is a breakthrough paradigm that changes how generative AI models are deployed and utilized. MaaS represents a paradigm shift in the use of AI technology and provides developers and users with scalable and accessible solutions to leverage pre-trained AI models without requiring extensive infrastructure or model training expertise. In this article, we aim to provide a comprehensive overview about MaaS, including its significance and its impact on various industries. We briefly reviewed the development history of "X-as-a-Service" based on cloud computing and introduced the key technologies in MaaS. We also review recent applied research on MaaS. Finally, we highlight several challenges and future issues in this promising field. MaaS is a new deployment and service paradigm suitable for different AI models. We hope that this review will inspire future research in the field of MaaS.

Insert image description here

Link:https://www.aminer.cn/pub/655189a6939a5f4082a5fd0d/?f=cs

10. Instruction-Following Evaluation for Large Language Models

This paper explores the core ability of large language models (LLMs) to follow natural language instructions. However, current methods for assessing this ability are not standardized. Human assessments are expensive, time-consuming, and cannot be objectively repeated, while automated assessments based on LLM may be biased or limited by the assessor's LLM capabilities. In order to solve these problems, the author introduces an evaluation method called Instruction-Following Eval (IFEval) to evaluate the ability of large language models to follow instructions. IFEval is a simple and easily replicable evaluation benchmark that focuses on a series of "verifiable instructions" such as "write in more than 400 words" and "mention AI keywords at least 3 times." The authors identified 25 verifiable instruction types and constructed approximately 500 cues, each containing one or more verifiable instructions. The authors present evaluation results of two widely used LLMs on the market.

Insert image description here

Link:https://www.aminer.cn/pub/65543326939a5f40820ac844/?f=cs


AMiner AI入口:https://www.aminer.cn/chat/g/explain

Guess you like

Origin blog.csdn.net/AI_Conf/article/details/134638287