LLM Weekly Paper Report | Research from Tsinghua, Meta AI, Nous Research, and other institutions

A large language model (LLM) is an artificial intelligence model designed to understand and generate human language. LLMs are trained on large amounts of text data and can perform a wide range of tasks, including text summarization, translation, sentiment analysis, and more. They are characterized by their large scale, containing billions of parameters, which helps them learn complex patterns in linguistic data. These models are typically based on deep learning architectures such as the Transformer, which helps them achieve impressive performance on a variety of NLP tasks.

At the end of 2022, OpenAI launched ChatGPT, a conversational service built on the GPT-3.5 large language model. Thanks to its excellent performance, ChatGPT and the large language models behind it quickly became a hot topic in artificial intelligence, attracting the attention and participation of a large number of researchers and developers.

This week we have selected 10 outstanding LLM papers from Tsinghua University, Meta AI, Nous Research, and other institutions.

To make reading easier, only the paper title, authors, ChatPaper review, and other key information are listed here; if a paper interests you, click the link to view the original text. The data is synchronized on the desktop site (bookmark the page to view it there), and the papers added each day can also be viewed by logging into the mini program.

1. GPT Can Solve Mathematical Problems Without a Calculator

Previous research generally concluded that large language models cannot accurately perform multi-digit arithmetic without calculator tools, especially multiplication of numbers with more than 8 digits and operations involving decimals and fractions. This paper challenges that misconception: with sufficient training data, a language model with 2 billion parameters can perform multi-digit arithmetic with nearly 100% accuracy and without data leakage, significantly surpassing GPT-4 (whose multi-digit multiplication accuracy is only 4.3%). The authors also show that their MathGLM, fine-tuned from GLM-10B on a dataset containing additional multi-step arithmetic operations and math problems described in text, achieves performance similar to GPT-4 on a 5,000-sample Chinese math problem test set.
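The near-100% accuracy hinges on training data that spells out intermediate steps rather than single-shot answers. Below is a minimal sketch of how such step-by-step arithmetic data might be generated; the decomposition format is a hypothetical illustration, not the paper's actual template:

```python
def decompose_multiplication(a: int, b: int) -> list[str]:
    """Rewrite a multi-digit multiplication as a chain of simpler steps,
    in the spirit of MathGLM's step-by-step training data.
    (Illustrative format only, not the paper's template.)"""
    steps = []
    partials = []
    # One partial product per digit of b, as in schoolbook multiplication.
    for i, digit in enumerate(reversed(str(b))):
        p = a * int(digit) * (10 ** i)
        partials.append(p)
        steps.append(f"{a} * {digit} * 10^{i} = {p}")
    # Accumulate the partial products one addition at a time.
    total = 0
    for p in partials:
        steps.append(f"{total} + {p} = {total + p}")
        total += p
    steps.append(f"{a} * {b} = {total}")
    return steps
```

Training on sequences like `decompose_multiplication(12345, 678)` exposes the model to each sub-operation; the paper's actual data construction is more elaborate.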

Link: https://www.aminer.cn/pub/64fa84403fda6d7f06700708

2. Large Language Models as Optimizers


Link: https://www.aminer.cn/pub/64fa84403fda6d7f067007b3

3. Relay Diffusion: Unifying diffusion process across resolutions for image synthesis

This paper proposes Optimization by PROmpting (OPRO), a simple and effective approach that uses large language models (LLMs) as optimizers, with the optimization task described in natural language. Derivative-based algorithms are powerful tools for many applications, but in many real-world settings gradients are unavailable, which makes optimization challenging. In each OPRO step, the LLM generates new candidate solutions from a prompt that contains previously generated solutions and their values; the new solutions are then evaluated and added to the prompt for the next optimization step. The authors first demonstrate OPRO on linear regression and the traveling salesman problem (TSP), then turn to prompt optimization, where the goal is to find instructions that maximize task accuracy. Across a variety of LLMs, the best prompts found by OPRO outperform human-designed prompts by up to 8% on GSM8K and by up to 50% on Big-Bench Hard tasks.
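The loop described above is easy to sketch: keep a history of scored solutions, ask the LLM for a new candidate given that history, evaluate it, and repeat. In this minimal sketch a hand-written `propose` function stands in for the LLM call and a toy scalar objective stands in for task accuracy; both are assumptions for illustration:

```python
import random

def opro_optimize(propose, score, n_steps=30, seed=0):
    """OPRO-style loop: the history of scored solutions plays the role of
    the meta-prompt; `propose` stands in for sampling from an LLM."""
    random.seed(seed)
    history = []  # (solution, score) pairs, kept sorted by ascending score
    for _ in range(n_steps):
        candidate = propose(history)
        history.append((candidate, score(candidate)))
        history.sort(key=lambda pair: pair[1])
    return history[-1]  # best (solution, score) found

# Toy objective standing in for task accuracy: maximize -(x - 7)^2.
def score(x):
    return -(x - 7) ** 2

def propose(history):
    # A real OPRO step would format `history` into a natural-language
    # meta-prompt and query an LLM; here we just perturb the best solution.
    if not history:
        return random.randint(-100, 100)
    best_so_far = history[-1][0]
    return best_so_far + random.randint(-3, 3)

best_solution, best_score = opro_optimize(propose, score)
```

In the real method, the meta-prompt lists past solutions sorted by score together with a task description, so the LLM can infer what "better" means from the trajectory alone.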

Link: https://www.aminer.cn/pub/64fa84403fda6d7f06700777

4. Physically Grounded Vision-Language Models for Robotic Manipulation

This paper studies physically grounded vision-language models (VLMs) for robot manipulation tasks. Although recent VLMs have made significant progress on tasks such as visual question answering and image captioning, they have limited understanding of physical concepts (such as object material and fragility), which restricts their usefulness for robot manipulation tasks involving object interaction and physical reasoning. To address this, the authors propose the PhysObjects dataset, containing 36,900 crowdsourced and 417,000 automatically generated physical-concept annotations of common household objects. They show that fine-tuning a VLM on PhysObjects improves its understanding of physical object concepts by capturing human priors about objects' visual appearance. They incorporate this physically grounded VLM into an interactive framework with a large language model and demonstrate improved planning performance, compared to baselines without a physically grounded VLM, on tasks that require reasoning about physical object concepts. They also demonstrate the benefits of the physically grounded VLM on a real robot, where it significantly improves task success rates. The authors published their dataset at https://iliad.stanford.edu/pg-vlm/ and provide more details and visualizations of the results.

Link: https://www.aminer.cn/pub/64f933e53fda6d7f067a11b7

5. SLiMe: Segment Like Me

This paper introduces SLiMe (Segment Like Me), a new method for using large vision-language models such as Stable Diffusion (SD) for image segmentation. SLiMe segments images at any desired granularity using only a single labeled sample, by casting the problem as an optimization task. Specifically, given one training image and its segmentation mask, SLiMe first extracts attention maps from the SD prior, including a novel "weighted accumulated self-attention map". The extracted attention maps are then used to optimize Stable Diffusion's text embeddings so that each embedding learns a single segmented region of the training image. The learned embeddings then highlight the segmented regions in the attention maps, which in turn can be used to derive segmentation maps. At inference time, SLiMe can therefore segment any real image at the granularity of the segmented regions in the training image, using just one example. Furthermore, SLiMe's performance improves when additional training data is available (i.e., in a few-shot setting). Through a rich set of experiments studying various design factors, the authors show that SLiMe outperforms existing one-shot and few-shot segmentation methods.
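SLiMe's core move, optimizing a text embedding so that its attention map reproduces the one-shot mask, can be illustrated with a toy stand-in for Stable Diffusion's cross-attention. Everything below (the random per-pixel features, the sigmoid "attention map", the logistic loss) is a deliberate simplification for illustration; the real method operates on actual SD attention layers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for SD internals (assumptions): one feature vector per pixel.
H, W, D = 8, 8, 16
features = rng.normal(size=(H * W, D))

# The single labeled example: a binary mask for the region to be learned.
mask = np.zeros((H, W))
mask[2:6, 2:6] = 1.0
target = mask.reshape(-1)

# Make the toy problem learnable: shift one feature for in-mask pixels
# (in real SD, image content provides this signal).
features[:, 0] += 8.0 * target - 4.0

def attention_map(emb):
    # Toy analogue of a cross-attention map: per-pixel scores in (0, 1).
    return 1.0 / (1.0 + np.exp(-features @ emb))

# SLiMe's core idea in miniature: gradient descent on the *text embedding*
# so that its attention map matches the one-shot mask (logistic loss).
emb = np.zeros(D)
for _ in range(500):
    a = attention_map(emb)
    emb -= 0.5 * features.T @ (a - target) / len(target)

# Thresholding the optimized attention map yields a segmentation.
pred = (attention_map(emb) > 0.5).reshape(H, W)
iou = (pred * mask).sum() / ((pred + mask) > 0).sum()
```

The same optimized embedding can then be applied to new images, which is what lets one labeled example generalize.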

Link: https://www.aminer.cn/pub/64f933e53fda6d7f067a142a

6. RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback


Link: https://www.aminer.cn/pub/64f59fc23fda6d7f0648f1fb

7. FLM-101B: An Open LLM and How to Train It with $100K Budget

This paper describes FLM-101B, an open large language model (LLM), and how it was trained on a $100,000 budget. Although LLMs have achieved remarkable success in NLP and multimodal tasks, their development faces two major challenges: high computational cost and the difficulty of fair, objective evaluation. The prohibitive cost of developing LLMs makes training affordable for only a few large players, limiting research and application opportunities, so low-cost LLM training is important. In this paper, the authors use a growth strategy to significantly reduce training cost, demonstrating that an LLM with 101B parameters can be trained on 0.31T tokens for $100,000. The authors also adopt a systematic evaluation paradigm that assesses the model's IQ, complementing existing evaluations that focus more on knowledge-oriented abilities. The evaluation covers key aspects of intelligence, including symbolic mapping, rule understanding, pattern mining, and resistance to interference, to minimize the influence of memorization. Experimental results show that FLM-101B, trained for $100,000, achieves performance comparable to powerful, well-known models such as GPT-3 and GLM-130B, especially on IQ benchmark evaluations with contexts unseen in the training data. Checkpoints for FLM-101B are open-sourced at https://huggingface.co/CofeAI/FLM-101B.
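The appeal of a growth strategy is that a grown model need not restart training from scratch. FLM-101B's exact growth operator is described in the paper; the classic function-preserving widening sketched below (Net2Net-style, an assumption used here for illustration) conveys the idea: new hidden units are copies of existing ones with their outgoing weights split, so the larger network computes exactly the same function before further training.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, W1, W2):
    # One-hidden-layer ReLU network (no biases, for brevity).
    return np.maximum(x @ W1, 0.0) @ W2

def widen(W1, W2, new_width, rng):
    """Function-preserving widening: duplicate random hidden units and
    split their outgoing weights so the network's output is unchanged."""
    d_in, h = W1.shape
    idx = rng.integers(0, h, size=new_width - h)  # units to duplicate
    W1_new = np.concatenate([W1, W1[:, idx]], axis=1)
    W2_new = np.concatenate([W2, W2[idx]], axis=0)
    # A unit copied k times now feeds the output k+1 times; divide every
    # copy's outgoing weights by its total multiplicity.
    counts = np.ones(h)
    for i in idx:
        counts[i] += 1
    scale = np.concatenate([1.0 / counts, 1.0 / counts[idx]])
    W2_new = W2_new * scale[:, None]
    return W1_new, W2_new

x = rng.normal(size=(4, 8))
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 3))
W1g, W2g = widen(W1, W2, 24, rng)
same = np.allclose(mlp(x, W1, W2), mlp(x, W1g, W2g))
```

After growing, training continues on the larger model, which is where the cost saving over training the full-size model from scratch comes from.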

Link: https://www.aminer.cn/pub/64fa84403fda6d7f06700975

8. YaRN: Efficient Context Window Extension of Large Language Models


Link: https://www.aminer.cn/pub/64f59fc23fda6d7f0648f11d

9. Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning

This paper introduces CM3Leon, a multimodal language model capable of generating and infilling both text and images. CM3Leon uses the CM3 multimodal architecture but further shows the large benefits of scaling up and tuning on more diverse instruction-style data. It is the first multimodal model trained with a recipe adapted from text-only language models, including a large-scale retrieval-augmented pre-training stage and a second multi-task supervised fine-tuning (SFT) stage. It is also a general-purpose model that can perform both text-to-image and image-to-text generation, which allows the authors to introduce self-contained contrastive decoding methods that produce high-quality outputs. Extensive experiments show that this recipe is highly effective for multimodal models: CM3Leon achieves state-of-the-art performance in text-to-image generation with 5x less training compute than comparable methods (zero-shot MS-COCO FID of 4.88). After SFT, CM3Leon also exhibits unprecedented controllability across a variety of tasks, from language-guided image editing to image-controlled generation and segmentation.
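Contrastive decoding of this family can be sketched as a classifier-free-guidance-style mix of conditional and unconditional token logits; the mixing rule and the `alpha` value below are illustrative assumptions, not CM3Leon's exact formulation:

```python
import numpy as np

def contrastive_logits(cond_logits, uncond_logits, alpha=3.0):
    """CFG-style mixing: push token logits away from the unconditional
    model and toward the conditioned one. `alpha` is an arbitrary choice."""
    return uncond_logits + alpha * (cond_logits - uncond_logits)

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Toy 4-token vocabulary: conditioning shifts probability toward token 2.
uncond = np.array([2.0, 1.0, 0.5, 0.0])
cond = np.array([1.0, 1.0, 2.0, 0.0])
p = softmax(contrastive_logits(cond, uncond))
```

With `alpha > 1` the mix amplifies tokens that the conditioning favors over the unconditional distribution (here token 2 dominates), which is the intuition behind using the same model's conditional and unconditional predictions to sharpen its outputs.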

Link: https://www.aminer.cn/pub/64f933e53fda6d7f067a11d5

10. XGen-7B Technical Report

This paper introduces XGen-7B, a family of 7-billion-parameter models trained on up to 1.5 trillion tokens with sequence lengths of up to 8K. To better support long sequences, the authors also fine-tuned the models on public-domain instructional data, producing the instruction-tuned XGen-Inst models. The models are available for both research advancement and commercial applications. The authors' evaluation on standard benchmarks shows that XGen achieves comparable or better results than state-of-the-art open-source LLMs. A targeted evaluation on long-sequence modeling tasks shows that their 8K-sequence models outperform open-source 2K-sequence LLMs.

Link: https://www.aminer.cn/pub/64fa84403fda6d7f067007dd


How to use ChatPaper?

Using ChatPaper is simple: open the AMiner homepage and enter the ChatPaper page from the navigation bar at the top of the page or from the entry at the lower right corner.

ChatPaper usage tutorial: Click here to view


Origin blog.csdn.net/AI_Conf/article/details/132848673