Track scientific research trends in real time | Selected new papers from UC Berkeley, Google, Microsoft, and other institutions

As a researcher, you need to search and browse large volumes of academic literature every day to keep up with the latest scientific and technological progress and research results.

However, traditional retrieval and reading methods can no longer meet researchers' needs. AMiner AI is a literature tool that integrates retrieval, reading, and knowledge Q&A. It helps you retrieve and read papers more efficiently, stay on top of the latest research trends in your field, and makes research work easier.


Combined with the frontier-news subscription feature, AMiner AI selects the most popular new arXiv papers of the day and compiles them into a review, so everyone can catch up on cutting-edge developments faster.

If you want an in-depth conversation about a particular paper, you can copy the paper's link into your browser or go directly to the AMiner AI page: https://www.aminer.cn/chat/g/explain

List of selected new papers on September 13, 2023:

1. Efficient Memory Management for Large Language Model Serving with PagedAttention (Read the original)

The paper shows that efficient memory management is critical to improving throughput in large language model serving. In existing systems, the key-value (KV) cache memory for each request is very large and grows and shrinks dynamically; if managed inefficiently, it suffers from fragmentation and redundant copies, which limits batch size. To solve this problem, the paper proposes PagedAttention, an attention algorithm inspired by classic virtual memory and paging techniques. On top of it, the authors built vLLM, a language-model serving system that (1) wastes almost no KV cache memory and (2) flexibly shares the KV cache within and across requests, further reducing memory use. Evaluations show that, at the same latency level, vLLM improves the throughput of popular language models by 2-4x over state-of-the-art systems such as FasterTransformer and Orca. The improvement is even more pronounced for longer sequences, larger models, and more complex decoding algorithms.
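The paging idea above can be sketched in a few lines: each sequence keeps a block table mapping logical token positions to non-contiguous physical blocks, allocated on demand and shared via reference counting. This is a minimal illustrative sketch, not vLLM's actual API; the class names and block size are hypothetical.

```python
BLOCK_SIZE = 16  # tokens per KV-cache block (hypothetical value)

class BlockAllocator:
    """Pool of physical KV-cache blocks with copy-on-write-style refcounts."""
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))
        self.refcount = {}

    def alloc(self):
        block = self.free.pop()
        self.refcount[block] = 1
        return block

    def share(self, block):
        # Another sequence reuses the same physical block (e.g. a shared prefix).
        self.refcount[block] += 1
        return block

    def release(self, block):
        self.refcount[block] -= 1
        if self.refcount[block] == 0:
            self.free.append(block)  # reclaim only when no sequence uses it

class Sequence:
    """Maps logical token positions to non-contiguous physical blocks."""
    def __init__(self, allocator):
        self.allocator = allocator
        self.block_table = []  # logical block index -> physical block id
        self.num_tokens = 0

    def append_token(self):
        # Allocate a new block only when the current one is full, so at most
        # BLOCK_SIZE - 1 slots are ever wasted per sequence.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.alloc())
        self.num_tokens += 1
```

Because blocks are allocated lazily and freed by refcount, fragmentation stays bounded and shared prefixes cost no extra memory.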

https://www.aminer.cn/pub/65011be43fda6d7f060e4be3/?f=cs

2. PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models (Read the original)

The paper illustrates the multiple challenges existing personalized text-to-image generation methods face: long tuning times, large storage requirements, the need for multiple input images per identity, and limitations in identity preservation and editability. To address these obstacles, the authors propose PhotoVerse, an innovative method that employs a dual-branch conditioning mechanism in both the text and image domains to effectively control the image generation process. They also introduce a facial identity loss as a new component that enhances identity preservation during training. The method requires no test-time tuning and relies on only a single facial photo of the target identity, significantly reducing the resource cost of image generation. After a single training run, it can generate high-quality images in seconds, spanning a variety of scenes and styles. Extensive evaluation demonstrates the excellent performance of the approach in both identity preservation and editability.

https://www.aminer.cn/pub/65011bda3fda6d7f060e4678/?f=cs

3. InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation (Read the original)

The paper addresses the slow multi-step sampling process of diffusion models in text-to-image generation. Previous distillation attempts made some progress in reducing computational cost but failed to produce a fully functional one-step model. Building on the recent Rectified Flow method, the authors propose a novel text-conditioned pipeline that transforms Stable Diffusion (SD) into an ultra-fast one-step model, yielding the first one-step diffusion model with SD-level image quality. Its FID (Fréchet Inception Distance) on MS COCO 2017-5k is 23.3, significantly surpassing the previous state of the art. With an expanded 1.7B-parameter network, the authors further improve the FID to 22.4. On MS COCO 2014-30k, InstaFlow achieves an FID of 13.1 in 0.09 seconds, currently the best result in the sub-0.1-second regime, surpassing the recent StyleGAN-T model. Notably, training InstaFlow took only 199 A100 GPU-days.
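The intuition behind one-step sampling can be shown with a toy numerical sketch (this is an illustration of the straight-flow idea, not InstaFlow's actual model): Rectified Flow straightens the probability-flow ODE trajectories from noise to data, and a trajectory with constant velocity is integrated exactly by a single Euler step.

```python
import numpy as np

def euler_sample(x0, velocity_fn, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with Euler steps."""
    x, t = x0.copy(), 0.0
    dt = 1.0 / num_steps
    for _ in range(num_steps):
        x = x + dt * velocity_fn(x, t)
        t += dt
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)          # stand-in for a noise sample
x1 = np.array([1.0, 2.0, 3.0, 4.0])  # stand-in for the image it flows to

# A perfectly "rectified" (straight-line) trajectory has constant velocity.
v_straight = lambda x, t: x1 - x0

one_step  = euler_sample(x0, v_straight, num_steps=1)
many_step = euler_sample(x0, v_straight, num_steps=50)
# Both land on x1 exactly: when trajectories are straight, the multi-step
# integration collapses to a single step with no discretization error.
```

Real learned flows are only approximately straight, which is why InstaFlow combines rectification with distillation to reach SD-level quality in one step.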

https://www.aminer.cn/pub/65011be43fda6d7f060e4cae/?f=cs

4. Natural Language Supervision for General-Purpose Audio Representations (Read the original)

The paper discusses open problems in speech and audio representation learning: despite significant progress, a performance gap remains between general-purpose models and task-specific models. It proposes a contrastive language-audio pretraining model that uses an innovative encoder, pretrained for zero-shot inference on a diverse dataset of audio and text. By mapping audio and language representations into a joint multimodal space, the model improves performance on downstream tasks. The paper also conducts an extensive evaluation of the model's generalization across 26 downstream tasks, achieving state-of-the-art results on several of them and paving the way for general-purpose audio representation learning.

https://www.aminer.cn/pub/65011bda3fda6d7f060e465e/?f=cs

5. Large Language Model for Science: A Study on P vs. NP (Read the original)

This paper takes on one of the most important unsolved problems in theoretical computer science and mathematics, the P vs. NP problem, and proposes using large language models (LLMs) to augment and accelerate research on it. The researchers propose a general framework called Socratic reasoning, which prompts LLMs to think deeply about complex problems: it encourages them to recursively discover, solve, and combine subproblems while promoting self-evaluation and refinement. In a pilot study on P vs. NP, the researchers used GPT-4 to generate a proof schema and carry out rigorous reasoning over 97 dialogue turns, arriving at the conclusion "P ≠ NP", consistent with (Xu and Zhou, 2023). The study reveals new insights into how LLMs navigate vast solution spaces, offering a new perspective on LLMs for science.

https://www.aminer.cn/pub/65011bda3fda6d7f060e460e/?f=cs

6. AstroLLaMA: Towards Specialized Foundation Models in Astronomy (Read the original)

Large language models often perform poorly in highly specialized fields such as academic astronomy. To bridge this gap, the researchers introduce AstroLLaMA, a 7-billion-parameter model fine-tuned from LLaMA-2 on more than 300,000 astronomy abstracts from arXiv. Optimized for traditional causal language modeling, AstroLLaMA achieves 30% lower perplexity than LLaMA-2, showing clear domain adaptation. Despite having significantly fewer parameters than other foundation models, it produces text completions and embedding extractions that are more insightful and scientifically relevant than other state-of-the-art foundation models. AstroLLaMA is a robust, domain-specific model with broad fine-tuning potential; its public release aims to promote astronomy-focused research, including automated paper summarization and conversational agent development.

https://www.aminer.cn/pub/65011be43fda6d7f060e4bad/?f=cs

7. Textbooks Are All You Need II: phi-1.5 technical report (Read the original)

The researchers develop small Transformer language models to improve natural-language reasoning. They use existing large language models to generate "textbook quality" data that enhances learning, and train phi-1.5, a 1.3-billion-parameter model that performs as well as models five times larger on natural-language tasks, and matches or outperforms most non-frontier language models on complex reasoning tasks such as grade-school math and basic coding. The model still exhibits issues such as hallucination and potentially harmful or biased generated text, although these are mitigated to some extent by the absence of web data in training. Finally, the researchers open-source phi-1.5 to facilitate further research.

https://www.aminer.cn/pub/64ffcbe23fda6d7f06d007c8/?f=cs

8. NExT-GPT: Any-to-Any Multimodal LLM (Read the original)

The research points out a limitation of current multimodal large language models (MM-LLMs): they can understand multimodal content on the input side, but cannot generate multimodal content. To approach human-level artificial intelligence, it becomes critical to develop any-to-any multimodal LLM systems that can accept and generate content in any modality. To fill this gap, the researchers propose NExT-GPT, an end-to-end, general-purpose any-to-any multimodal LLM system. They connect an LLM with multimodal adapters and different diffusion decoders, enabling NExT-GPT to perceive inputs and generate outputs in any combination of text, images, video, and audio. By leveraging existing well-trained, high-performance encoders and decoders, NExT-GPT requires tuning only a small fraction (1%) of its parameters, in certain projection layers, which keeps training cheap and makes it easy to extend to additional modalities. The researchers also introduce a modality-switching instruction tuning (MosIT) method and manually curate a high-quality dataset for it; trained on this dataset, NExT-GPT acquires complex cross-modal semantic understanding and content generation capabilities. Overall, the work demonstrates the promising possibility of building AI agents that model universal modalities, paving the way for more human-like AI research in the community.
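The "1% of parameters" claim comes from freezing the large pretrained modules and training only the small projection layers between them. A back-of-the-envelope sketch with hypothetical parameter counts (not NExT-GPT's real numbers) makes the arithmetic concrete:

```python
# Hypothetical per-module parameter counts, chosen only to illustrate why
# tuning just the projection layers touches a tiny fraction of the system.
params = {
    "image_encoder":     300_000_000,   # frozen pretrained encoder
    "audio_encoder":     100_000_000,   # frozen pretrained encoder
    "llm":             7_000_000_000,   # frozen language model backbone
    "image_decoder":     900_000_000,   # frozen diffusion decoder
    "input_projection":   30_000_000,   # trainable: modality -> LLM space
    "output_projection":  30_000_000,   # trainable: LLM -> decoder conditioning
}
trainable = {"input_projection", "output_projection"}

num_trainable = sum(v for k, v in params.items() if k in trainable)
num_total = sum(params.values())
fraction = num_trainable / num_total
print(f"trainable fraction: {fraction:.2%}")
```

With these illustrative sizes, gradients flow through well under 1% of all parameters, which is what makes training cheap and adding a new modality a matter of attaching one more small projector.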

https://www.aminer.cn/pub/64ffcc023fda6d7f06d03cca/?f=cs


How to use AMiner AI?

Using AMiner AI is very simple: open the AMiner homepage and access the AMiner AI page from the navigation bar at the top of the page or the entry at the lower right corner.

AMiner AI usage tutorial: https://live.csdn.net/v/314755


Origin blog.csdn.net/AI_Conf/article/details/132970494