Real-time tracking of research trends | Selected new papers from Huan Liu, Jiebo Luo, Jinyu Li, and others on August 15, with ChatPaper reviews

As a researcher, you need to search and browse large volumes of academic literature every day to keep up with the latest scientific progress and results. Traditional retrieval and reading methods, however, can no longer keep pace with those needs.

ChatPaper is a literature tool that integrates retrieval, reading, and knowledge Q&A. It helps you search and read papers more efficiently, keeps you current on the latest research trends in your field, and makes research work more comfortable.


Combined with its news subscription feature, ChatPaper selects the most popular new arXiv papers of the day and compiles them into a digest of reviews, so everyone can catch up on the latest developments more quickly.

If you want an in-depth conversation about a particular paper, copy the paper's link into your browser or go directly to the ChatPaper page: https://www.aminer.cn/chat/g/explain

List of selected new papers on August 15, 2023:

1. SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

https://www.aminer.cn/pub/64dafb293fda6d7f064e2cae/

ChatPaper Review: The paper points out that recent generative speech models based on audio-text prompts have made significant progress, enabling innovations such as high-quality zero-shot text-to-speech. However, existing models still have limitations in handling diverse audio-text-based speech generation tasks, including transforming input speech and processing audio captured in adverse acoustic conditions. This article introduces SpeechX, a versatile speech generation model capable of zero-shot text-to-speech and various speech transformation tasks, handling both clean and noisy signals. SpeechX combines neural codec language modeling with multi-task learning, using task-dependent prompts to enable unified and extensible modeling, and provides a consistent way to leverage text input in speech enhancement and transformation tasks. Experimental results show that SpeechX matches or exceeds specialized models across tasks such as zero-shot text-to-speech, noise suppression, target speaker extraction, speech removal, and speech editing with or without background noise.
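SpeechX's unifying trick, as described above, is task-dependent prompting: one codec language model serves many tasks because each input sequence starts with a token naming the task. The sketch below is a hypothetical illustration of that sequence format; the token names and layout are invented for illustration, not the paper's actual vocabulary.

```python
# Hypothetical sketch of task-dependent prompting in a unified codec LM:
# one shared model handles many tasks because each input sequence leads
# with a task token. Token names here are invented for illustration.
TASK_TOKENS = {
    "tts": "<tts>",                        # zero-shot text-to-speech
    "noise_suppression": "<ns>",           # clean up a noisy signal
    "target_speaker_extraction": "<tse>",  # isolate one speaker
    "speech_editing": "<edit>",            # edit speech content
}

def build_prompt(task, text_tokens, acoustic_tokens):
    """Concatenate task token, text tokens, and neural-codec audio
    tokens into one sequence for the shared language model."""
    return [TASK_TOKENS[task]] + list(text_tokens) + list(acoustic_tokens)

# A denoising request carries no text, only noisy codec tokens.
print(build_prompt("noise_suppression", [], ["a13", "a877", "a402"]))
# ['<ns>', 'a13', 'a877', 'a402']
```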

2. Platypus: Quick, Cheap, and Powerful Refinement of LLMs

https://www.aminer.cn/pub/64dafb2f3fda6d7f064e349c/

ChatPaper Review: Introduces Platypus, a family of fine-tuned and merged large language models that achieves strong performance and ranks first on Hugging Face's Open LLM Leaderboard. The authors describe their curated dataset Open-Platypus and their process of fine-tuning and merging models, and they verify against test-set leakage and training-data contamination, providing a reference for future research. Platypus models score well on quantitative LLM metrics while using far less fine-tuning data and overall compute than other state-of-the-art fine-tuned models; in particular, a 13B Platypus model can be trained on a single A100 GPU with 25k questions in 5 hours. This demonstrates the quality of the Open-Platypus dataset and opens opportunities for further improvements in the field.
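Platypus's low training cost comes from parameter-efficient fine-tuning: the paper fine-tunes and merges LoRA modules rather than updating all weights. Below is a minimal sketch of a LoRA setup using the Hugging Face peft library; the base model name and hyperparameters are illustrative placeholders, not the paper's exact recipe.

```python
# Illustrative LoRA fine-tuning setup with peft. The checkpoint and
# hyperparameters are placeholders, not the exact Platypus recipe.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-13b-hf"  # assumed base model, for illustration
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

lora_config = LoraConfig(
    r=16,                                 # low-rank adapter dimension
    lora_alpha=32,                        # adapter scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA matrices train
```

Because only the adapter matrices receive gradients, fine-tuning a 13B model fits on a single A100, which is what makes the 5-hour figure above plausible.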

3. OctoPack: Instruction Tuning Code Large Language Models

https://www.aminer.cn/pub/64dafb293fda6d7f064e2db0/

ChatPaper Review: This paper studies instruction tuning of large language models for code. The instruction tuning leverages Git commits, which pair code changes with human instructions in the form of commit messages. The researchers compiled CommitPack, 4TB of Git commits spanning 350 programming languages. On the 16B-parameter StarCoder model, they compared CommitPack against other natural and synthetic code instruction datasets (xP3x, Self-Instruct, OASST) and achieved the best performance on the HumanEval Python benchmark (46.2% pass@1). In addition, the authors introduce HumanEvalPack, which extends the HumanEval benchmark to 3 coding tasks (code repair, code explanation, code synthesis) and 6 languages (Python, JavaScript, Java, Go, C++, Rust). Their models, OctoCoder and OctoGeeX, perform best on HumanEvalPack, demonstrating CommitPack's advantage in generalizing to a wider range of languages and natural coding tasks. Code, models, and data are freely available.
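The data trick here is that every commit is already a natural instruction-following example: the message says what to do, and the diff shows it done. A hypothetical sketch of flattening one commit into a tuning record (the field names are invented, not CommitPack's actual schema):

```python
# Hypothetical conversion of a Git commit into an instruction-tuning
# example: message = instruction, pre-change code = input, post-change
# code = target. Field names are illustrative, not CommitPack's schema.
def commit_to_example(message: str, code_before: str, code_after: str) -> dict:
    return {
        "instruction": message,   # what the human asked for
        "input": code_before,     # the code as it stood
        "output": code_after,     # the code after the change
    }

example = commit_to_example(
    "Fix off-by-one error in loop bound",
    "for i in range(len(xs) + 1):\n    print(xs[i])",
    "for i in range(len(xs)):\n    print(xs[i])",
)
print(example["instruction"])
```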

4. CausalLM is not optimal for in-context learning

https://www.aminer.cn/pub/64dafb293fda6d7f064e2cd5/

ChatPaper Review: The article notes that for in-context learning, prefix language models (prefixLM) empirically outperform causal language models (causalLM), but this result has lacked a theoretical explanation. The authors take a theoretical approach and analyze the convergence behavior of prefix LMs and causal LMs constructed with a particular parameterization. The analysis shows that both model types converge to their stationary points at a linear rate, but the prefix LM converges to the optimal solution of linear regression, whereas the causal LM's convergence dynamics follow an online gradient descent algorithm whose convergence to the optimum is not guaranteed even as the number of in-context samples grows without bound. The authors corroborate these theoretical claims with empirical experiments on synthetic and real tasks and with several types of transformers, showing that the causal LM consistently underperforms in all settings.
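Architecturally, the two variants differ only in the attention mask: a prefix LM lets the in-context (prefix) tokens attend to each other bidirectionally, while a causal LM masks everything strictly left-to-right. A small self-contained sketch of both masks:

```python
# The only structural difference between the two LM types compared in
# the paper is the attention mask, sketched here with NumPy.
import numpy as np

def causal_mask(n: int) -> np.ndarray:
    """Strictly left-to-right: token i attends only to tokens 0..i."""
    return np.tril(np.ones((n, n), dtype=bool))

def prefix_mask(n: int, prefix_len: int) -> np.ndarray:
    """Prefix tokens attend to each other bidirectionally;
    the remaining (generated) tokens stay causal."""
    mask = causal_mask(n)
    mask[:prefix_len, :prefix_len] = True  # full attention inside the prefix
    return mask

print(causal_mask(5).astype(int))     # lower-triangular
print(prefix_mask(5, 3).astype(int))  # 3x3 block of ones, then causal rows
```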

5. VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use

https://www.aminer.cn/pub/64dafb293fda6d7f064e2b97/

ChatPaper Review: Introduces VisIT-Bench (Visual InsTruction Benchmark), a benchmark for evaluating instruction-tuned vision-language models on real-world use. The benchmark curates 70 "instruction families" spanning tasks from basic recognition to game playing and creative generation, going beyond evaluations such as VQAv2 and COCO. After curation, the dataset contains 592 test queries, each with a human-authored, instruction-conditioned caption. These captions describe the information relevant to the instruction; for example, for an instruction asking whether a storefront is accessible to wheelchair users, the caption describes its ramps and potential obstacles. The captions make it possible to collect human-verified reference outputs for each instance and to automatically evaluate candidate multimodal generations with a text-only LLM, in agreement with human judgment. Both human and automatic evaluation quantify the quality gap between models and references; in the comparison, for example, the best instruction-following model wins against the GPT-4 reference only 27% of the time. VisIT-Bench supports dynamic participation, with practitioners submitting their models' responses on the project website; data, code, and leaderboards are available at http://visit-bench.github.io. The paper illustrates the quality gap that vision-language models face in real-world applications.
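The instruction-conditioned caption is what lets a text-only LLM act as the judge: the caption stands in for the image. A hypothetical sketch of assembling such a judge prompt (the template wording is invented, not the benchmark's actual prompt):

```python
# Hypothetical judge prompt: the instruction-conditioned caption stands
# in for the image, so a text-only LLM can compare outputs. The wording
# is illustrative, not VisIT-Bench's actual evaluation template.
JUDGE_TEMPLATE = """You are judging an instruction-following response.
Image (described in text): {caption}
Instruction: {instruction}
Reference answer: {reference}
Candidate answer: {candidate}
Which answer follows the instruction better: Reference or Candidate?"""

def build_judge_prompt(caption, instruction, reference, candidate):
    return JUDGE_TEMPLATE.format(caption=caption, instruction=instruction,
                                 reference=reference, candidate=candidate)

print(build_judge_prompt(
    "A storefront with a steep step and no ramp at the entrance.",
    "Could a wheelchair user enter this store?",
    "No. There is a step at the door and no visible ramp.",
    "Yes, it looks easy to enter.",
))
```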

6. Jurassic World Remake: Bringing Ancient Fossils Back to Life via Zero-Shot Long Image-to-Image Translation

https://www.aminer.cn/pub/64dafb2f3fda6d7f064e349b/

ChatPaper Review: The article discusses how guided diffusion and image-editing models can translate across large domain gaps in zero-shot image-to-image translation. The authors found that traditional image-to-image translation methods are not effective across large domain gaps, so they explored guided diffusion and image-editing approaches and proposed a new baseline, Revive-2I, which performs zero-shot image-to-image translation driven by text prompts. They found that guidance and prompting are necessary in long translation tasks across large domain gaps, because bridging those gaps requires prior knowledge about the target domain. Furthermore, they found that prompts provide the best and most scalable source of target-domain information, since classifier-guided diffusion models must be retrained for specific use cases and, being trained on a wide variety of images, lack strong constraints toward the target domain.
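This is not the authors' Revive-2I pipeline, but the general pattern of prompt-driven image-to-image diffusion can be sketched with the diffusers library; the checkpoint, file names, and prompt below are illustrative assumptions.

```python
# Generic prompt-guided img2img with diffusers, as an analogue of the
# text-prompted translation described above; this is an off-the-shelf
# pipeline, not the paper's Revive-2I method.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # illustrative checkpoint
    torch_dtype=torch.float16,
).to("cuda")

source = Image.open("fossil_skeleton.jpg").convert("RGB")  # hypothetical input
result = pipe(
    prompt="a photo of a living dinosaur in a forest",  # bridges the domain gap
    image=source,
    strength=0.75,       # how far to move away from the source image
    guidance_scale=7.5,  # how strongly to follow the text prompt
).images[0]
result.save("revived.png")
```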

7. Detecting and Preventing Hallucinations in Large Vision Language Models

https://www.aminer.cn/pub/64dafb293fda6d7f064e2acb/

ChatPaper Review: Points out the hallucination problem of large vision-language models when generating detailed descriptions, including fabricated objects, inaccurate descriptions, and wrong relationships. To address this, the authors introduce M-HalDetect, a multimodal hallucination detection dataset for training and evaluating hallucination detection and prevention models. The dataset contains 16,000 fine-grained annotated visual question-answering examples and is the first comprehensive multimodal hallucination detection dataset for detailed image descriptions. Unlike prior work that considered only object hallucination, the authors also annotate inaccurate entity descriptions and relationships. They further propose a fine-grained direct preference optimization method, train fine-grained multimodal reward models, and evaluate their effectiveness with best-of-n rejection sampling. Human evaluation shows that these methods reduce the hallucination rate by 41% and 55% respectively, a significant improvement over the baseline.
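Best-of-n rejection sampling, the second mitigation evaluated above, is easy to sketch: draw several candidate descriptions, score each with the reward model, and keep the best. Both model calls below are hypothetical stubs, not the paper's actual models.

```python
# Best-of-n rejection sampling: sample n candidates and keep the one the
# reward model scores highest. generate() and reward() are hypothetical
# stand-ins for a vision-language model and a trained reward model.
import random

def generate(image, prompt):
    """Stub: one sampled description from a vision-language model."""
    return f"description-{random.randint(0, 9999)}"

def reward(image, text):
    """Stub: higher score means fewer hallucinations, per the reward model."""
    return random.random()

def best_of_n(image, prompt, n=8):
    candidates = [generate(image, prompt) for _ in range(n)]
    return max(candidates, key=lambda text: reward(image, text))

print(best_of_n("photo.jpg", "Describe this image in detail."))
```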

8. A Survey on Deep Neural Network Pruning-Taxonomy, Comparison, Analysis, and Recommendations

https://www.aminer.cn/pub/64dafb293fda6d7f064e2c44/

ChatPaper Review: Describes a core problem with neural networks: modern deep neural networks are enormous, demanding substantial compute and storage, which makes them hard to deploy in resource-constrained environments and slows inference. To address this, researchers have increasingly explored pruning as a direction for neural network compression. However, a review that comprehensively covers the latest pruning methods has been lacking. This study therefore provides a comprehensive survey of existing work on deep neural network pruning, classifying and comparing pruning methods and exploring emerging topics and research directions. To facilitate future research, the authors also provide a repository of datasets, networks, and evaluations for different applications, together with practical suggestions for choosing pruning methods and an outlook on promising research directions.
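As a concrete taste of the simplest family in any pruning taxonomy, magnitude pruning zeroes the smallest weights; below is a minimal sketch with PyTorch's built-in pruning utilities, not tied to any specific method from the survey.

```python
# Unstructured L1 (magnitude) pruning with PyTorch's pruning utilities:
# the smallest 50% of weights in a layer are masked to zero.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(256, 128)
prune.l1_unstructured(layer, name="weight", amount=0.5)

sparsity = (layer.weight == 0).float().mean().item()
print(f"sparsity: {sparsity:.0%}")  # roughly half the entries are now zero

prune.remove(layer, "weight")  # bake the pruning mask in permanently
```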

9. ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate

https://www.aminer.cn/pub/64dafb293fda6d7f064e2e02/

ChatPaper Review: The paper describes shortcomings of existing text evaluation methods and how a multi-agent debate framework addresses them. Current single-agent approaches that use large language models (LLMs) as text evaluators show promise, but experiments indicate further improvement is needed to close the gap with human evaluation quality. To bridge this gap, the researchers adopt multi-agent debate, moving from a single-agent prompting strategy to a multi-agent collaborative evaluation setup. The multi-agent approach lets a group of LLMs work together, drawing on their different capabilities and areas of expertise to handle complex tasks more efficiently and effectively. The researchers build a multi-agent referee team called ChatEval that autonomously discusses and evaluates the quality of answers generated by different models on open-ended questions and traditional natural language generation (NLG) tasks. The findings show that ChatEval goes beyond simple text scoring and offers a reliable evaluation method that mirrors the human evaluation process.
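A hypothetical sketch of the debate loop: each referee agent sees its peers' latest arguments and revises its judgment over several rounds. The llm() function is a stand-in for any chat-model call, not ChatEval's actual interface.

```python
# Hypothetical multi-agent debate loop for evaluation. llm() is a stub
# standing in for a real chat-model call.
def llm(persona: str, prompt: str) -> str:
    return f"[{persona}] judgment on: {prompt[:40]}..."

def debate(question, answer, personas, rounds=2):
    history = []
    for _ in range(rounds):
        turn = []
        for persona in personas:
            peers = "\n".join(history[-len(personas):])  # peers' last turn
            prompt = (f"Question: {question}\nAnswer: {answer}\n"
                      f"Other referees said:\n{peers}\n"
                      f"As the {persona}, give your evaluation.")
            turn.append(llm(persona, prompt))
        history.extend(turn)
    return history  # the final turn holds each referee's settled verdict

verdicts = debate("Summarize the article.", "The article says ...",
                  ["critic", "fact-checker", "generalist"])
print(verdicts[-1])
```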

10. Large Language Models for Information Retrieval: A Survey

https://www.aminer.cn/pub/64dafb293fda6d7f064e2d9e/

ChatPaper Review: Mainly discusses the intersection of large language models and information retrieval (IR) systems, highlighting both the rapid development of this area and its remaining problems, including data scarcity, interpretability, and responses that are contextually plausible but potentially inaccurate. The article argues for combining traditional methods, such as term-based sparse retrieval with fast response times, with modern neural architectures, such as language models with strong language understanding, to advance IR systems. It also reviews the transformative role of large language models (such as ChatGPT and GPT-4) in natural language processing and emphasizes that recent research strives to use them to improve IR systems. The survey aims to synthesize existing approaches and provide nuanced insights through a comprehensive overview.
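The sparse-plus-dense combination the survey calls for can be sketched as a simple score fusion; both scorers below are stubbed placeholders standing in for, e.g., BM25 and a neural bi-encoder.

```python
# Sketch of hybrid retrieval: linearly fuse a sparse (term-based) score
# with a dense (embedding) score. Both scorers are hypothetical stubs.
def sparse_score(query: str, doc: str) -> float:
    """Stub for a term-overlap scorer such as BM25."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / (len(q) or 1)

def dense_score(query: str, doc: str) -> float:
    """Stub for cosine similarity between neural embeddings."""
    return 0.5  # placeholder value

def hybrid_rank(query, docs, alpha=0.5):
    """alpha weights the sparse signal against the dense one."""
    scored = sorted(((alpha * sparse_score(query, d)
                      + (1 - alpha) * dense_score(query, d), d)
                     for d in docs), reverse=True)
    return [d for _, d in scored]

docs = ["neural retrieval with language models", "classic boolean search"]
print(hybrid_rank("language models for retrieval", docs))
```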


How to use ChatPaper?
Using ChatPaper is simple: open the AMiner homepage and enter the ChatPaper page from the navigation bar at the top of the page or from the lower-right corner.


On the ChatPaper page, you can choose to converse over a single document or over your entire personal document library. You can upload a local PDF or search for documents directly on AMiner.

ChatPaper usage tutorial: Click here to view


Origin blog.csdn.net/AI_Conf/article/details/132313957