Real-time tracking of research trends | July 25 featured new papers, with ChatPaper summaries

As a researcher, you have to search and read a large volume of academic literature every day to keep up with the latest scientific and technological progress and research results. Traditional retrieval and reading methods, however, can no longer keep pace with these needs.

ChatPaper is a literature knowledge tool that integrates retrieval, reading, and knowledge-based question answering. It helps you search and read papers more efficiently, keep track of the latest research trends in your field, and makes research work easier.

Combined with the frontier-tracking subscription feature, we select the day's popular new arXiv papers and compile them into paper summaries, so that everyone can keep up with cutting-edge trends more quickly.

If you want to have an in-depth conversation about a particular paper, paste the paper's link into your browser or go directly to the ChatPaper page: https://www.aminer.cn/chat/g/

List of Featured New Papers for July 25, 2023:

1. Evaluating the Ripple Effects of Knowledge Editing in Language Models paper details page

https://www.aminer.cn/pub/64bf49b13fda6d7f062822c1/

The paper addresses the evaluation of knowledge editing in language models, which is used to update factual knowledge and correct factual errors. Existing evaluations mainly consider whether an individual fact was successfully injected and whether predictions for other subjects remain unchanged. The authors argue that this way of evaluating is limited, because injecting one fact produces a "ripple effect": the model should also update other facts that are logically related to the edit. They therefore propose new evaluation criteria that consider the impact of an edit on related facts and, based on these criteria, construct a diagnostic benchmark of 5K factual edits capturing multiple types of ripple effects. Evaluating well-known editing methods on this benchmark shows that current methods fail to introduce consistent changes to the model's knowledge. Moreover, a simple in-context editing baseline achieves the best scores on the benchmark, suggesting a promising research direction for model editing.
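To make the "ripple effect" notion concrete, here is a toy sketch (a hypothetical fact-table "model" and helper names, not the paper's benchmark code) of checking whether an injected fact propagates to the facts it logically implies:

```python
# Toy ripple-effect check: hypothetical fact-table "model", not the paper's code.

def query(model, subject, relation):
    """Ask the toy model for the object of a (subject, relation) pair."""
    return model.get((subject, relation))

def ripple_score(model, implied_facts):
    """Fraction of facts implied by an edit that the edited model gets right."""
    correct = sum(1 for s, r, expected in implied_facts
                  if query(model, s, r) == expected)
    return correct / len(implied_facts)

# Suppose we injected the counterfactual edit: capital_of(France) = Lyon.
edited_model = {
    ("France", "capital"): "Lyon",      # the injected fact itself
    ("Lyon", "capital_of"): None,       # reverse relation was NOT updated
    ("Paris", "capital_of"): "France",  # stale fact that should have changed
}

# Facts a fully consistent edit would also imply.
implied = [
    ("Lyon", "capital_of", "France"),
    ("Paris", "capital_of", None),
]

print(ripple_score(edited_model, implied))  # 0.0: the edit did not "ripple"
```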

2. 3D-LLM: Injecting the 3D World into Large Language Models paper details page

https://www.aminer.cn/pub/64bf49b63fda6d7f062827a7/

The paper raises the issue that current large language models (LLMs) and vision-language models (VLMs) are not grounded in the 3D physical world, which involves richer concepts such as spatial relationships, affordances, physics, and layout. The authors address this by proposing 3D-LLMs, a new family of models that inject the three-dimensional world into large language models. These models take 3D point clouds and their features as input and perform a variety of 3D-related tasks, including captioning, dense captioning, 3D question answering, task decomposition, 3D grounding, 3D-assisted dialogue, and navigation. Using three designed prompting mechanisms, the authors collected more than 300K pieces of 3D-language data covering these tasks. To train 3D-LLMs efficiently, they first use a 3D feature extractor to obtain 3D features from rendered multi-view images, and then use 2D VLMs as the backbone. With a 3D localization mechanism introduced, the model can better capture 3D spatial information. Experiments on the ScanQA dataset show that the model outperforms strong baselines (e.g., the BLEU-1 score exceeds the previous state of the art by 9%). Furthermore, experiments on 3D captioning, task composition, and 3D-assisted dialogue show that the model outperforms 2D VLMs, and qualitative examples show it can handle tasks beyond the scope of existing LLMs and VLMs.
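As a rough illustration of the architectural idea, the following sketch (with assumed dimensions and a generic Transformer standing in for the 2D VLM backbone; not the authors' implementation) projects 3D features into the token space of a language-model backbone alongside text tokens:

```python
# Minimal sketch: feed 3D features into an LM backbone via a linear projection.
# Shapes and the tiny backbone are illustrative assumptions only.
import torch
import torch.nn as nn

class Toy3DLLM(nn.Module):
    def __init__(self, feat_dim=1408, hidden_dim=768, vocab=32000):
        super().__init__()
        # Project aggregated 3D features (e.g. lifted from multi-view 2D
        # features) into the LLM's token embedding space.
        self.proj = nn.Linear(feat_dim, hidden_dim)
        self.text_emb = nn.Embedding(vocab, hidden_dim)
        enc_layer = nn.TransformerEncoderLayer(hidden_dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.lm_head = nn.Linear(hidden_dim, vocab)

    def forward(self, point_feats, text_ids):
        # point_feats: (B, N_points, feat_dim); text_ids: (B, T)
        vis_tokens = self.proj(point_feats)        # 3D tokens
        txt_tokens = self.text_emb(text_ids)       # text tokens
        seq = torch.cat([vis_tokens, txt_tokens], dim=1)
        return self.lm_head(self.backbone(seq))    # next-token logits

model = Toy3DLLM()
logits = model(torch.randn(1, 64, 1408), torch.randint(0, 32000, (1, 16)))
print(logits.shape)  # (1, 64 + 16, 32000)
```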

3. RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment paper details page

https://www.aminer.cn/pub/64bf49a33fda6d7f0628086a/

The paper proposes RLCD, a method for aligning language models to natural-language principles through contrast distillation, without using human feedback. RLCD trains a preference model on simulated preference pairs generated from contrasting positive and negative prompts, and then uses reinforcement learning to improve an unaligned base language model. Experiments show that RLCD outperforms both RLAIF (Bai et al., 2022b) and a context-distillation baseline (Huang et al., 2022).
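A minimal sketch of the contrast-distillation idea, using placeholder prompt wording and a stand-in generate() function rather than the authors' actual prompts or models:

```python
# Illustrative RLCD-style simulated preference pairs (assumed prompt wording
# and a placeholder generate(); not the authors' code).

def generate(prompt):
    """Placeholder for sampling a continuation from an unaligned base LM."""
    return f"<response to: {prompt}>"

def make_preference_pair(user_query):
    # Contrasting prompt variants: one encourages the desired behavior
    # (e.g. harmlessness), the other discourages it.
    positive_prompt = f"(Give a helpful, harmless answer) {user_query}"
    negative_prompt = f"(Give an unhelpful, harmful answer) {user_query}"

    chosen = generate(positive_prompt)    # treated as the preferred output
    rejected = generate(negative_prompt)  # treated as the dispreferred output

    # Pairs are labeled automatically, with no human feedback; a preference
    # model trained on many such pairs then serves as the RL reward.
    return {"prompt": user_query, "chosen": chosen, "rejected": rejected}

print(make_preference_pair("How do I reset my router?"))
```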

4. A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis paper details page

https://www.aminer.cn/pub/64bf49013fda6d7f06275319/

The paper points out that pre-trained large language models (LLMs) used for autonomous web navigation on real-world websites still face several problems: (1) the open-domain nature of real sites, (2) limited context length, and (3) a lack of inductive bias for HTML. To address these issues, the researchers introduce WebAgent, an LLM-driven agent that completes tasks on real websites from natural-language instructions. WebAgent plans ahead by decomposing instructions into canonical sub-instructions, summarizes long HTML documents into task-relevant snippets, and acts on websites through generated Python programs. It combines Flan-U-PaLM for grounded code generation with HTML-T5, a new pre-trained LLM for planning and summarization that uses local and global attention mechanisms and a mixture of long-span denoising objectives. Empirical results show that this approach improves the task success rate on real websites by over 50%, and that HTML-T5 is the best model for solving HTML-based tasks, achieving a success rate 14.9% higher than the previous state of the art on the MiniWoB web navigation benchmark and better accuracy in offline task-planning evaluation. The paper also highlights the problems that remain for task completion on real-world websites.
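The following schematic sketch shows the plan → summarize → generate-program loop described above, with placeholder functions standing in for HTML-T5 planning/summarization and Flan-U-PaLM code generation (not the real model interfaces):

```python
# Schematic WebAgent-style loop; all model calls are placeholders.

def plan_subinstructions(instruction, html):
    """Placeholder for a planning model that decomposes the instruction."""
    return [f"step 1 for: {instruction}", f"step 2 for: {instruction}"]

def summarize_html(html, sub_instruction):
    """Placeholder for extracting task-relevant HTML snippets."""
    return html[:500]  # keep only a small, relevant fragment

def generate_program(sub_instruction, html_snippet):
    """Placeholder for code generation conditioned on the snippet."""
    return f"# python program executing: {sub_instruction}"

def web_agent(instruction, html):
    programs = []
    for sub in plan_subinstructions(instruction, html):
        snippet = summarize_html(html, sub)
        programs.append(generate_program(sub, snippet))
    return programs

for prog in web_agent("Book the cheapest flight to Tokyo", "<html>...</html>"):
    print(prog)
```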

5. WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models paper details page

https://www.aminer.cn/pub/6482a38ed68f896efa8db3a0/

The paper points to the rapid progress of generative models that can create hyper-realistic images from textual descriptions, which also raises concerns about misinformation. Traditional fake-image detection mechanisms provide some mitigation but fall short of assigning accountability for the malicious use of synthetic images. As a potential countermeasure against model misuse, the paper proposes a novel model-fingerprinting technique for attributing generated images. The method modifies the generative model according to each user's unique digital fingerprint, imprinting a unique identifier on the generated content that can be traced back to that user. Incorporating fine-tuning into the text-to-image (T2I) task with the Stable Diffusion model, it achieves near-perfect attribution accuracy with minimal impact on output quality. The paper rigorously examines the secrecy of the method under two scenarios: one where a malicious user attempts to detect the fingerprint, and another where the user has comprehensive knowledge of the method. It also evaluates robustness to various image post-processing operations commonly performed by users. Through extensive evaluation on Stable Diffusion models, the approach offers a promising avenue for accountable model distribution and responsible use.
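A conceptual sketch of fingerprint-conditioned weight modulation on a single linear layer (assumed dimensions and modulation scheme; not the WOUAF implementation):

```python
# Conceptual fingerprint-based weight modulation; scheme and shapes are
# illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class ModulatedLinear(nn.Module):
    def __init__(self, in_dim, out_dim, fingerprint_dim):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim) * 0.02)
        # Maps a user's fingerprint code to per-input-channel scales.
        self.mapper = nn.Linear(fingerprint_dim, in_dim)

    def forward(self, x, fingerprint):
        # x: (B, in_dim), fingerprint: (B, fingerprint_dim).
        # Scale the weight columns by the fingerprint-derived modulation, so
        # each user's copy of the generator produces subtly marked outputs.
        scale = self.mapper(fingerprint) + 1.0              # (B, in_dim)
        w = self.weight.unsqueeze(0) * scale.unsqueeze(1)   # (B, out, in)
        return torch.einsum("boi,bi->bo", w, x)

layer = ModulatedLinear(in_dim=8, out_dim=4, fingerprint_dim=32)
user_fp = torch.randn(1, 32)          # one user's fingerprint code
y = layer(torch.randn(1, 8), user_fp)
print(y.shape)  # (1, 4)
```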

6. Optimized Network Architectures for Large Language Model Training with Billions of Parameters paper details page

https://www.aminer.cn/pub/64bf48f93fda6d7f0627475c/

The paper examines the problem of building any-to-any networks for training large language models (LLMs). Traditionally, all GPUs are given high-bandwidth any-to-any connectivity to achieve near-optimal training performance. However, this paper finds that the communication pattern of LLM training is distinctive: only small groups of GPUs need high-bandwidth any-to-any communication among themselves, while traffic outside these groups is insignificant, sparse, and evenly distributed. The authors therefore propose a new network architecture that partitions the cluster into sets of GPUs, each internally connected by a non-blocking any-to-any high-bandwidth interconnect, which they call an HB domain. Across HB domains, the network only connects GPUs that actually need to communicate. The authors call this "rail-only" connectivity and show that the proposed architecture reduces network cost by up to 75% compared with existing any-to-any Clos networks, without compromising LLM training performance.
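As a back-of-the-envelope illustration (toy pair-counting, not the paper's cost model), the sketch below compares how many GPU pairs need full-bandwidth network paths under any-to-any connectivity versus a rail-only layout with HB domains:

```python
# Toy comparison of full-bandwidth GPU pairs; numbers and the counting
# method are illustrative only, not the paper's cost analysis.

def any_to_any_pairs(num_gpus):
    return num_gpus * (num_gpus - 1) // 2

def rail_only_pairs(num_gpus, hb_domain_size):
    domains = num_gpus // hb_domain_size
    # Inside each HB domain every pair is connected (NVLink-class fabric);
    # across domains, only GPUs with the same local rank ("rail") connect.
    intra = domains * (hb_domain_size * (hb_domain_size - 1) // 2)
    inter = hb_domain_size * (domains * (domains - 1) // 2)
    return intra + inter

n, hb = 1024, 8
full = any_to_any_pairs(n)
rail = rail_only_pairs(n, hb)
print(full, rail, f"{1 - rail / full:.1%} fewer full-bandwidth pairs")
```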

7. Question Decomposition Improves the Faithfulness of Model-Generated Reasoning paper details page

https://www.aminer.cn/pub/64bf48f93fda6d7f062745ba/

The paper observes that verifying the correctness and safety of large language model (LLM) behavior becomes harder as the tasks they perform become more difficult. One way to address this is to have LLMs externalize their reasoning by generating step-by-step chain-of-thought (CoT) reasoning while answering questions, which lets us inspect the process the model uses to perform a task. This approach, however, relies on the stated reasoning faithfully reflecting the model's actual reasoning, which is not always the case. To improve the faithfulness of CoT-style reasoning, the authors have the model generate reasoning by decomposing questions into subquestions. Decomposition-based methods achieve strong performance on question-answering tasks, sometimes approaching CoT performance, while improving the faithfulness of the model's stated reasoning on several recently proposed metrics. By forcing the model to answer simpler subquestions in separate contexts, they greatly increase the faithfulness of model-generated reasoning relative to CoT, while still retaining much of CoT's performance gains. The results show that the faithfulness of model-generated reasoning can be improved; further progress may lead to reasoning that enables verification of the correctness and safety of LLM behavior.
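A minimal sketch of decomposed question answering with placeholder model calls (not the paper's prompting setup): each subquestion is answered in its own context and the sub-answers are then recomposed:

```python
# Schematic decomposition-based QA; all model calls are placeholders.

def ask_model(prompt):
    """Placeholder for a single LLM call."""
    return f"<answer to: {prompt}>"

def decompose(question):
    """Placeholder decomposition into simpler subquestions."""
    return [f"subquestion 1 of '{question}'", f"subquestion 2 of '{question}'"]

def decomposed_qa(question):
    sub_answers = []
    for sub in decompose(question):
        # Answering each subquestion in isolation makes the stated reasoning
        # harder to post-hoc rationalize, which is the faithfulness argument.
        sub_answers.append((sub, ask_model(sub)))
    recomposition_prompt = f"{question}\n" + "\n".join(
        f"{q} -> {a}" for q, a in sub_answers)
    return ask_model(recomposition_prompt)

print(decomposed_qa("Was the inventor of the telephone alive in 1900?"))
```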

8. Less is More: Focus Attention for Efficient DETR paper details page

https://www.aminer.cn/pub/64bf48f93fda6d7f06274926/

The paper studies a problem in DETR-like object detectors: traditional encoder structures treat all tokens equally, which introduces a redundant computational burden. Recent sparsification strategies exploit a subset of informative tokens to reduce attention complexity while maintaining performance with a sparse encoder. However, these methods tend to rely on unreliable model statistics, and simply reducing the number of tokens severely limits detection performance, restricting the applicability of such sparse models. The study proposes Focus-DETR, which achieves a better trade-off between computational efficiency and model accuracy by focusing attention on more informative tokens. Specifically, the encoder is reconstructed with a dual-attention design that includes a token scoring mechanism considering both the localization and the category semantic information of objects from multi-scale feature maps. Based on the scores, background queries are discarded and the semantic interaction of fine-grained object queries is enhanced. Compared with state-of-the-art sparse DETR-like detectors under the same setting, Focus-DETR achieves 50.4 AP (+2.2) on the COCO dataset with comparable complexity.
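A toy sketch of the token-scoring idea (assumed score head and keep ratio; not the Focus-DETR code): score the encoder tokens and keep only the highest-scoring ones for the expensive attention computation:

```python
# Toy foreground-token selection; the score head and keep ratio are
# illustrative assumptions, not the paper's implementation.
import torch
import torch.nn as nn

d_model, num_tokens, keep_ratio = 256, 1000, 0.3
tokens = torch.randn(1, num_tokens, d_model)     # flattened multi-scale features

score_head = nn.Linear(d_model, 1)               # foreground/objectness score
scores = score_head(tokens).squeeze(-1)          # (1, num_tokens)

k = int(num_tokens * keep_ratio)
topk_idx = scores.topk(k, dim=1).indices         # indices of informative tokens
focused = torch.gather(
    tokens, 1, topk_idx.unsqueeze(-1).expand(-1, -1, d_model))

print(focused.shape)  # (1, 300, 256): only these tokens get full attention
```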

9. Is attention all you need in medical image analysis? A review paper details page

https://www.aminer.cn/pub/64bf49013fda6d7f062752c7/

The paper explores a question in medical image analysis: is attention all you need? It points out that commonly used CNN models ignore global pixel relationships within images, which limits their ability to generalize to diverse global information. In recent years, Transformer models, which can learn global relationships from data, have emerged, but a full Transformer must be trained on large-scale data and incurs enormous computational complexity. Lightweight attention and Transformer components (Transf/Attention) have therefore been proposed as replacements for the full Transformer, and there is a growing trend of fusing CNN and Transf/Attention architectures, ushering in a new era of hybrid models. This study provides an overview of existing hybrid CNN-Transf/Attention models, evaluates current and future opportunities and challenges, and introduces a comprehensive analytical framework for identifying opportunities for scientific and clinical generalization, which can inspire new data-driven research on domain generalization and adaptation methods.
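As a generic illustration of the hybrid CNN-Transf/Attention pattern the review surveys (a schematic block, not any specific model from the paper): convolutions capture local patterns, while self-attention mixes in global pixel relationships:

```python
# Schematic hybrid CNN + attention block; design is a generic sketch only.
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    def __init__(self, channels=64, heads=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):                             # x: (B, C, H, W)
        x = self.conv(x)                              # local features
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)            # (B, H*W, C)
        attn_out, _ = self.attn(seq, seq, seq)        # global pixel relationships
        seq = self.norm(seq + attn_out)
        return seq.transpose(1, 2).reshape(b, c, h, w)

block = HybridBlock()
print(block(torch.randn(1, 64, 32, 32)).shape)  # (1, 64, 32, 32)
```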
