Real-time tracking of scientific research trends | July 21 selected new papers, with ChatPaper summaries

As a researcher, you have to search and sift through large amounts of academic literature every day to keep up with the latest scientific and technological progress. Traditional retrieval and reading workflows, however, can no longer keep pace with researchers' needs.

ChatPaper is a paper-knowledge tool that integrates retrieval, reading, and question answering. It helps you search and read papers more efficiently, keep up with the latest research trends in your field, and make scientific work easier.

Combined with the frontier-trends subscription feature, it selects the day's popular new arXiv papers and compiles them into paper summaries, so that everyone can grasp cutting-edge developments more quickly.

If you want an in-depth dialogue about a particular paper, you can paste the paper's link into your browser, or go directly to the ChatPaper page: https://www.aminer.cn/chat/g/

List of Featured New Papers for July 21, 2023:

1. A Survey on Dialogue Management in Human-Robot Interaction paper details page

Link: https://www.aminer.cn/pub/64ba03413fda6d7f062733bd/

ChatPaper review: Discusses dialogue management in human-robot interaction. As social robots are increasingly deployed among the public, improving interactions with these robots is critical. Spoken language provides an intuitive interface for human-robot interaction, and dialogue management is a key component of such interactive systems. However, to overcome current challenges and achieve fluent, rich, and engaging interactions, a more structured approach to combining human-robot interaction and dialogue management is required. This systematic review analyzes the state of the art of dialogue management in human-robot interaction, focusing on the types of dialogue managers used, their capabilities, evaluation methods, and the challenges specific to dialogue management in this setting. It identifies open challenges and research frontiers related to dialogue management methods, interaction domains, robot embodiment, physical context, and multimodality.

2. Human Motion Generation: A Survey paper details page

Link: https://www.aminer.cn/pub/64ba03413fda6d7f062733ba/

ChatPaper review: Reviews research on human motion generation, laying out the goals, progress, and challenges of the field. Human motion generation aims to produce natural human pose sequences and shows great potential in practical applications. In recent years, significant advances in motion data collection techniques and generation methods have fueled interest in this area. However, the task remains challenging due to the nuanced nature of human motion and its implicit relationship with conditioning signals. The article introduces the background of human motion and generative models, then surveys representative approaches for three mainstream subtasks: human motion generation from text, from audio, and from scene context. It also provides an overview of common datasets and evaluation metrics, and discusses open problems and potential future research directions. The authors hope this review gives the research community a comprehensive picture of this rapidly developing field and stimulates new ideas for unresolved problems.

3. FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets paper details page

Link: https://www.aminer.cn/pub/64ba03413fda6d7f062733dd/

ChatPaper review: Addresses the challenge of evaluating large language models (LLMs) and proposes fine-grained evaluation based on alignment skill sets. Current evaluation methods are usually coarse-grained and fail to account for user instructions that require instance-wise combinations of skills, which limits interpretation of the true capabilities of LLMs. To address this, the authors propose the FLASK evaluation protocol, usable for both model-based and human-based evaluation, which decomposes coarse-grained scoring down to the level of instance-wise skill sets. Using FLASK, the authors compare multiple open-source and proprietary LLMs and observe a high correlation between model-based and human-based evaluation. FLASK lets developers measure model performance more precisely and improve models by analyzing the factors that make LLMs proficient at specific skills; for practitioners, it can recommend a model suited to a particular situation through comprehensive comparison of various LLMs.
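
The core of such a protocol is aggregating instance-wise skill ratings into skill-level scores. Here is a minimal sketch of that aggregation step with a hypothetical data layout (FLASK's actual annotation format may differ):

```python
from collections import defaultdict

# Hypothetical per-instance evaluations: each record carries a 1-5 rating
# for the skills that instruction requires (a FLASK-style annotation).
evaluations = [
    {"instance_id": 1, "ratings": {"logical_correctness": 4, "factuality": 5}},
    {"instance_id": 2, "ratings": {"factuality": 3, "conciseness": 4}},
    {"instance_id": 3, "ratings": {"logical_correctness": 2, "conciseness": 5}},
]

def skill_level_scores(evals):
    """Aggregate instance-wise ratings into a per-skill mean score."""
    totals, counts = defaultdict(float), defaultdict(int)
    for record in evals:
        for skill, rating in record["ratings"].items():
            totals[skill] += rating
            counts[skill] += 1
    return {skill: totals[skill] / counts[skill] for skill in totals}

print(skill_level_scores(evaluations))
# {'logical_correctness': 3.0, 'factuality': 4.0, 'conciseness': 4.5}
```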

4. SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models paper details page

Link: https://www.aminer.cn/pub/64ba03413fda6d7f062732a8/

ChatPaper review: Points out the limited ability of current large language models (LLMs) to solve complex scientific problems. Current LLMs have made remarkable progress on mathematical benchmarks, but most of these benchmarks involve only middle- and high-school subjects, contain only multiple-choice questions, and are limited to basic arithmetic operations. To address this, the paper introduces SciBench, an expansive benchmark suite designed to systematically study the reasoning capabilities required for complex scientific problem solving. SciBench contains two curated datasets: an open set of college-level science questions drawn from mathematics, chemistry, and physics textbooks, and a closed set of questions from undergraduate exams in computer science and mathematics. On these two datasets, current LLMs achieve an unsatisfactory overall score of only 35.80%. Furthermore, through a detailed user study, the researchers categorize the errors made by LLMs into ten problem-solving abilities. The analysis shows that no single prompting strategy significantly outperforms the others, and that strategies which improve certain problem-solving abilities degrade others. The authors hope SciBench will drive further progress in the reasoning abilities of LLMs and ultimately contribute to scientific research and discovery.
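
Open-ended science questions of this kind typically have free-form numeric answers, so grading usually compares the model's number against the reference within a tolerance. A minimal sketch of such a grader (the tolerance value and data layout are assumptions, not SciBench's exact protocol):

```python
def is_correct(predicted: float, reference: float, rel_tol: float = 0.05) -> bool:
    """Grade a numeric answer by relative error (tolerance is an assumption)."""
    if reference == 0.0:
        return abs(predicted) <= rel_tol
    return abs(predicted - reference) / abs(reference) <= rel_tol

# Hypothetical (prediction, reference) pairs from a model run.
results = [(3.14, 3.1416), (9.9, 9.81), (44.0, 40.0)]
accuracy = sum(is_correct(p, r) for p, r in results) / len(results)
print(f"overall score: {accuracy:.2%}")  # 2/3 correct -> 66.67%
```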

5. The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning paper details page

Link: https://www.aminer.cn/pub/64ba03413fda6d7f062733c7/

ChatPaper review: Notes that the mechanisms behind the success of multi-view self-supervised learning (MVSSL) are not yet fully understood. Contrastive MVSSL methods have been studied through the lens of InfoNCE, a lower bound on mutual information (MI), but the relationship between other MVSSL methods and MI remains unclear. The authors consider a different MI lower bound consisting of an entropy term and a reconstruction term (ER), and analyze the main MVSSL families through it. Via this ER bound, they show that clustering-based methods such as DeepCluster and SwAV maximize MI. They also reinterpret the mechanics of distillation-based methods such as BYOL and DINO, showing that these explicitly maximize the reconstruction term and implicitly encourage stable entropy, and confirm this empirically. Substituting the ER bound for the objectives of common MVSSL methods achieves competitive performance while making training more stable with smaller batch sizes or smaller exponential-moving-average (EMA) coefficients. The authors also link to a related GitHub repository.
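
For reference, the entropy-and-reconstruction decomposition behind such an ER bound can be written as a standard variational bound on mutual information (our notation, consistent with the summary above, not copied from the paper):

```latex
% With Z_1, Z_2 the embeddings of two views, mutual information decomposes as
%   I(Z_1; Z_2) = H(Z_2) - H(Z_2 | Z_1).
% Replacing the unknown p(z_2 | z_1) with a variational reconstruction model
% q(z_2 | z_1) (Gibbs' inequality) yields the entropy-plus-reconstruction bound:
I(Z_1; Z_2) \;\ge\; \underbrace{H(Z_2)}_{\text{entropy}}
  \;+\; \underbrace{\mathbb{E}_{p(z_1, z_2)}\!\left[\log q(z_2 \mid z_1)\right]}_{\text{reconstruction}}
```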

6. PASTA: Pretrained Action-State Transformer Agents paper details page

Link: https://www.aminer.cn/pub/64ba03413fda6d7f062733e5/

ChatPaper review: Addresses limitations in existing approaches to using pretrained transformer models in reinforcement learning (RL). Most existing methods rely on intricate pre-training objectives tailored to specific downstream applications, which limits their applicability across tasks. The study proposes a model called PASTA and examines it comprehensively, adopting a unified approach that covers a wide range of downstream tasks, including behavior cloning, offline RL, robustness to sensor failure, and adaptation to dynamics changes. The goal is to systematically compare design choices and give practitioners insights for building robust models. Key ingredients include tokenization at the level of individual action and state components, basic pre-training objectives such as next-token prediction, training across multiple domains simultaneously, and parameter-efficient fine-tuning (PEFT). The models in this study contain fewer than 10 million parameters, and PEFT allows fewer than 10,000 parameters to be fine-tuned during downstream adaptation, making the models accessible to a broad audience and the experiments easy to reproduce. The authors hope this work will encourage further research into representing RL trajectories with transformers built on first-principles design choices, and contribute to robust policy learning.
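
Tokenization at the action- and state-component level means each scalar component becomes its own token. A minimal sketch of that idea (our own illustration of the concept, not PASTA's exact scheme):

```python
import numpy as np

# Each scalar state/action component is discretized into one token, and the
# trajectory becomes a flat token sequence suitable for next-token prediction.
NUM_BINS = 64

def tokenize_component(x: float, low: float = -1.0, high: float = 1.0) -> int:
    """Map one scalar component to a discrete token id in [0, NUM_BINS)."""
    x = np.clip((x - low) / (high - low), 0.0, 1.0 - 1e-9)
    return int(x * NUM_BINS)

def tokenize_trajectory(states: np.ndarray, actions: np.ndarray) -> list:
    """Interleave per-component state and action tokens, step by step."""
    tokens = []
    for s, a in zip(states, actions):
        tokens.extend(tokenize_component(c) for c in s)   # state components
        tokens.extend(tokenize_component(c) for c in a)   # action components
    return tokens

states = np.random.uniform(-1, 1, size=(3, 4))   # 3 steps, 4-dim state
actions = np.random.uniform(-1, 1, size=(3, 2))  # 3 steps, 2-dim action
print(tokenize_trajectory(states, actions))      # 3 * (4 + 2) = 18 tokens
```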

7. Meta-Transformer: A Unified Framework for Multimodal Learning paper details page

Link: https://www.aminer.cn/pub/64ba03413fda6d7f06273356/

ChatPaper review: Discusses a hard problem in multimodal learning: how to design a unified network that processes information from multiple modalities. Because of the inherent gaps between modalities, it is difficult to design one network that handles them all simultaneously. To address this, the authors propose Meta-Transformer, a framework that uses a frozen encoder for multimodal perception without any paired multimodal training data. In Meta-Transformer, raw input from each modality is mapped into a shared token space, allowing a subsequent encoder to extract high-level semantic features. The framework consists of three main components: a unified data tokenizer, a modality-shared encoder, and task-specific heads for downstream tasks. Experiments show that Meta-Transformer can handle a wide variety of tasks, spanning basic perception (text, image, point cloud, audio, video), practical applications (X-ray, infrared, hyperspectral, and IMU data), and data mining (graph, tabular, and time-series data). Meta-Transformer points to a promising direction for building unified multimodal intelligence with transformers.
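
The three-component design can be pictured as follows: per-modality tokenizers into a shared token space, a frozen shared encoder, and a trainable task head. This is a minimal PyTorch illustration with assumed module names and sizes, not Meta-Transformer's actual code:

```python
import torch
import torch.nn as nn

class UnifiedMultimodalModel(nn.Module):
    def __init__(self, token_dim=256, num_classes=10):
        super().__init__()
        # 1) Per-modality tokenizers mapping raw input into a shared token space.
        self.tokenizers = nn.ModuleDict({
            "image": nn.Linear(768, token_dim),
            "audio": nn.Linear(128, token_dim),
        })
        # 2) Modality-shared encoder, kept frozen.
        layer = nn.TransformerEncoderLayer(d_model=token_dim, nhead=8,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        for p in self.encoder.parameters():
            p.requires_grad = False
        # 3) Lightweight task-specific head (the only trained part here).
        self.head = nn.Linear(token_dim, num_classes)

    def forward(self, x, modality: str):
        tokens = self.tokenizers[modality](x)    # (batch, seq, token_dim)
        features = self.encoder(tokens)          # shared semantic features
        return self.head(features.mean(dim=1))   # pooled classification

model = UnifiedMultimodalModel()
image_patches = torch.randn(2, 196, 768)  # e.g. flattened ViT-style patches
print(model(image_patches, "image").shape)  # torch.Size([2, 10])
```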

8. Brain2Music: Reconstructing Music from Human Brain Activity paper details page

Link: https://www.aminer.cn/pub/64ba03413fda6d7f0627347c/

ChatPaper review: Presents a method for reconstructing music from human brain activity. The researchers capture brain activity with functional magnetic resonance imaging (fMRI) and reconstruct music using either music retrieval or the MusicLM music generation model. The music generated this way resembles the musical stimuli subjects experienced in semantic properties such as genre, instrumentation, and mood. The researchers also explore the relationship between different components of MusicLM and brain activity through voxel-wise encoding-model analysis, and discuss which brain regions represent information about musical stimuli described in plain text. The paper provides supplementary material, including examples of the reconstructed music.
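
Voxel-wise encoding-model analysis, mentioned above, generally means fitting a regularized linear map from stimulus features to voxel responses and scoring each voxel by prediction accuracy. A minimal sketch with synthetic stand-in data (an illustration of the general technique, not the paper's pipeline):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_clips, emb_dim, n_voxels = 200, 128, 500
music_embeddings = rng.normal(size=(n_clips, emb_dim))   # stimulus features
fmri_responses = rng.normal(size=(n_clips, n_voxels))    # voxel activations

# Fit one linear map from embeddings to all voxels (train/test split).
train, test = slice(0, 160), slice(160, 200)
encoder = Ridge(alpha=1.0).fit(music_embeddings[train], fmri_responses[train])
predicted = encoder.predict(music_embeddings[test])

# Score each voxel by the correlation between predicted and measured response.
def voxel_correlations(pred, actual):
    pred_c = pred - pred.mean(axis=0)
    act_c = actual - actual.mean(axis=0)
    return (pred_c * act_c).sum(axis=0) / (
        np.linalg.norm(pred_c, axis=0) * np.linalg.norm(act_c, axis=0))

scores = voxel_correlations(predicted, fmri_responses[test])
print("best-predicted voxel r =", scores.max())
```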

9. TokenFlow: Consistent Diffusion Features for Consistent Video Editing paper details page

Link: https://www.aminer.cn/pub/64ba03413fda6d7f06273194/

ChatPaper review: Notes that current video generative models still lag behind image models in visual quality and in user control over the generated content. The authors propose a framework that harnesses text-to-image diffusion models for text-driven video editing. Given a source video and a target text prompt, the method generates a high-quality video that matches the target text while preserving the spatial layout and motion of the input video. The key observation is that consistency in the edited video can be achieved by enforcing consistency in the diffusion feature space. The authors achieve this by explicitly propagating diffusion features across frames, exploiting inter-frame correspondences readily available in the model. The framework therefore requires no training or fine-tuning and works with any off-the-shelf text-to-image editing method. The authors demonstrate state-of-the-art editing results on a variety of real-world videos.
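
The propagation step can be pictured as follows: find each frame's nearest-neighbor tokens in a key frame's source features, then copy the key frame's edited features through those correspondences. A toy sketch of the general idea (not TokenFlow's actual implementation):

```python
import torch

def propagate_features(src_feats, edited_key_feats, key_idx=0):
    """For each frame, replace its features with the edited key-frame
    features of its nearest-neighbor tokens in the source feature space."""
    n_frames, n_tokens, dim = src_feats.shape
    key = src_feats[key_idx]                         # (n_tokens, dim)
    out = torch.empty_like(src_feats)
    for f in range(n_frames):
        # Cosine-similarity correspondences between frame f and the key frame.
        sim = torch.nn.functional.normalize(src_feats[f], dim=-1) @ \
              torch.nn.functional.normalize(key, dim=-1).T
        nn_idx = sim.argmax(dim=-1)                  # (n_tokens,)
        out[f] = edited_key_feats[nn_idx]            # propagate edited features
    return out

src = torch.randn(8, 64, 32)          # 8 frames, 64 tokens, 32-dim features
edited_key = torch.randn(64, 32)      # edited features of the key frame
print(propagate_features(src, edited_key).shape)  # torch.Size([8, 64, 32])
```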

10. Large language models shape and are shaped by society: A survey of arXiv publication patterns paper details page

Link: https://www.aminer.cn/pub/64ba03413fda6d7f062732eb/

ChatPaper review: Examines how large language model (LLM) research has had a profound impact on academia and how it is in turn shaped by social factors. Analyzing 388,000 papers posted to the CS and Stat sections of arXiv, the authors focus on changes in publication patterns between 2018-2022 and 2023. They analyze the growth in the proportion of LLM papers, the attention LLM-related topics receive, how the topics authors write about correlate with their research backgrounds, the factors that distinguish highly cited LLM papers, and patterns of international collaboration. The authors note that LLM research increasingly focuses on societal impact: on the "Computers and Society" sub-arXiv, the share of LLM-related papers grew 18-fold, and authors newly publishing on LLMs attend more to applications and societal impact than experienced authors do. LLM research is also shaped by social dynamics: the authors document gender and academic/industry gaps in the topics LLM authors focus on, as well as a US/China divide in collaboration networks. Overall, the analysis demonstrates the profound ways in which LLM research shapes and is shaped by society, underscoring the need for a sociotechnical perspective.
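
The basic measurement behind such an analysis reduces to counting the share of LLM-related papers per year. A minimal sketch with an inline toy dataset (the keyword filter and columns are hypothetical, not the paper's methodology):

```python
import pandas as pd

papers = pd.DataFrame({
    "year": [2018, 2019, 2020, 2021, 2022, 2023] * 2,
    "title": ["A study of LLM alignment", "Graph networks", "LLM safety",
              "Robotics survey", "Language model scaling", "LLM evaluation",
              "Vision transformers", "LLM agents", "Speech models",
              "Large language models in society", "Optimization", "LLM tools"],
})

# Flag titles matching a crude keyword filter, then take the yearly mean
# of the boolean flag, which is the fraction of LLM-related papers.
is_llm = papers["title"].str.contains(r"\bLLM|language model", case=False)
share_by_year = is_llm.groupby(papers["year"]).mean()
print(share_by_year)
```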

11. A Survey of What to Share in Federated Learning: Perspectives on Model Utility, Privacy Leakage, and Communication Efficiency paper details page

Link: https://www.aminer.cn/pub/64ba03413fda6d7f062732bc/

ChatPaper review: Discusses what information should be shared in federated learning (FL), focusing on model utility, privacy leakage, and communication efficiency. Most existing surveys of FL concentrate on methods that share model parameters during training and overlook the potential of sharing other forms of local information. This paper differs from previous surveys through four distinct contributions. First, it introduces a new taxonomy of FL methods organized by what is shared, covering three sharing approaches: model sharing, synthetic-data sharing, and knowledge sharing. Second, it analyzes the vulnerability of each sharing approach to privacy attacks and reviews defense mechanisms that offer certain privacy guarantees. Third, it compares the performance and communication overhead of the different sharing approaches in FL, evaluates potential privacy leakage through model-inversion and membership-inference attacks, and compares the effectiveness of various defense methods. Finally, it discusses shortcomings of current methods and suggests future directions for improvement.
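
Model sharing, the first category, is the classic FL pattern: clients send locally trained parameters and the server aggregates them. A minimal FedAvg-style sketch of that aggregation (our illustration of the standard algorithm, not a method from the survey):

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Weighted average of client parameter vectors by local dataset size."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)               # (n_clients, n_params)
    return (stacked * (sizes / sizes.sum())[:, None]).sum(axis=0)

# Three clients with different amounts of local data.
client_weights = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
client_sizes = [100, 200, 700]
global_weights = federated_average(client_weights, client_sizes)
print(global_weights)  # [4.2 5.2]: larger clients pull the average harder
```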


How to use ChatPaper?

Using ChatPaper is simple: open the AMiner homepage and enter the ChatPaper page from the navigation bar at the top of the page or from the lower right corner.

On the ChatPaper page, you can choose to chat about a single paper or about your entire library (personal library), and you can either upload a local PDF or search for papers directly on AMiner.
