Real-time tracking of research trends | Selected new papers for July 20, with ChatPaper summaries

As a scientific researcher, you need to search and browse a large amount of academic literature every day to obtain the latest scientific and technological progress and research results. However, traditional retrieval and reading methods can no longer meet the needs of researchers.

ChatPaper is a document knowledge tool that integrates retrieval, reading, and knowledge question-and-answer. It helps you search and read papers more efficiently, keep up with the latest research trends in your field, and make research work easier.

Using the cutting-edge trend subscription feature, we select the day's popular new arXiv papers and compile them into a set of paper summaries, so that everyone can follow cutting-edge trends more quickly.

If you want to have an in-depth conversation about a particular paper, you can paste the paper link into your browser or go directly to the ChatPaper page:

ChatPaper entrance: https://www.aminer.cn/chat/g/

List of Featured New Papers for July 20, 2023:

1. On the Origin of LLMs: An Evolutionary Tree and Graph for 15,821 Large Language Models

Link: https://www.aminer.cn/pub/64b8b1bd3fda6d7f062b9845/

ChatPaper review: With large language models (LLMs) now in widespread use, it has become important to know which LLM backbones, settings, training methods, and families are popular or trending. However, no comprehensive LLM index is currently available. The study addresses this by exploiting the systematic naming conventions of Hugging Face LLMs, performing hierarchical clustering with n-grams and term frequency-inverse document frequency (TF-IDF), and identifying the relationships between LLMs. The authors also developed a public web application, called Constellation, for browsing and exploring a map of the 15,821 LLMs, with a variety of visualization tools to aid in understanding the data.
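To make the idea of clustering models by their names more concrete, here is a minimal sketch using only the Python standard library. It is not the authors' implementation: the choice of character trigrams, the smoothed IDF weighting, average linkage, the 0.8 cosine-distance cutoff, and the six model names are all assumptions made for illustration.

```python
import math
from collections import Counter


def char_ngrams(name, n=3):
    """Return the overlapping character n-grams of a lowercased model name."""
    s = name.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def tfidf_vectors(names, n=3):
    """Build one L2-normalised sparse TF-IDF vector (a dict) per name."""
    counts = [Counter(char_ngrams(name, n)) for name in names]
    doc_freq = Counter(gram for c in counts for gram in c)
    total = len(names)
    vectors = []
    for c in counts:
        # Smoothed IDF (an assumption): rarer n-grams get a higher weight.
        vec = {
            gram: tf * (math.log((1 + total) / (1 + doc_freq[gram])) + 1)
            for gram, tf in c.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({gram: w / norm for gram, w in vec.items()})
    return vectors


def cosine_distance(a, b):
    """Cosine distance between two L2-normalised sparse vectors."""
    return 1.0 - sum(w * b.get(gram, 0.0) for gram, w in a.items())


def cluster_names(names, threshold=0.8, n=3):
    """Average-linkage agglomerative clustering of names.

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold` (a hypothetical cutoff).
    """
    vecs = tfidf_vectors(names, n)
    clusters = [[i] for i in range(len(names))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(
                    cosine_distance(vecs[a], vecs[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        if best[0] > threshold:
            break
        _, i, j = best
        clusters[i] += clusters.pop(j)
    return [sorted(names[k] for k in c) for c in clusters]


# Hypothetical model names, chosen only to illustrate the grouping.
example_names = [
    "llama-7b", "llama-13b",
    "falcon-7b", "falcon-40b",
    "gpt2-medium", "gpt2-large",
]
families = cluster_names(example_names)
for family in families:
    print(family)
```

With these six names the sketch separates the llama, falcon, and gpt2 variants into three groups, because names within a family share most of their trigrams while names from different families share almost none. On a real index of thousands of names, the cutoff and the n-gram length would need tuning.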

2. DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-centric Rendering

Link: https://www.aminer.cn/pub/64b8b1c13fda6d7f062bb087/

ChatPaper review: The paper points out that current human-centric rendering datasets and benchmarks are relatively lacking in diversity, even though diversity is critical to rendering quality. Existing datasets restrict researchers to exploring and evaluating a small number of rendering problems, while practical applications require methods that work robustly across different scenarios. To address this, the authors propose DNA-Rendering, a large-scale, high-fidelity repository of human performance data for neural actor rendering. The dataset contains more than 1,500 human subjects, 5,000 action sequences, and 67.5 million frames. The authors also provide rich resources for each subject, including 2D/3D human body keypoints, foreground masks, SMPLX models, clothing/accessory materials, multi-view images, and videos. These resources improve the accuracy of current methods on downstream rendering tasks. To capture the data, the authors built a professional multi-view system with 60 synchronized cameras, a maximum resolution of 4096 x 3000, a frame rate of 15 frames per second, and strict camera calibration steps, which ensures high-quality resources for task training and evaluation. Alongside the dataset, the authors provide a large-scale quantitative benchmark comprising multiple tasks to evaluate progress in novel view synthesis, novel pose animation synthesis, and novel identity rendering. In conclusion, the study describes the DNA-Rendering effort and reveals new observations, challenges, and future directions for human-centric rendering.

3. Android in the Wild: A Large-Scale Dataset for Android Device Control

Link: https://www.aminer.cn/pub/64b8b1c13fda6d7f062bb007/

ChatPaper review: The paper notes a growing interest in device-control systems that interpret human natural language instructions and carry them out on a digital device by directly controlling its user interface. The authors present Android in the Wild (AITW), a dataset for device-control research that is orders of magnitude larger than current datasets. It contains human demonstrations of device interactions, including screens and actions, with corresponding natural language instructions. It consists of 715k episodes covering 30k unique instructions, four Android versions (v10-13), and eight device types (from Pixel 2 XL to Pixel 6) with different screen resolutions. It includes multi-step tasks that require semantic understanding of both language and the visual environment. The dataset poses a new challenge: the actions available in a user interface must be inferred from its visual appearance. In addition, the action space consists not of simple UI element-based actions but of precise gestures (for example, scrolling horizontally to operate a carousel widget). The authors organize the dataset to facilitate robustness analysis of device-control systems, that is, how well a system performs when faced with new task descriptions, new applications, or new platform versions. They also develop two agents and report their performance on the entire dataset.

4. FABRIC: Personalizing Diffusion Models with Iterative Feedback

Link: https://www.aminer.cn/pub/64b8b1c13fda6d7f062bb077/

ChatPaper review: The paper explores how human feedback can be integrated into the generation process of diffusion-based text-to-image models. By exploiting the self-attention layers present in the most widely used architectures to condition the diffusion process on a set of feedback images, the authors propose FABRIC, a training-free method applicable to a variety of popular diffusion models. To ensure a rigorous evaluation of the method, they introduce a comprehensive evaluation methodology that provides a robust mechanism for quantifying the performance of generative vision models that incorporate human feedback. Through exhaustive analysis, they show that the generated results improve over multiple rounds of iterative feedback, thereby implicitly optimizing for arbitrary user preferences. Potential applications of these findings include personalized content creation and customization.

5. Text2Layer: Layered Image Generation using Latent Diffusion Model

Link: https://www.aminer.cn/pub/64b8b1bd3fda6d7f062b9835/

ChatPaper review: Layer compositing is a very popular method in existing image editing workflows. However, in existing methods, image generation and layer mask generation are performed separately. To improve this process and produce higher-quality compositing results, the authors propose a new approach: layered image generation using a latent diffusion model. They train an autoencoder that can reconstruct layered images, then train a diffusion model on the latent representations to generate the background, foreground, layer mask, and composite image simultaneously. This approach not only produces high-quality layered images but also improves layer compositing workflows and yields higher-quality layer masks. Experimental results demonstrate that the proposed method is capable of producing high-quality layered images and provides a benchmark for future work.

6. DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI

Link: https://www.aminer.cn/pub/64b8b1c13fda6d7f062bb086/

ChatPaper review: The paper points out the challenges facing the conversational AI field: language models struggle to handle diverse dialogue tasks, and existing dialogue datasets lack diversity and comprehensiveness. To address these issues, the authors introduce DialogStudio, the largest and richest collection of dialogue datasets, unified in a consistent format while preserving the original information. The collection includes data from open-domain dialogue, task-oriented dialogue, natural language understanding, conversational recommendation, dialogue summarization, and knowledge-grounded dialogue, making it a very rich and diverse resource for dialogue research and model training. To further improve the utility of DialogStudio, the authors identify the license for each dataset and design domain-aware prompts for selected dialogues to facilitate instruction-aware fine-tuning. In addition, the authors use the collection to develop conversational AI models, and experiments demonstrate the superior performance of DialogStudio in zero-shot and few-shot learning scenarios.

7. Challenges and Applications of Large Language Models

Link: https://www.aminer.cn/pub/64b8b1c13fda6d7f062bb083/

ChatPaper review: The paper aims to identify the challenges that remain in the field of large language models (LLMs) and the areas where LLMs have already been applied successfully. Because the field is developing so rapidly, it is difficult to determine which challenges remain and in which application areas results have been achieved. The paper therefore sets out a systematic series of open questions and presents successful applications, so that machine learning researchers can understand the state of the field more quickly and become more productive.

8. Towards A Unified Agent with Foundation Models

Link: https://www.aminer.cn/pub/64b8b1bd3fda6d7f062b97b1/

ChatPaper review: The paper investigates how the capabilities of language models and vision-language models can be embedded and exploited in reinforcement learning (RL) agents. These models have demonstrated unprecedented capabilities in understanding human intent, reasoning, scene understanding, and planning-like behavior. The paper explores a framework that uses language as the core reasoning tool and discusses how this enables agents to tackle a series of fundamental RL challenges, such as efficient exploration, reuse of experience data, skill scheduling, and learning from observation, which traditionally require separate, vertically designed algorithms. The authors test their method in a simulated robotic manipulation environment with sparse rewards, in which the robot is required to stack a set of objects. The results show significant performance improvements over baseline methods in exploration efficiency and in the ability to reuse data from offline datasets, and illustrate how learned skills can be reused to solve new tasks or to imitate videos of human experts.


How to use ChatPaper?

Using ChatPaper is very simple. Open the AMiner homepage and enter the ChatPaper page from the navigation bar at the top of the page or from the lower right corner.
On the ChatPaper page, you can choose to have a dialogue based on a single document or on an entire library (your personal library), and you can either upload a local PDF or search for documents directly on AMiner.

If you have any questions or suggestions, please feel free to contact us.


Origin blog.csdn.net/AI_Conf/article/details/131851575