Real-time tracking of scientific research trends 丨 Selected new papers on 7.24, with ChatPaper summary

As a scientific researcher, you need to search and browse a large amount of academic literature every day to obtain the latest scientific and technological progress and research results. However, traditional retrieval and reading methods can no longer meet the needs of researchers.

ChatPaper is a literature knowledge tool that integrates retrieval, reading, and knowledge Q&A. It helps you search and read papers more efficiently, keep up with the latest research trends in your field, and make research work easier.

Using the frontier-updates subscription feature, we select the day's popular new arXiv papers and compile them into a summary digest, so that everyone can catch up on cutting-edge trends more quickly.

If you want to have an in-depth dialogue on a certain paper, you can directly copy the link of the paper to your browser or go directly to the ChatPaper page: https://www.aminer.cn/chat/g/

List of featured new papers for July 24, 2023:

1.Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

https://www.aminer.cn/pub/64b60eaa3fda6d7f06eaec2c/

The paper surveys open problems for artificial intelligence in quantum, atomistic, and continuum science. One common challenge is how to capture physical first principles, especially symmetry, with deep learning methods. The paper also discusses other shared technical challenges, including interpretability, out-of-distribution generalization, knowledge transfer with foundation models and large language models, and uncertainty quantification. In addition, it provides a categorized list of learning and educational resources, aiming to promote further research and development in the field of AI4Science.

2.CopyRNeRF: Protecting the CopyRight of Neural Radiance Fields

https://www.aminer.cn/pub/64bdf76d3fda6d7f06fbcf79/

The paper discusses how to protect the copyright of Neural Radiance Field (NeRF) models. NeRF is an important media representation, but training a NeRF is not easy, so protecting the copyright of the trained model matters. After analyzing the pros and cons of possible copyright protection schemes, the paper proposes to protect NeRF models by replacing the original color representation in NeRF with a watermarked color representation. A distortion-resistant rendering scheme is then designed to ensure that the watermark message can be extracted robustly from NeRF's 2D renderings. Compared with the alternatives, the proposed method directly protects the copyright of NeRF models while maintaining high rendering quality and bit accuracy.
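The bit accuracy mentioned above can be illustrated with a minimal sketch. It assumes bit accuracy means the fraction of embedded watermark bits that the extractor recovers correctly; the function name and the example bit strings are hypothetical and do not come from the paper.

```python
def bit_accuracy(embedded, extracted):
    """Fraction of watermark bits recovered correctly (assumed definition)."""
    if len(embedded) != len(extracted):
        raise ValueError("bit strings must have the same length")
    if not embedded:
        raise ValueError("bit strings must be non-empty")
    matches = sum(1 for a, b in zip(embedded, extracted) if a == b)
    return matches / len(embedded)


# Hypothetical 8-bit message and a decoded copy with one flipped bit.
message = [1, 0, 1, 1, 0, 0, 1, 0]
decoded = [1, 0, 1, 0, 0, 0, 1, 0]
print(bit_accuracy(message, decoded))  # 7 of 8 bits match
```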

3.FaceCLIPNeRF: Text-driven 3D Face Manipulation using Deformable Neural Radiance Fields

https://www.aminer.cn/pub/64bdf76d3fda6d7f06fbcf0c/

Existing face manipulation methods require a lot of manual work, such as user-supplied semantic masks and manual attribute searches, which makes them unsuitable for non-expert users. To address this, the researchers propose a method for manipulating faces reconstructed with NeRF that requires only a single piece of text. They first train a scene manipulator, a deformable NeRF conditioned on latent codes, to control the deformation of the face. However, representing a scene deformation with a single latent code makes it hard to combine local deformations observed in different instances. The researchers therefore propose the Position-conditional Anchor Compositor (PAC), which learns to represent a manipulated scene with spatially varying latent codes. The scenes rendered by the scene manipulator from these codes are then optimized to have high cosine similarity with the target text in the CLIP embedding space, which achieves text-driven manipulation. According to the authors, this is the first approach to text-driven manipulation of faces reconstructed with NeRF. Extensive results, comparisons, and ablation studies demonstrate the effectiveness of the method.
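The cosine-similarity objective described above can be sketched in a few lines. This is a generic illustration, not the paper's training code: the embeddings are placeholder vectors rather than real CLIP outputs, and `clip_style_loss` is a hypothetical name for a loss that shrinks as the rendered image and the target text align.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def clip_style_loss(image_embedding, text_embedding):
    """Hypothetical loss: 0 when the embeddings point the same way, 2 when opposite."""
    return 1.0 - cosine_similarity(image_embedding, text_embedding)


# Placeholder embeddings; a real pipeline would obtain these from CLIP encoders.
rendered = [1.0, 1.0]
target_text = [1.0, 0.0]
print(clip_style_loss(rendered, target_text))
```

Minimizing such a loss with respect to the latent codes pushes the rendered face toward the text description; the exact loss used by the authors may include further terms.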

4.STEVE-1: A Generative Model for Text-to-Behavior in Minecraft

https://www.aminer.cn/pub/64796919d68f896efa134e12/

The paper proposes STEVE-1, a text-to-behavior generative model for Minecraft. The model is trained in two steps: fine-tuning the pretrained VPT model to follow commands in MineCLIP's latent space, and then training a prior model to predict the latent code from text. By leveraging pretrained models and adopting best practices from text-conditioned image generation, STEVE-1 can be trained for as little as $60 and follows a wide range of short-horizon, open-ended text and visual instructions in Minecraft. STEVE-1 far exceeds previous baselines while using low-level controls (mouse and keyboard) and raw pixel input, setting a new bar for open-ended instruction following in Minecraft. The paper provides experimental evidence highlighting key factors for downstream performance, including pretraining, classifier-free guidance, and data scaling. All resources, including model weights, training scripts, and evaluation tools, are available for further research.
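The classifier-free technique named among the key factors is commonly known as classifier-free guidance. A minimal sketch of the usual combination rule follows, assuming the model produces one output with the conditioning and one without; the function name, the plain-list representation, and the choice of which quantity is guided are illustrative assumptions, not details taken from the paper.

```python
def classifier_free_guidance(conditional, unconditional, scale):
    """Extrapolate from the unconditional output toward the conditional one.

    scale = 0 returns the unconditional output, scale = 1 the conditional
    output, and scale > 1 pushes further in the conditional direction.
    """
    if len(conditional) != len(unconditional):
        raise ValueError("outputs must have the same length")
    return [u + scale * (c - u) for c, u in zip(conditional, unconditional)]


# Placeholder model outputs (for example, scores over two candidate actions).
cond = [2.0, 0.0]
uncond = [1.0, 1.0]
print(classifier_free_guidance(cond, uncond, scale=3.0))
```

A larger scale makes the output depend more strongly on the instruction, usually at some cost in diversity.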

5.Diffusion Sampling with Momentum for Mitigating Divergence Artifacts

https://www.aminer.cn/pub/64bdf76d3fda6d7f06fbcdd7/

Slow sampling remains a persistent problem for diffusion models in image generation. To speed up sampling, previous studies reformulated diffusion sampling as solving an ODE/SDE and introduced higher-order numerical methods. However, these methods often produce divergence artifacts, especially when the number of sampling steps is small, which limits the achievable acceleration. This paper investigates potential causes of these artifacts and suggests that the small stability regions of these methods may be the main cause. To address this issue, the authors propose two new techniques. The first incorporates Heavy Ball (HB) momentum, a well-known technique for improving optimization, into existing numerical methods for diffusion in order to extend their stability regions. They also show that the resulting methods have first-order convergence. The second technique, called Generalized Heavy Ball (GHVB), constructs a new higher-order method that offers a variable trade-off between accuracy and artifact suppression. Experimental results show that the techniques are highly effective in reducing artifacts and improving image quality, outperforming state-of-the-art diffusion solvers on both pixel-based and latent-based diffusion models in low-step sampling. The study provides new insights into the design of numerical methods for future diffusion work.
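The link between momentum and the stability region can be seen on a toy problem. The sketch below is not the paper's sampler: it integrates the linear test equation y' = λy with plain Euler steps and with an Euler step whose velocity is an exponential moving average of the slope, which is an assumed form of Heavy Ball blending. The function names and the values of λ, h, and β are illustrative.

```python
def euler(lam, h, steps, y0=1.0):
    """Plain explicit Euler on y' = lam * y."""
    y = y0
    for _ in range(steps):
        y = y + h * lam * y
    return y


def euler_heavy_ball(lam, h, steps, beta, y0=1.0):
    """Euler step driven by a momentum-averaged slope (assumed HB form).

    beta = 1 recovers plain Euler; smaller beta damps abrupt slope changes.
    """
    y, v = y0, 0.0
    for _ in range(steps):
        v = (1.0 - beta) * v + beta * lam * y  # blend old velocity with new slope
        y = y + h * v
    return y


# The exact solution decays to 0, but h * lam = -3 lies outside Euler's
# stability interval (-2, 0), so plain Euler blows up.
print(abs(euler(-1.0, 3.0, 50)))
# With beta = 0.5 the same step size stays stable and the iterate decays.
print(abs(euler_heavy_ball(-1.0, 3.0, 50, beta=0.5)))
```

For this test equation the momentum variant is stable for real h·λ in (-(4 - 2β)/β, 0), so β = 0.5 widens the interval from (-2, 0) to (-6, 0). That is the sense in which momentum lets a solver take larger steps without diverging.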

6.Zero-touch realization of Pervasive Artificial Intelligence-as-a-service in 6G networks

https://www.aminer.cn/pub/63f0088390e50fcafdeb8e17/

The paper addresses the problem of realizing zero-touch Pervasive Artificial Intelligence (PAI) as a service in 6G networks. The 6G vision pursues ultra-dense networks, low latency, and high-speed data transmission, and aims to achieve self-X capabilities such as self-configuration, self-monitoring, and self-healing through zero-touch solutions. However, research on 6G is still in its infancy and is only beginning to conceptualize designs, study implementations, and plan use cases. Meanwhile, academia and industry are gradually shifting from theoretical research on distributed AI to practical deployment and standardization. An end-to-end framework that simplifies AI distribution and gives third-party applications more convenient access to services through zero-touch service provisioning has not yet been explored in depth. In this context, the authors introduce a novel platform architecture to deploy zero-touch PAI as a service (PAaaS) in 6G networks, supported by a blockchain-based intelligent system. The platform aims to standardize PAI at all levels of the architecture and unify interfaces in order to facilitate service deployment across application and infrastructure domains, alleviating user concerns about cost, security, and resource allocation while respecting the performance-critical requirements of 6G. As a proof of concept, the authors present a federated learning-as-a-service use case to evaluate the ability of the proposed system to self-optimize and adapt to 6G network dynamics while minimizing the perceived cost to users.

7.AIGC Empowering Telecom Sector White Paper

https://www.aminer.cn/pub/64bdf76d3fda6d7f06fbcf2b/

1. As a transformational technology and an important force for economic and social development, AI will bring huge leaps and breakthroughs to industries worldwide and profoundly affect the future competitive landscape. 2. As the builder and operator of information and communication infrastructure, the telecommunications industry provides the foundational support for the development of AI and is in a leading position in deploying AI applications. 3. How to apply AIGC (GPT) and put it into practice in the telecommunications field is a question that telecom practitioners must consider and answer. 4. Through their research on AIGC, the authors analyze scenario by scenario how GPT empowers the telecommunications industry, discuss the gap between current general-purpose GPT models and telecommunications services, propose for the first time a Telecommunications Enhanced Cognitive Ability System, answer how to build a GPT for telecom services, and carry out various practices. 5. The authors hope that relevant parties in the industry will focus on collaborative innovation around telecommunications and AI, establish an open and shared innovation ecosystem, promote the deep integration of AI and the telecommunications industry, accelerate the construction of next-generation information infrastructure, and facilitate the digital transformation of the economy and society.

8.Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting

https://www.aminer.cn/pub/64bdf76d3fda6d7f06fbcf58/

The paper notes that, for time series tasks, previous diffusion models have mainly focused on developing conditional models for specific forecasting or imputation tasks. The authors instead explore the potential of task-agnostic, unconditional diffusion models across several time series applications by proposing TSDiff, an unconditionally trained diffusion model. Through a self-guidance mechanism, TSDiff can be conditioned for downstream tasks during inference without an auxiliary network or any change to the training procedure. The authors demonstrate the effectiveness of the method on three different time series tasks: forecasting, refinement, and synthetic data generation. First, they show that TSDiff is competitive with several task-specific conditional forecasting methods. Second, they use the implicit probability density learned by TSDiff to iteratively refine the predictions of base forecasters, at reduced computational overhead compared with reverse diffusion. Notably, the generative performance of the model remains intact: downstream forecasters trained on synthetic samples from TSDiff outperform those trained on samples from other state-of-the-art generative time series models, and sometimes even outperform models trained on real data.

9.Robust Visual Question Answering: Datasets, Methods, and Future Challenges

https://www.aminer.cn/pub/64bdf76d3fda6d7f06fbcf41/

The article discusses the robustness of visual question answering (VQA). Existing general-purpose VQA methods often memorize biases present in the training data rather than learning the intended behavior, such as grounding the answer in the image before predicting it. As a result, these methods usually perform well in-distribution but poorly out-of-distribution. To evaluate and improve the robustness of VQA, various datasets and debiasing methods have been proposed in recent years. This article provides the first comprehensive survey focused on this emerging field. First, it outlines the development of the datasets from both in-distribution and out-of-distribution perspectives. Second, it examines the evaluation metrics used on these datasets. Third, it proposes a taxonomy and presents the development process, similarities and differences, robustness comparisons, and technical characteristics of existing debiasing methods. In addition, it analyzes and discusses the robustness of representative vision-and-language pretrained models on VQA. Finally, based on a thorough review of the existing literature and experimental analysis, it discusses key areas for future research from several perspectives.

10.BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion

https://www.aminer.cn/pub/64ba03413fda6d7f06273364/

The paper points out a problem in text-to-image synthesis: existing methods mainly synthesize images from text prompts alone, and few studies use other forms of conditioning, such as boxes or sketches. Obtaining paired box/mask-image data and fine-tuning on it is time-consuming and labor-intensive, and the resulting models are limited to a closed set of categories, which may become a bottleneck when applying these methods in the open world. This paper proposes a training-free method that controls the objects and background in synthesized images according to given spatial conditions. Specifically, three spatial constraints are designed and seamlessly integrated into the denoising steps of the diffusion model, without additional training or large amounts of annotated layout data. Experimental results show that the proposed constraints can control what appears in the image and where, while preserving the ability of the Stable Diffusion model to synthesize high-fidelity images with diverse concept coverage.


How to use ChatPaper?

Using ChatPaper is simple. Open the AMiner homepage and enter the ChatPaper page from the navigation bar at the top of the page or from the lower right corner.

On the ChatPaper page, you can choose to have a dialogue based on a single document or a dialogue based on the entire library (personal library), and you can choose to upload a local PDF or directly search for documents on AMiner.

Origin blog.csdn.net/AI_Conf/article/details/131912042