60% of 2023 has already passed, and AI has kept demonstrating remarkable capabilities. Our roundup of July's popular papers is here. This time, beyond each paper's influence, we pay more attention to what the technology actually brings us.
First, the open-source release of Llama 2 drew the attention of the large-model community: it is free and licensed for commercial use, and it comes in three parameter sizes of 7 billion, 13 billion, and 70 billion, with variants optimized for dialogue use cases. Thanks to its strong performance and flexibility, the open-source Llama 2 has already demonstrated its strength across multiple application scenarios.
Professor Sun Maosong's team at Tsinghua University studied how multiple large-model agents can be organized into a group that operates a virtual technology company for collaborative software development. This is a novel concept, and AI fuels the imagination here; we have good reason to expect this approach to be more widely used in the future.
LONGNET, proposed by Microsoft, scales the Transformer to sequences of 1 billion tokens. This means the Transformer can process much longer text sequences and thus achieve better results on more natural language processing tasks.
Here we present the 17 most representative popular papers. To get all the papers, please click the link at the end of the article.
1. Llama 2: Open Foundation and Fine-Tuned Chat Models
Meta has open-sourced Llama 2, free and licensed for commercial use, in three parameter sizes (7 billion, 13 billion, and 70 billion), with variants optimized for conversational use cases.
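Getting started is straightforward. Below is a minimal sketch of loading the 7B chat variant through Hugging Face transformers; it is an illustrative example (not code from the paper), and it assumes you have been granted access to the gated meta-llama checkpoints and have the accelerate package installed for device_map="auto".

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Gated repository: access must be requested on Hugging Face first.
model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain in one sentence why the sky is blue."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```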
2. Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems
A review of AI for Science by 63 scholars from 4 institutions. The paper lays out the key problems AI faces in quantum, atomistic, and continuum systems, discusses other common technical challenges, and provides a categorized list of learning and educational resources to promote further research and development in AI for Science.
3. Meta-Transformer: A Unified Framework for Multimodal Learning
The authors propose Meta-Transformer, a framework that uses a frozen encoder for multimodal perception without any paired multimodal training data, pointing to a promising future for unified multimodal intelligence built on Transformers.
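To illustrate the recipe, here is a conceptual sketch under our own naming assumptions (the toy tokenizers, shapes, and head below are hypothetical, not the paper's code): modality-specific tokenizers map inputs into a shared token space, a frozen shared encoder processes them, and only lightweight components are trained.

```python
import torch
import torch.nn as nn

d = 64
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d, nhead=4, batch_first=True), num_layers=2
)
for p in encoder.parameters():
    p.requires_grad = False              # the shared encoder stays frozen

image_tokenizer = nn.Linear(16, d)       # flattened patches -> shared tokens
audio_tokenizer = nn.Linear(8, d)        # spectrogram frames -> same space
head = nn.Linear(d, 10)                  # only heads/tokenizers are trained

patches = torch.randn(2, 49, 16)         # a toy batch of "images"
features = encoder(image_tokenizer(patches)).mean(dim=1)
print(head(features).shape)              # torch.Size([2, 10])
```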
4. Optimized Network Architectures for Large Language Model Training with Billions of Parameters
The authors find that the communication pattern of LLM training is unique: it requires high-bandwidth any-to-any communication only within small groups of GPUs, while communication outside these groups is minimal, sparse, and evenly distributed. Exploiting this, they propose a new network architecture that partitions the cluster into sets of GPUs connected by non-blocking any-to-any high-bandwidth interconnects, called HB domains. This reduces network cost by up to 75% without compromising LLM training performance.
5. TokenFlow: Consistent Diffusion Features for Consistent Video Editing
Given a source video and a target text prompt, the framework generates a high-quality edited video; the authors leverage the power of text-to-image diffusion models for text-driven video editing tasks.
6. Communicative Agents for Software Development
Professor Sun Maosong's team at Tsinghua University recently studied how multiple large-model agents can form a group that operates a virtual technology company (ChatDev) for collaborative software development. Given only a single natural-language requirement, ChatDev can generate software for the user fully automatically.
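To make the idea concrete, here is a heavily simplified, hypothetical sketch of the kind of role-playing chat loop such a system runs on. The llm stub, role prompts, and phase structure below are our own placeholders, not ChatDev's actual implementation.

```python
# A stub standing in for any chat-model API call, so the sketch runs as-is.
def llm(system_prompt: str, history: list[str]) -> str:
    return f"[{system_prompt.split('.')[0]}] responds to: {history[-1][:40]}"

ROLES = {
    "CTO": "You are the CTO. Turn the requirement into a design decision.",
    "Programmer": "You are the programmer. Implement the CTO's decision.",
}

def run_phase(requirement: str, turns: int = 4) -> list[str]:
    history = [f"Requirement: {requirement}"]
    for t in range(turns):
        role = ("CTO", "Programmer")[t % 2]          # alternate speakers
        history.append(f"{role}: {llm(ROLES[role], history)}")
    return history

for line in run_phase("Build a 2048 game as a web page"):
    print(line)
```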
7. Retentive Network: A Successor to Transformer for Large Language Models
The paper proposes RetNet, a network architecture for building large language models that simultaneously achieves training parallelism, low-cost inference, and strong performance.
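The key to low-cost inference is that retention admits a recurrent form: a fixed-size state is decayed and updated at each step, so per-token cost does not grow with sequence length. Below is a minimal numpy sketch of that recurrent form, omitting the paper's position rotations, gating, and multi-head/group-norm details.

```python
import numpy as np

def recurrent_retention(Q, K, V, gamma=0.9):
    """Recurrent retention: S_n = gamma * S_{n-1} + k_n^T v_n, o_n = q_n S_n."""
    seq_len, d = Q.shape
    S = np.zeros((d, V.shape[1]))                 # fixed-size recurrent state
    outputs = np.zeros((seq_len, V.shape[1]))
    for n in range(seq_len):
        S = gamma * S + np.outer(K[n], V[n])      # decay state, add new k-v
        outputs[n] = Q[n] @ S                     # read out with the query
    return outputs

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 4)) for _ in range(3))
print(recurrent_retention(Q, K, V).shape)         # (8, 4)
```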
8. DreamTeacher: Pretraining Image Backbones with Deep Generative Models
This work introduces DreamTeacher, a self-supervised feature representation learning framework that uses generative networks to pre-train downstream image backbones.
9. In-context Autoencoder for Context Compression in a Large Language Model
The paper introduces the In-context Autoencoder (ICAE), a model for context compression in large language models.
10. A Survey on Graph Neural Networks for Time Series: Forecasting, Classification, Imputation, and Anomaly Detection
A comprehensive overview of GNNs for time series, covering forecasting, classification, anomaly detection, and missing-data imputation tasks.
11. CAME: Confidence-guided Adaptive Memory Efficient Optimization
In this ACL 2023 Outstanding Paper, researchers from the National University of Singapore, Huawei Noah's Ark Lab, and other institutions propose the CAME optimizer, which matches Adam's performance while consuming less memory. Training a large language model with CAME can therefore greatly reduce the cost of model training.
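Most of the memory saving comes from not storing a full second-moment matrix per weight matrix. Here is a minimal sketch of the Adafactor-style factorization that CAME builds on, storing only per-row and per-column statistics; CAME's additional confidence-guided correction is omitted, and the function name is our own.

```python
import numpy as np

def factored_second_moment(R, C, grad, beta=0.999, eps=1e-30):
    """Update O(n + m) row/column statistics instead of an O(n * m) matrix,
    then reconstruct a rank-1 estimate of the second moment."""
    g2 = grad**2 + eps
    R = beta * R + (1 - beta) * g2.mean(axis=1)   # per-row statistic, (n,)
    C = beta * C + (1 - beta) * g2.mean(axis=0)   # per-column statistic, (m,)
    V_hat = np.outer(R, C) / R.mean()             # rank-1 reconstruction
    return R, C, V_hat

n, m = 4, 6
R, C = np.zeros(n), np.zeros(m)
grad = np.random.default_rng(0).standard_normal((n, m))
R, C, V_hat = factored_second_moment(R, C, grad)
print(V_hat.shape)  # (4, 6), reconstructed from only 4 + 6 stored numbers
```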
12. VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
The latest embodied-intelligence work from Fei-Fei Li's team. A robot connected to a large model, studied at scale in both simulated and real robot environments, can perform more than 30 everyday manipulation tasks specified in free-form natural language.
13. A Survey on Graph Classification and Link Prediction based on GNN
This survey introduces graph classification and link prediction methods based on graph neural networks. It first explains the basic principles of graph convolutional networks in detail, then describes GNN models based on attention mechanisms and autoencoders, and finally reviews their applications and related datasets for tasks such as node classification, graph classification, and link prediction.
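For readers new to the area, the basic principle in question fits in a few lines: each GCN layer mixes every node's features with its neighbors' through a normalized adjacency matrix. Below is a minimal sketch of one layer in the standard Kipf & Welling formulation (illustrative, not code from the survey).

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    D_inv_sqrt = np.diag(A_hat.sum(axis=1) ** -0.5)
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# toy graph: 3 nodes in a path, 2 input features, 4 hidden units
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
H, W = rng.standard_normal((3, 2)), rng.standard_normal((2, 4))
print(gcn_layer(A, H, W).shape)               # (3, 4)
```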
14. LONGNET: Scaling Transformers to 1,000,000,000 Tokens
The paper introduces LONGNET, a Transformer variant that can extend sequence length to over 1 billion tokens without sacrificing performance on shorter sequences.
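The core trick is dilated attention: the sequence is split into segments, and within each segment only a dilated (every r-th) subset of tokens attend to each other, with several segment-length/dilation pairs mixed in the full model. The sketch below shows a single such branch under our own simplifications (no mixing of branches, no multi-head details).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dilated_attention(Q, K, V, w=4, r=2):
    """One (segment w, dilation r) branch: cost drops from O(N^2) toward
    O(N * w / r); positions this branch skips are covered by other
    branches in the full model."""
    N, d = Q.shape
    out = np.zeros_like(V)
    for start in range(0, N, w):
        idx = np.arange(start, min(start + w, N))[::r]   # dilated subset
        attn = softmax(Q[idx] @ K[idx].T / np.sqrt(d))
        out[idx] = attn @ V[idx]
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((16, 8)) for _ in range(3))
print(dilated_attention(Q, K, V).shape)                  # (16, 8)
```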
15. Segment Anything Meets Point Tracking
The paper proposes the SAM-PT method, which extends the capabilities of the SAM model to track and segment any target in dynamic videos.
16. Generate Anything Anywhere in Any Scene
The paper introduces a text-to-image diffusion model capable of generating arbitrary objects at arbitrary locations in arbitrary scenes.
17. RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
This research addresses how to directly apply vision-language models trained on Internet-scale data to end-to-end robot control, improving generalization and enabling emergent semantic reasoning.
Click the link to download the "July Must-Read Papers Collection".