7 Papers | Zhejiang University wins Best Paper at SIGMOD 2023; GPT-4 sets a new SOTA on the hardest mathematical reasoning dataset

Source | Heart of the Machine (ID: almosthuman2014)

This week's papers include customizing a GPT-4-like multimodal large model at 10% of the cost, GPT-4 setting a new SOTA on the hardest mathematical reasoning dataset, and other research.

Table of contents:

  1. Transfer Visual Prompt Generator across LLMs 

  2. Progressive-Hint Prompting Improves Reasoning in Large Language Models

  3. AutoML-GPT: Automatic Machine Learning with GPT

  4. MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers

  5. Unlimiformer: Long-Range Transformers with Unlimited Length Input 

  6. Detecting Logic Bugs of Join Optimizations in DBMS

  7. REASONER: An Explainable Recommendation Dataset with Multi-aspect Real User Labeled Ground Truths

Paper 1: Transfer Visual Prompt Generator across LLMs

  • Authors: Ao Zhang, Hao Fei, et al.

  • Paper address: https://arxiv.org/pdf/2305.01278.pdf

Abstract: The VPGTrans method proposed in this paper can quickly (in less than 10% of the training time) transfer the visual module of an existing multimodal dialogue model to a new language model, achieving similar or better results. For example, compared with training the vision module from scratch, it reduces the training cost of BLIP-2 FlanT5-XXL from over 19,000 RMB to less than 1,000 RMB.

With the VPGTrans framework, visual modules can be flexibly added to new large language models as needed. For example, VL-LLaMA and VL-Vicuna are built on top of LLaMA-7B and Vicuna-7B.

Open-source multimodal dialogue model: the paper open-sources VL-Vicuna, which supports high-quality multimodal dialogue.

Recommendation: customize your own GPT-4-like multimodal large model at 10% of the cost.

Paper 2: Progressive-Hint Prompting Improves Reasoning in Large Language Models

  • Authors: Chuanyang Zheng, Zhengying Liu, et al.

  • Paper address: https://arxiv.org/abs/2304.09797

Abstract: Recently, Huawei and The Chinese University of Hong Kong published the paper "Progressive-Hint Prompting Improves Reasoning in Large Language Models", proposing Progressive-Hint Prompting (PHP), which simulates the way humans work through problems. Under the PHP framework, a large language model (LLM) can use the answers generated in previous reasoning rounds as hints for subsequent reasoning, gradually converging on the final correct answer. Using PHP only requires two conditions: 1) the question can be combined with a previous answer to form a new question; 2) the model can handle this new question and produce a new answer.
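
To make this concrete, here is a minimal Python sketch of a progressive-hint prompting loop built from the two conditions above, assuming a hypothetical `ask_llm` helper that wraps an LLM API call; the hint phrasing and stopping rule are illustrative rather than the paper's verbatim prompts.

```python
# Minimal sketch of a progressive-hint prompting loop. `ask_llm` is a
# hypothetical placeholder for any LLM completion call; the hint phrasing and
# stopping rule are illustrative, not the paper's exact setup.

def ask_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to an LLM and return its final answer string."""
    raise NotImplementedError


def progressive_hint_prompting(question: str, max_rounds: int = 5) -> str:
    hints: list[str] = []        # answers from previous rounds, reused as hints
    previous_answer = None
    for _ in range(max_rounds):
        if hints:
            # Condition 1: combine the question with earlier answers into a new question.
            prompt = f"{question} (Hint: the answer is near to {', '.join(hints)}.)"
        else:
            prompt = question    # first round: ordinary (e.g. chain-of-thought) prompting
        # Condition 2: the model handles the new question and gives a new answer.
        answer = ask_llm(prompt)
        if answer == previous_answer:   # stop once two consecutive answers agree
            return answer
        previous_answer = answer
        hints.append(answer)
    return previous_answer
```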

The results show that GPT-4+PHP achieves SOTA results on multiple datasets, including SVAMP (91.9%), AQuA (79.9%), GSM8K (95.5%), and MATH (53.9%), outperforming GPT-4+CoT by a large margin. For example, on MATH, the hardest mathematical reasoning dataset, GPT-4+CoT reaches only 42.5%, while GPT-4+PHP improves the Number Theory subset of MATH by 6.1% and raises the overall MATH score to 53.9%, a new SOTA.

Recommendation: GPT-4 sets a new SOTA on the hardest mathematical reasoning dataset.

Paper 3: AutoML-GPT: Automatic Machine Learning with GPT

  • Authors: Shujian Zhang, Chengyue Gong, et al.

  • Paper address: https://papers.labml.ai/paper/35151be0eb2011edb95839eec3084ddd

Abstract: Recently, researchers from the University of Texas at Austin proposed a new idea: develop task-oriented prompts and use LLMs to automate the training pipeline. Based on this idea, they introduced a new system, AutoML-GPT.

AutoML-GPT uses GPT as a bridge between diverse AI models and dynamically trains models with optimized hyperparameters. It takes user requests expressed as a Model Card [Mitchell et al., 2019] and a Data Card [Gebru et al., 2021] and composes a corresponding prompt paragraph. Finally, AutoML-GPT uses this prompt paragraph to automate several stages of experimentation, including data processing, model architecture construction, hyperparameter tuning, and prediction of training logs.
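
As a rough illustration of how such a prompt paragraph might be assembled, the sketch below fills a fixed template from a Model Card and a Data Card; the field names and template text are assumptions made for illustration, not AutoML-GPT's actual schema.

```python
# Illustrative sketch: filling a fixed prompt template from a Model Card and a
# Data Card. The field names and template are assumptions for illustration,
# not AutoML-GPT's actual schema.

model_card = {
    "name": "ResNet-50",
    "task": "image classification",
    "architecture": "50-layer residual CNN",
}

data_card = {
    "name": "CIFAR-10",
    "input": "32x32 RGB images",
    "labels": "10 object classes",
    "size": "60,000 images",
}

PROMPT_TEMPLATE = (
    "You are AutoML-GPT. Given the model {model_name} ({architecture}) for "
    "{task}, and the dataset {data_name} ({input}, {labels}, {size}), "
    "propose a data-processing pipeline, suggest hyperparameters, and "
    "predict the expected training log."
)

prompt_paragraph = PROMPT_TEMPLATE.format(
    model_name=model_card["name"],
    architecture=model_card["architecture"],
    task=model_card["task"],
    data_name=data_card["name"],
    input=data_card["input"],
    labels=data_card["labels"],
    size=data_card["size"],
)
print(prompt_paragraph)
```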

By leveraging GPT's strong NLP capabilities together with existing AI models, AutoML-GPT solves complex AI tasks across a variety of tests and datasets. Extensive experiments and ablation studies show that AutoML-GPT is general and effective for many AI tasks, including CV and NLP tasks.

Recommendation: the general-purpose system AutoML-GPT has arrived.

Paper 4: MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers

  • Authors: Lili Yu, Daniel Simig, et al.

  • Paper address: https://arxiv.org/pdf/2305.07185.pdf

Abstract: A new paper from Meta AI proposes a multiscale decoder architecture called MEGABYTE that can model sequences of over one million bytes in an end-to-end differentiable way.

Importantly, the paper showed the feasibility of abandoning tokenization, which was rated as "Promising" by Karpathy.

This method divides the byte sequence into fixed-size patches, similar to tokens.

The MEGABYTE model consists of three parts:

1. A patch embedder, which simply encodes a patch by losslessly concatenating the embeddings of each byte;

2. A global module, a large autoregressive transformer whose inputs and outputs are patch representations;

3. A local module, a small autoregressive model that predicts the bytes within a patch.

Crucially, the study found that for many tasks most bytes are relatively predictable (e.g., completing a word given its first few characters), which means a large neural network is not needed for every byte; much smaller models can be used for intra-patch modeling.
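
The sketch below shows these three components in a heavily simplified form, assuming PyTorch; dimensions, layer counts, and the causal shifting/padding details are simplifications, so it illustrates the patch-level/byte-level split rather than the paper's exact architecture.

```python
# Heavily simplified sketch of MEGABYTE's three parts: patch embedder,
# global transformer over patches, local transformer over bytes in a patch.
# Sizes and the causal-shifting details are simplifications for illustration.

import torch
import torch.nn as nn

VOCAB = 256                 # one "token" per byte value
PATCH = 8                   # bytes per patch
D_LOCAL = 128               # per-byte embedding size
D_GLOBAL = PATCH * D_LOCAL  # patch embedding = concatenation of its byte embeddings


def causal_mask(size: int) -> torch.Tensor:
    # Upper-triangular mask: position i may not attend to positions > i.
    return torch.triu(torch.full((size, size), float("-inf")), diagonal=1)


class MegabyteSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.byte_embed = nn.Embedding(VOCAB, D_LOCAL)
        # Global module: a "large" autoregressive transformer over patch representations.
        self.global_model = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(D_GLOBAL, nhead=8, batch_first=True),
            num_layers=4,
        )
        # Local module: a "small" autoregressive model for the bytes inside one patch.
        self.local_model = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(D_LOCAL, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.to_bytes = nn.Linear(D_LOCAL, VOCAB)

    def forward(self, bytes_in):  # bytes_in: (batch, seq_len), seq_len % PATCH == 0
        b, t = bytes_in.shape
        n_patches = t // PATCH
        x = self.byte_embed(bytes_in)                        # (b, t, D_LOCAL)
        # Patch embedder: losslessly concatenate the byte embeddings of each patch.
        patches = x.view(b, n_patches, D_GLOBAL)
        ctx = self.global_model(patches, mask=causal_mask(n_patches))
        # Local module: predict bytes within each patch, conditioned on global context.
        local_in = x.view(b * n_patches, PATCH, D_LOCAL) + ctx.view(b * n_patches, PATCH, D_LOCAL)
        out = self.local_model(local_in, mask=causal_mask(PATCH))
        return self.to_bytes(out).view(b, t, VOCAB)          # per-byte logits


logits = MegabyteSketch()(torch.randint(0, VOCAB, (2, 32)))
print(logits.shape)  # torch.Size([2, 32, 256])
```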

Recommendation: Is "tokenization" really necessary? Andrej Karpathy: it's time to throw away this historical baggage.

Paper 5: Unlimiformer: Long-Range Transformers with Unlimited Length Input

  • Authors: Amanda Bertsch, Uri Alon, et al.

  • Paper address: https://arxiv.org/pdf/2305.01625v1.pdf

Abstract: Researchers from Carnegie Mellon University introduced Unlimiformer. This is a retrieval-based approach that augments a pre-trained language model to accept an input of unlimited length at test time.

Unlimiformer can be injected into any existing encoder-decoder transformer, enabling it to process inputs of unlimited length. Given a long input sequence, Unlimiformer builds a datastore over the hidden states of all input tokens. The decoder's standard cross-attention then queries this datastore and attends only to the top-k input tokens. The datastore can reside in GPU or CPU memory and can be queried in sub-linear time.
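
A minimal sketch of this retrieval idea follows, assuming a single attention head and brute-force search in place of the approximate kNN index; Unlimiformer's actual attention reformulation, which lets one index serve every head and layer, is omitted.

```python
# Minimal sketch of the core idea: keep encoder hidden states in a datastore
# and let cross-attention look only at the top-k retrieved states per step.
# Brute-force search stands in for the approximate kNN index; the single-head
# formulation omits Unlimiformer's per-head/per-layer reformulation.

import torch
import torch.nn.functional as F


def knn_cross_attention(query, datastore_keys, datastore_values, k=16):
    """
    query:            (d,)    decoder query vector for one step
    datastore_keys:   (n, d)  hidden states of ALL input tokens (n can be huge)
    datastore_values: (n, d)  value projections of the same tokens
    """
    scores = datastore_keys @ query                  # similarity to every input token
    topk = torch.topk(scores, k)                     # retrieve the k best-matching tokens
    attn = F.softmax(topk.values / query.shape[0] ** 0.5, dim=0)
    return attn @ datastore_values[topk.indices]     # attend only over the retrieved subset


# Toy usage: a 100k-token "input" that would not fit in a normal attention window.
d = 64
keys = torch.randn(100_000, d)
values = torch.randn(100_000, d)
q = torch.randn(d)
context = knn_cross_attention(q, keys, values)
print(context.shape)  # torch.Size([64])
```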

Unlimiformer can be applied directly to a trained model and improves existing checkpoints without any further training; with fine-tuning, performance improves further. The paper demonstrates that Unlimiformer can be applied to multiple base models, such as BART (Lewis et al., 2020a) and PRIMERA (Xiao et al., 2022), without adding weights or retraining. On various long-range seq2seq datasets, Unlimiformer not only outperforms long-range Transformers such as Longformer (Beltagy et al., 2020b), SLED (Ivgi et al., 2022), and Memorizing Transformers (Wu et al., 2021), but the paper also finds that Unlimiformer can be applied on top of a Longformer encoder model for further gains.

Recommendation: Unlimiformer stretches the context length to infinity.

Paper 6: Detecting Logic Bugs of Join Optimizations in DBMS

  • Authors' institution: Zhejiang University

Abstract: Researchers from Zhejiang University proposed a method called Transformed Query Synthesis (TQS), a new general and cost-effective tool for detecting logic bugs in the join optimizations of DBMSs.

To demonstrate the generality and effectiveness of the approach, the researchers evaluated TQS on four widely used DBMSs: MySQL, MariaDB, TiDB, and PolarDB. After running for 24 hours, TQS found 115 bugs: 31 in MySQL, 30 in MariaDB, 31 in TiDB, and 23 in PolarDB. Root-cause analysis groups these bugs into 7 types in MySQL, 5 in MariaDB, 5 in TiDB, and 3 in PolarDB. The researchers reported the discovered bugs to the corresponding communities and received positive feedback.

Figure 2 gives an overview of the TQS architecture. Given a benchmark dataset and a target DBMS, TQS searches for possible logic bugs in the DBMS by generating queries based on the dataset. TQS has two key components: data-guided schema and query generation (DSG) and knowledge-guided query space exploration (KQE).
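
As a conceptual illustration of the kind of oracle such a tool relies on (not TQS itself), the sketch below runs two logically equivalent join queries and flags any mismatch in their results; SQLite is used only because it is self-contained, whereas TQS targets MySQL, MariaDB, TiDB, and PolarDB.

```python
# Conceptual illustration (not TQS itself): two logically equivalent join
# queries must return the same multiset of rows; a mismatch signals a logic
# bug in the join optimizer. SQLite is used here only for self-containment.

import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users(id INTEGER, name TEXT);
    CREATE TABLE orders(user_id INTEGER, amount INTEGER);
    INSERT INTO users VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO orders VALUES (1, 10), (1, 20), (3, 5);
""")

# Two equivalent formulations of the same join; the optimizer may choose
# different join orders or algorithms for each.
q1 = """SELECT u.id, o.amount FROM users u JOIN orders o ON u.id = o.user_id
        WHERE o.amount > 1"""
q2 = """SELECT u.id, o.amount FROM orders o JOIN users u ON o.user_id = u.id
        WHERE o.amount > 1"""

r1 = Counter(conn.execute(q1).fetchall())
r2 = Counter(conn.execute(q2).fetchall())
assert r1 == r2, f"Potential join-optimization logic bug: {r1} != {r2}"
print("Results match:", sorted(r1.elements()))
```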

Recommendation: Zhejiang University's research wins Best Paper at SIGMOD 2023.

Paper 7: REASONER: An Explainable Recommendation Dataset with Multi-aspect Real User Labeled Ground Truths

  • Authors: Xu Chen, Jingsen Zhang, et al.

  • Paper address: https://arxiv.org/pdf/2303.00168.pdf

Abstract: Researchers from Renmin University of China and Huawei jointly constructed a new explainable recommendation dataset - REASONER (Real Users Labeled Multi-aspect Explanations for Explainable Recommendation).

The dataset is constructed in a video recommendation scenario and contains ground truths for multiple explanation purposes, such as recommendation persuasiveness, explanation informativeness, and user satisfaction. It can be widely used in explainable recommendation, recommender system correction, and psychology-oriented recommendation. The study also released an explainable recommendation toolkit containing ten well-known explainable recommendation models for everyone to use.

The REASONER dataset has the following highlights:

  • Multimodal Candidate Explanations: Users can choose textual or visual explanations for each recommended video according to their preferences.

  • Multi-aspect explanation ground truths: explanation ground truths are provided from three aspects: recommendation persuasiveness, explanation informativeness, and user satisfaction.

  • Real user annotation: the explanation ground truths are annotated by the same users who produced the interaction records.

  • Rich user features: the study collected a variety of (anonymized) feature information about the participating users.

Recommendation: multi-aspect, real-user annotated; Renmin University of China and Huawei release the explainable recommendation dataset REASONER.
