A 2023 Survey of Large Language Model Recommendation Technology: Taxonomy, Progress, Problems, and Trends

This article is reproduced from: A 2023 Survey of Large Language Model Recommendation Technology: Taxonomy, Progress, Problems, and Trends.
Link to the original text: https://arxiv.org/pdf/2305.19860.pdf
A Survey on Large Language Models for Recommendation

Abstract

Large language models (LLMs) have become powerful tools in natural language processing (NLP) and have recently gained significant attention in recommender systems (RS). These models, trained on massive data via self-supervised learning, have achieved remarkable success in learning general representations and can potentially enhance various aspects of recommender systems through effective transfer techniques such as fine-tuning and prompt tuning. The key to harnessing the power of language models for recommendation quality lies in leveraging their high-quality textual feature representations and extensive external knowledge to build correlations between items and users. To provide a comprehensive understanding of existing LLM-based recommender systems, this review proposes a taxonomy that divides these models into two major paradigms, discriminative LLMs for recommendation (DLLM4Rec) and generative LLMs for recommendation (GLLM4Rec), and systematically surveys the latter for the first time. In addition, we review and analyze existing LLM-based recommender systems within each paradigm, examining their methods, techniques, and performance in depth. Furthermore, we point out key challenges and several valuable findings to inspire researchers and practitioners.

1. Introduction

Recommender systems play a vital role in helping users find relevant personalized items or content. With the advent of large language models (LLMs) in the field of natural language processing (NLP), there has been a growing interest in harnessing the power of these models to enhance recommender systems.

A key advantage of incorporating LLMs into recommender systems is their ability to extract high-quality textual feature representations and to exploit the wealth of external knowledge encoded in them. This survey regards an LLM as a Transformer-based model with a large number of parameters, trained on massive datasets using self/semi-supervised learning techniques, such as BERT, the GPT series, and the PaLM series. Unlike traditional recommender systems, LLM-based models are good at capturing contextual information and can understand user queries, item descriptions, and other textual data more effectively. By understanding context, LLM-based RS can improve the accuracy and relevance of recommendations, thereby improving user satisfaction. At the same time, in the face of the common data sparsity problem of limited historical interactions, LLMs also bring new possibilities to recommender systems through zero/few-shot recommendation capabilities. Thanks to extensive pre-training on factual information, domain expertise, and commonsense reasoning, these models can generalize to unseen candidates and deliver reasonable recommendations even without prior exposure to specific items or users.

The above strategies have been widely used in discriminative models. However, with the development of artificial intelligence learning paradigms, generative language models have begun to emerge. The emergence of ChatGPT and similar models is the best example; they have profoundly changed how people live and work. In addition, the fusion of generative models and recommender systems opens up possibilities for further innovations and practical applications. For example, the interpretability of recommendations can be improved, because LLM-based systems can provide explanations based on their language generation capabilities, helping users understand the factors that influence recommendations. Generative language models also enable more personalized and context-aware recommendations, such as in chat-based recommendation systems, where users can customize prompts, thereby increasing user engagement and satisfaction with the diversity of results.

The above paradigms are effective in addressing data scarcity and efficiency problems. Motivated by this, applying the language modeling paradigm to recommendation has become a promising direction in both academia and industry, greatly advancing the latest developments in recommender systems research.

So far, a comprehensive overview of recent advances in, and a systematic introduction to, generative large language models for recommendation has been lacking. To address this gap, we delve into LLM-based recommender systems, splitting them into discriminative LLMs for recommendation and generative LLMs for recommendation, with our review focusing on the latter. The main contributions of this review are outlined below:

  • We conduct a systematic survey of the current state of LLM-based recommender systems, with a focus on extending the capacity of language models. By analyzing existing methods, we provide a systematic overview of related progress and applications.

  • To our knowledge, our survey is the first comprehensive, up-to-date survey dedicated to generative large-scale language models for recommender systems.

  • Our survey critically analyzes the strengths, weaknesses, and limitations of existing approaches. We identify major challenges facing LLM-based recommender systems and present valuable findings that could inspire further research in this promising area.

2. Modeling Paradigms for LLM-based Recommendation

The basic framework of all large language models, such as GPT, PaLM, and LLaMA, is a stack of Transformer blocks. The input to such architectures generally consists of token embeddings and position embeddings, while the output module produces the expected output embeddings or tokens; both the input and output data are text sequences. As shown in (1)-(3) in Figure 1, existing work on adapting language models to recommendation, i.e., the modeling paradigm, can be roughly divided into the following three categories:
Figure 1. Three modeling paradigms for applying large language models to recommender systems.
  • LLM Embeddings + RS: This modeling paradigm treats the language model as a feature extractor, feeding item and user features into the LLM and obtaining the corresponding embeddings. Traditional RS models can then leverage these knowledge-aware embeddings for various recommendation tasks. A minimal sketch of this paradigm follows the list below.

  • LLM Tokens + RS: Similar to the previous method, this method generates tokens based on the input item and user features. The generated tokens capture potential preferences through semantic mining and are integrated into the decision-making process of the recommender system.

  • LLM as RS: Different from (1) and (2), this paradigm aims to directly transform a pre-trained LLM into a powerful recommender system. Input sequences usually consist of profile descriptions, behavior prompts, and task instructions, and the output sequence is expected to provide plausible recommendation results.
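As a concrete illustration of paradigm (1), the sketch below uses a pre-trained BERT encoder as a frozen feature extractor and hands the resulting embeddings to a plain dot-product scorer standing in for a traditional RS model. The model name, pooling choice, and example texts are illustrative assumptions, not prescriptions from any surveyed paper.

```python
# A minimal sketch of paradigm (1), "LLM Embeddings + RS": a pre-trained
# language model encodes item/user text, and a conventional recommender
# consumes the embeddings. Model and field choices are illustrative.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    """Mean-pool the last hidden states into one vector per text."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state      # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)         # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)          # (B, H)

item_vecs = embed(["A space-opera film about desert politics.",
                   "A cozy small-town mystery novel."])
user_vec = embed(["Recently watched: Interstellar; The Martian"])

# Any traditional RS scoring head can take over from here; a dot product
# is the simplest stand-in.
scores = user_vec @ item_vecs.T
print(scores.argsort(descending=True))
```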
Figure 2. Taxonomy of research on large language models for recommender systems.

In practical applications, the choice of language model largely determines the design of the modeling paradigm in a recommender system. As shown in Figure 2, this paper divides existing work into two categories: discriminative LLMs and generative LLMs for recommendation. Each category can be further subdivided according to the training method used; the differences between training methods are shown in Figure 3. In general, discriminative language models are well suited to the embedding paradigm (1), while the response generation capabilities of generative language models further support paradigms (2) and (3).
Figure 3. Five different training (domain adaptation) approaches for LLM-based recommendation.

3. Discriminative LLMs for Recommendation

In the recommendation field, the so-called discriminative language models mainly refer to the BERT family of models. Owing to their specialization in natural language understanding tasks, discriminative language models are often used as the embedding backbone for downstream tasks, and the same holds for recommender systems. Most existing studies fine-tune the representations of pre-trained models such as BERT to fit domain-specific data, while some studies explore training strategies such as prompt tuning. Table 1 and Table 2 list representative methods and commonly used datasets.
Table 1. List of representative LLM-based recommendation methods.

3.1 Fine-tuning

Fine-tuning pre-trained language models is a general technique that has received extensive attention in various natural language processing (NLP) tasks including recommender systems. The idea behind fine-tuning is to adapt a language model that has learned rich language representations from large-scale text data to a specific task or domain by further training on task-specific data.
Table 2. List of common datasets used in existing LLM-based recommendation methods.

The fine-tuning process consists of initializing a pre-trained language model with its learned parameters and then training it on a specific recommendation dataset. This dataset typically includes user interactions with items, text descriptions of items, user profiles, and other relevant contextual information. During fine-tuning, the parameters of the model are updated based on task-specific data, allowing it to be adapted and specialized for the recommendation task. In the pre-training and fine-tuning stages, the learning objectives can be different.
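To make the process above concrete, here is a hedged sketch of fine-tuning a pre-trained BERT on recommendation data framed as binary click prediction. The dataset fields, label scheme, and hyperparameters are illustrative assumptions; a real system would use a proper DataLoader and far more examples.

```python
# A sketch of the fine-tuning stage: initialize from pre-trained BERT
# weights and continue training on recommendation data (here framed as
# binary click prediction; the example data is made up for illustration).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)   # 2 labels: click / no-click
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Each example pairs a textual user profile with an item description.
examples = [("history: sci-fi movies", "item: Dune (2021)", 1),
            ("history: cooking shows", "item: Alien (1979)", 0)]

model.train()
for user_text, item_text, label in examples:
    batch = tokenizer(user_text, item_text, return_tensors="pt",
                      truncation=True)
    loss = model(**batch, labels=torch.tensor([label])).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```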

Since the fine-tuning strategy is very flexible, most BERT-enhanced recommendation methods fall under this track. In summary, integrating a fine-tuned BERT into a recommender system incorporates powerful external knowledge and personalized user preferences; the main goal is to improve recommendation accuracy while also gaining some cold-start capability for new items with limited historical data.

3.2 Prompt Tuning

Rather than adapting LLMs to different downstream recommendation tasks by designing task-specific objective functions, prompt tuning tries to align the recommendation objective with the pre-training loss through hard/soft prompts and label-word verbalizers.

Experiments show that using a combination of multiple prompts significantly improves recommender performance, surpassing the results achieved with a single prompt on both discrete and continuous templates. This highlights the effectiveness of prompt ensembles, which combine multiple prompts to make more informed decisions. A minimal cloze-style sketch of prompt tuning with a verbalizer follows.
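The sketch below illustrates the hard-prompt idea, assuming a cloze template and two label words as the verbalizer, so that the recommendation question reuses BERT's masked-LM head and its pre-training objective. The template wording and label words are assumptions for illustration.

```python
# A minimal sketch of hard-prompt tuning with a verbalizer: the
# recommendation question is rephrased as a cloze task so its objective
# matches BERT's masked-LM pre-training loss. Template and label words
# are illustrative assumptions.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

template = "The user enjoyed {history}. The user would find {item} [MASK]."
verbalizer = {"great": 1, "boring": 0}   # label words -> classes

prompt = template.format(history="Inception, Interstellar", item="Tenet")
batch = tokenizer(prompt, return_tensors="pt")
mask_pos = (batch["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]

with torch.no_grad():
    logits = model(**batch).logits[0, mask_pos]

# Compare the masked-LM scores of the two label words.
ids = {w: tokenizer.convert_tokens_to_ids(w) for w in verbalizer}
pred = max(ids, key=lambda w: logits[ids[w]].item())
print("predicted label word:", pred)
```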

4. Generative LLMs for Recommendation

Generative models have better natural language generation capabilities than discriminative models. Therefore, unlike most discriminative-model-based methods, which align LLM-learned representations with the recommendation domain, most work based on generative models converts the recommendation task into a natural language task and then applies techniques such as in-context learning, prompt tuning, and instruction tuning to adapt the LLM so that it directly generates recommendation results. Moreover, this line of work has recently received increasing attention thanks to the impressive capabilities demonstrated by ChatGPT.

As shown in Fig. 2, these generative LLM-based methods can be further subdivided into two paradigms: non-tuned and tuned paradigms, depending on whether parameters are tuned or not. The next two subsections discuss their details respectively. Representative methods and commonly used datasets are also listed in Table 1 and Table 2.

4.1 Non-tuning Paradigm

LLMs have shown strong zero-shot/few-shot learning capabilities on many unseen tasks. Therefore, some recent studies assume that LLMs already possess recommendation capabilities and try to trigger them by introducing specific prompts. These studies adopt instruction and in-context learning approaches to apply LLMs to recommendation tasks without tuning model parameters.

According to whether the prompt includes demonstration examples, the research of this paradigm is mainly divided into the following two categories: prompting and in-context learning.

Prompting

This line of work aims to design more appropriate instructions and prompts to help LLMs better understand and solve recommendation tasks.

[1] systematically evaluates the performance of ChatGPT on five common recommendation tasks: rating prediction, sequential recommendation, direct recommendation, explanation generation, and review summarization. The authors propose a general framework for building recommendation prompts, consisting of (1) a task description, which adapts the recommendation task to a natural language processing task; (2) behavior injection, which incorporates user-item interactions to help the LLM capture user preferences and needs; and (3) a format indicator, which constrains the output format and makes the recommendation results easier to understand and evaluate. Similarly, [2] conducts an empirical analysis of ChatGPT's recommendation ability on three common information retrieval tasks: point-wise, pair-wise, and list-wise ranking. The authors propose different prompts for different task types and introduce role instructions (such as "You are now a news recommendation system") at the beginning of the prompt to enhance ChatGPT's domain adaptability.
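To illustrate, the function below assembles a prompt from the three parts described in [1]: a task description, behavior injection, and a format indicator. The exact wording of each part is an assumption for illustration, not the paper's template.

```python
# A hedged sketch of the three-part recommendation prompt framework from [1].
def build_prompt(task, history, candidates):
    task_description = f"You are a recommender system. Task: {task}."
    behavior_injection = ("The user has recently interacted with: "
                          + ", ".join(history) + ".")
    candidate_block = "Candidates: " + ", ".join(candidates) + "."
    format_indicator = ("Answer with a ranked list of candidate titles only, "
                        "one per line, most relevant first.")
    return "\n".join([task_description, behavior_injection,
                      candidate_block, format_indicator])

print(build_prompt("direct recommendation",
                   ["The Matrix", "Blade Runner"],
                   ["Dune", "Notting Hill", "Ex Machina"]))
```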

Some works do not propose a general framework but focus on designing effective prompts for specific recommendation tasks. [3] mines movie recommendation prompts from the pre-training corpus of GPT-2. [4] introduces two prompting methods to improve the sequential recommendation ability of LLMs: recency-focused sequential prompting, which enables LLMs to perceive the sequential information in the user's interaction history, and a resampling technique (bootstrapping), in which the list of candidate items is shuffled multiple times and the averaged ranking is used to produce the final order, alleviating position bias.
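The bootstrapping idea can be sketched as follows; `ask_llm_to_rank` is a hypothetical stand-in for an actual LLM ranking call, the number of rounds is an arbitrary choice, and averaging positions is this sketch's aggregation choice.

```python
# A sketch of the bootstrapping idea in [4]: shuffle the candidate list
# several times, query the LLM on each permutation, and average each
# item's position to damp position bias.
import random
from collections import defaultdict

def bootstrap_rank(candidates, ask_llm_to_rank, rounds=3, seed=0):
    rng = random.Random(seed)
    rank_sums = defaultdict(float)
    for _ in range(rounds):
        shuffled = candidates[:]
        rng.shuffle(shuffled)
        ranking = ask_llm_to_rank(shuffled)   # returns an ordered list
        for pos, item in enumerate(ranking):
            rank_sums[item] += pos
    # Lower average position = ranked higher overall.
    return sorted(candidates, key=lambda item: rank_sums[item] / rounds)
```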

Because LLMs accept only a limited number of input tokens, it can be difficult to fit a long candidate list into the prompt; the context length limitation of LLMs thus becomes an issue when ranking long candidate lists. To address this problem, [5] proposed a sliding-window prompting strategy: only the candidates inside the current window are ranked at each step, the window slides from the back of the list to the front, and the whole process is repeated several times to obtain the final overall ranking.
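A compact sketch of that strategy follows, with `rerank_window` standing in for the LLM ranking call and the window, step, and pass counts as illustrative assumptions.

```python
# A sketch of the sliding-window strategy from [5]: rerank only `window`
# items at a time, sliding from the tail of the list toward the head so
# strong items bubble forward; repeating the pass refines the order.
def sliding_window_rerank(items, rerank_window, window=4, step=2, passes=2):
    items = items[:]
    for _ in range(passes):
        start = max(len(items) - window, 0)
        while True:
            items[start:start + window] = rerank_window(
                items[start:start + window])
            if start == 0:
                break
            start = max(start - step, 0)
    return items
```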

In addition to using LLMs as recommender systems, some studies also utilize LLMs to construct model features. GENRE [6] introduces three prompts that apply LLMs to three feature-enhancement subtasks for news recommendation. Specifically, it uses ChatGPT to generate news titles from abstracts, extract feature keywords from user reading history, and generate synthetic news to enrich user interaction history. By combining these LLM-constructed features, traditional news recommendation models can be significantly improved. Similarly, NIR [7] designs two prompting methods to generate user preference keywords and to extract representative movies from the user's interaction history to improve movie recommendation.

In practical applications, beyond the ranking model, an entire recommendation system generally includes several other important components, such as content databases and candidate retrieval models. Therefore, another way to use LLMs for recommendation is as the controller of the whole system. Chat-REC [8] designs an interactive recommendation framework around ChatGPT that understands user needs through multi-turn dialogue and calls existing recommendation systems to provide results. In addition, ChatGPT can also control the database to retrieve relevant content that supplements the prompt and addresses the cold-start item problem. GeneRec [9] proposes a generative recommendation framework and uses LLMs to control when to recommend existing items versus when to generate new items via an AIGC model.

Taken together, these studies leverage natural language prompts to activate the zero-shot capabilities of LLMs in recommendation tasks, providing a low-cost, practical solution.

In-context Learning

In-context learning is a technique used by GPT-3 and other LLMs to quickly adapt to new tasks and new information. Given a few exemplary input-label pairs, they can predict labels for unseen inputs without additional parameter updates. Therefore, some works add demonstration examples to the prompts so that LLMs better understand the recommendation task.

For sequential recommendation, [4] introduces demonstration examples by augmenting the input interaction sequence itself. Specifically, the prefixes of the input interaction sequence are paired with the corresponding suffixes as examples; a sketch of this construction follows. [1] and [2] design demonstration example templates for various recommendation tasks, and their experimental results also show that in-context learning improves the recommendation ability of LLMs on most tasks.
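The sketch below illustrates the prefix/suffix demonstration construction from [4]; the prompt formatting is an assumption for illustration.

```python
# A sketch of the in-context demonstration idea in [4]: a prefix of the
# user's own interaction sequence, paired with the item that follows it,
# serves as an in-prompt example before the real query.
def build_icl_prompt(history, candidates, n_demos=2):
    demos = []
    # Use early (prefix, next-item) pairs from the same sequence as examples.
    for i in range(1, min(n_demos, len(history) - 1) + 1):
        prefix, target = history[:i], history[i]
        demos.append(f"Watched: {', '.join(prefix)} -> Next: {target}")
    query = (f"Watched: {', '.join(history)} -> Next: ? "
             f"(choose from: {', '.join(candidates)})")
    return "\n".join(demos + [query])

print(build_icl_prompt(["Alien", "Aliens", "Prometheus"], ["Dune", "Up"]))
```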

However, compared with prompting, only a few studies have explored in-context learning with LLMs for recommendation tasks. Many open issues remain, including how to select demonstration examples and how the number of demonstration examples affects recommendation performance.

4.2 Tuning Paradigm

As mentioned above, LLMs have strong zero-shot/few-shot learning abilities, and with properly designed prompts their recommendation performance can greatly exceed random guessing. However, it is not surprising that recommender systems built this way cannot surpass recommendation models specially trained on task-specific data. Therefore, many researchers seek to improve the recommendation ability of LLMs through further fine-tuning or prompt learning. Following [10], this paper divides tuning methods into two types: prompt tuning and instruction tuning. Specifically, under the prompt-tuning paradigm, the parameters of the LLM or of soft prompts are tuned for a specific task (e.g., rating prediction), while under the instruction-tuning paradigm, the LLM is fine-tuned on multiple tasks with different instruction types. Consequently, LLMs can achieve better zero-shot ability through instruction tuning. However, we note that there is currently no clear division or accepted definition of these two fine-tuning paradigms.

Prompt Tuning

In this paradigm, an LLM typically takes user/item information as input and outputs user preferences (such as like or dislike) or ratings for items. For example, [11] proposes to format the user's historical interactions as prompts, where each interaction is represented by item information, and formulates the rating prediction task as two distinct tasks: multi-class classification and regression. The authors further investigate LLMs of different sizes, with parameters ranging from 250M to 540B, and evaluate their performance in the zero-shot setting; the fine-tuned FLAN-T5-XXL (11B) model was found to achieve the best results. TALLRec, proposed in [12], is trained through two tuning stages: it is first fine-tuned on the self-instruct data from Alpaca [13], and then further fine-tuned by recommendation tuning, where the input is the user's history sequence and the output is "yes" or "no" feedback. A sketch of such a sample format follows.
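Below is a hedged sketch of what a recommendation-tuning sample in this style might look like; the instruction wording and field names are assumptions rather than TALLRec's exact template.

```python
# A sketch of a TALLRec-style [12] recommendation-tuning sample: the
# instruction asks whether the user will like the target item given
# liked/disliked history, and the label is a literal "Yes"/"No".
# The exact wording is an assumption, not the paper's template.
def tallrec_style_sample(liked, disliked, target, label):
    instruction = ("Given the user's preferences, answer with Yes or No "
                   "whether the user will enjoy the target item.")
    user_input = (f"Liked: {', '.join(liked)}. "
                  f"Disliked: {', '.join(disliked)}. "
                  f"Target: {target}.")
    return {"instruction": instruction,
            "input": user_input,
            "output": "Yes" if label else "No"}

sample = tallrec_style_sample(["Coco", "Up"], ["Saw"], "Luca", label=1)
print(sample)
```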

[14] proposes to formulate the recommendation task as a next-token prediction problem and evaluates the proposed method in both zero-shot and fine-tuning settings on a movie recommendation dataset. The article also observes that the language model shows a clear language bias when recommending. Furthermore, fine-tuned LLMs can learn how to recommend, but the language bias remains in the underlying predictions.

In addition to directly fine-tuning the LLM, some studies propose to use prompt learning to achieve better performance. For example, [15] designs a unified conversational recommender system named UniCRS based on knowledge-enhanced prompt learning; the authors propose to freeze the parameters of the LLM and train soft prompts via prompt learning for response generation and item recommendation. [16] leverages the generative ability of LLMs to provide user-understandable explanations. The authors experiment with discrete and continuous prompt learning, and further propose two training strategies: sequential tuning and recommendation as regularization.

Instruction Tuning

In this paradigm, LLMs are fine-tuned on multiple tasks with different instruction types. In this way, LLMs can better align with human intent and achieve better zero-shot ability. For example, [17] proposes to fine-tune the T5 model on five different types of instructions: sequential recommendation, rating prediction, explanation generation, review summarization, and direct recommendation. After multi-task instruction tuning on the recommendation dataset, the model achieves zero-shot generalization to unseen personalized prompts and new items. Similarly, [18] proposes to fine-tune the M6 model on three types of tasks: scoring, generation, and retrieval. [19] first designs a general instruction format covering three key aspects: preference, intention, and task form. The authors then manually design 39 instruction templates and automatically generate a large amount of user-personalized instruction data for instruction tuning on the 3B FLAN-T5-XL model. Experimental results show that this method outperforms several competing baselines, including GPT-3.5. A sketch of instruction-template instantiation appears below.
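The sketch below shows how multi-task instruction data can be instantiated from templates, in the spirit of [17, 19]; the templates, record fields, and targets are all illustrative assumptions.

```python
# A minimal sketch of multi-task instruction-tuning data construction:
# one template per task family, instantiated with user records into
# (instruction, output) pairs for fine-tuning. Templates are made up.
TEMPLATES = {
    "rating": "Predict the rating (1-5) that user {user} gives item {item}.",
    "sequential": "User {user} watched {history}. What will they watch next?",
    "explanation": "Explain why item {item} suits user {user}.",
}

def make_instruction_samples(record):
    """Expand one interaction record into one sample per task template."""
    samples = []
    for task, template in TEMPLATES.items():
        samples.append({
            "task": task,
            "instruction": template.format(**record),
            "output": record["targets"][task],
        })
    return samples

record = {"user": "u42", "item": "Dune", "history": "Alien, Arrival",
          "targets": {"rating": "5", "sequential": "Dune",
                      "explanation": "The user favors thoughtful sci-fi."}}
samples = make_instruction_samples(record)
```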

5. Summary of Findings

In this paper, we systematically review the application paradigms and adaptation strategies of large language models in recommender systems, especially generative language models, and we discuss their potential to improve the performance of traditional recommendation models on specific tasks. However, it is important to point out that overall exploration in this area is still at an early stage, and researchers may find it challenging to identify the most research-worthy problems and pain points. To address this, we summarize the common conclusions drawn from many studies on LLM-based recommendation. These findings highlight certain technical challenges and point to opportunities for further development of the field.

5.1 Model Bias

  • Position Bias. In the generative language modeling paradigm for recommender systems, various kinds of information, such as user behavior sequences and recommendation candidates, are fed into the language model as ordered textual descriptions, which may trigger position biases inherent in the language model itself. For example, the order of candidates affects the ranking results of LLM-based recommendation models, i.e., the LLM usually prioritizes the top-ranked items. [20] uses random-sampling-based bootstrapping to mitigate candidate position bias and emphasizes recently interacted items to enhance the behavioral order. However, these solutions are not adaptive enough, and more robust learning strategies are needed in the future.

  • Popularity Bias. LLM ranking results are affected by the popularity of candidates. Popular items, which are widely discussed and mentioned in the LLM's pre-training corpus, tend to be ranked higher. Solving this problem is challenging because it is closely tied to the composition of the pre-training corpus.

  • Fairness Bias. Pre-trained language models can exhibit fairness issues related to sensitive attributes, influenced by the training data or by the demographics of annotators involved in certain task annotations. These fairness issues may cause the model to assume that users belong to a particular group when making recommendations, which can become controversial when deployed commercially; bias in recommendation results due to gender or race is one example. Addressing these fairness issues is critical to ensuring fair and unbiased recommendations.

5.2 Recommendation Prompt Design

  • User/Item Representation. In practice, recommender systems usually represent users and items with a large number of discrete and continuous features. However, most existing LLM-based works only use names to denote items and a list of item names to denote users, which is insufficient for accurately modeling users and items. In addition, converting users' heterogeneous behavior sequences (such as click, add-to-cart, and purchase in the e-commerce domain) into natural language for preference modeling is also crucial. In traditional recommendation models, ID-like features have proven effective, but incorporating them into prompts to improve personalized recommendation performance remains challenging.

  • Limited Context Length. The context length limitation of LLMs restricts the length of user behavior sequences and the number of candidates, leading to unsatisfactory performance. Existing works propose techniques to alleviate this problem, such as selecting representative items from user behavior sequences [7] and sliding-window strategies over candidate lists [21].

5.3 Promising Capabilities

  • Zero/Few-shot Recommendation Capabilities. Experimental results on datasets from multiple domains show that LLMs have impressive zero/few-shot recommendation capabilities across various recommendation tasks [4, 1]. It is worth noting that "few-shot learning" here, equivalent to "in-context learning", does not change the parameters of the LLM. This suggests that LLMs have the potential to alleviate the cold-start problem under limited data. However, some issues remain, such as the need for more explicit guidance on selecting representative and effective demonstration examples for few-shot learning, and experimental results from more domains are needed to further support conclusions about zero/few-shot recommendation capabilities.

  • Interpretability. Generative LLMs have shown remarkable capabilities in natural language generation, so leveraging LLMs for explainable recommendation via text generation is a natural idea. [1] conducted comparative experiments between ChatGPT and several baselines on explanation generation tasks. The results show that even without fine-tuning, ChatGPT still outperforms some supervised traditional methods in the in-context learning setting. Furthermore, based on human evaluation, ChatGPT's explanations are even clearer and more plausible than the ground truth. Encouraged by these exciting preliminary results, fine-tuning LLMs for explainable recommendation appears promising.

5.4 Evaluation Issues

  • Output Control. As mentioned before, many studies use large models as recommender systems by providing well-crafted instructions. For these LLMs, the output should strictly adhere to the given instruction format, such as providing a binary answer (yes or no) or generating a sorted list. However, in practical applications, the output of an LLM may deviate from the desired format: a model may generate incorrectly formatted responses or even refuse to provide an answer [2]. Therefore, how to ensure better control over the output of LLMs is an urgent problem; a defensive parsing sketch follows this list.

  • Evaluation Criteria. If the task performed by the LLM is a standard recommendation task, such as rating prediction or item ranking, we can adopt existing evaluation metrics such as NDCG and MSE (a small worked NDCG example also follows this list). However, LLMs also have strong generative capabilities, making them suitable for generative recommendation tasks [9]. Under the generative recommendation paradigm, an LLM can generate items that have never appeared in historical data and recommend them to users. In this setting, evaluating the ability of LLMs to generate recommendations remains an open problem.

  • Datasets. Currently, most research in this field tests the recommendation ability and zero-shot/few-shot learning ability of LLMs on datasets such as MovieLens, Amazon Books, and similar benchmarks. This raises two potential problems. First, these datasets are relatively small compared with real-world industrial datasets and may not fully reflect the recommendation ability of LLMs. Second, items in these datasets (such as movies and books) may overlap with information present in the LLM's pre-training data, which may introduce bias when evaluating the zero-shot learning ability of LLMs. Currently, we lack a suitable benchmark for a more comprehensive evaluation.
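On the output-control point above, a pragmatic mitigation is to validate the model's reply before using it. The sketch below, a minimal illustration with assumed parsing rules, checks that a reply is a permutation of the known candidates and falls back to the input order otherwise.

```python
# A sketch of defensive output handling: validate that the LLM's reply is
# a ranked list over the known candidates, and fall back to the original
# order when the reply is malformed or off-format. Parsing rules are
# assumptions; item names starting with digits would need stricter parsing.
def parse_ranked_list(reply, candidates):
    lines = [line.strip(" -0123456789.").strip()
             for line in reply.splitlines() if line.strip()]
    ranked = [c for c in lines if c in set(candidates)]
    if len(ranked) != len(candidates):   # missing or invented items
        return candidates                # fall back to input order
    return ranked

print(parse_ranked_list("1. Dune\n2. Up\n3. Tenet", ["Up", "Dune", "Tenet"]))
```

And on evaluation criteria, here is a small worked example of NDCG@k for a single ranked list with binary relevance labels.

```python
# NDCG@k for one ranked list: DCG discounts relevance by log position,
# normalized by the DCG of the ideal (relevance-sorted) ordering.
import math

def ndcg_at_k(ranked_relevances, k):
    def dcg(rels):
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0

# The model placed the only relevant item at position 2.
print(ndcg_at_k([0, 1, 0, 0], k=4))   # 1/log2(3) ~= 0.631
```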

In addition to the notable findings above, large language models have some capability limitations. For example, knowledge forgetting may arise when training models for domain-specific tasks or when updating model knowledge [22]. Another problem is that language models of different parameter scales yield different performance, and using an overly large model leads to high computational costs for both research and deployment of recommender systems [4]. These challenges also bring valuable research opportunities to the field.

6. Conclusion

This paper reviews the research area of large language models (LLMs) for recommender systems. We divide existing work into discriminative and generative models and then elaborate on each from a domain-adaptation perspective. To avoid conceptual confusion, we define and distinguish fine-tuning, prompting, prompt tuning, and instruction tuning in LLM-based recommendation. To the best of our knowledge, this is the first systematic, up-to-date review dedicated to generative LLMs for recommender systems, and it further summarizes the common findings and challenges of numerous related studies. We hope this survey provides a valuable resource for researchers seeking a comprehensive understanding of LLM-based recommendation and its potential research directions.

References

  1. Junling Liu, Chao Liu, Renjie Lv, Kang Zhou, and Yan Zhang. Is chatgpt a good recommender? A preliminary study. CoRR, abs/2304.10149, 2023.

  2. Sunhao Dai, Ninglu Shao, Haiyuan Zhao, Weijie Yu, Zihua Si, Chen Xu, Zhongxiang Sun, Xiao Zhang, and Jun Xu. Uncovering chatgpt's capabilities in recommender systems. CoRR, abs/2305.02182, 2023.

  3. Damien Sileo, Wout Vossen, and Robbe Raymaekers. Zero-shot recommendation as language modeling. In ECIR (2), volume 13186 of Lecture Notes in Computer Science, pages 223–230. Springer, 2022.

  4. Yupeng Hou, Junjie Zhang, Zihan Lin, Hongyu Lu, Ruobing Xie, Julian J. McAuley, and Wayne Xin Zhao. Large language models are zero-shot rankers for recommender systems. CoRR, abs/2305.08845, 2023.

  5. Weiwei Sun, Lingyong Yan, Xinyu Ma, Pengjie Ren, Dawei Yin, and Zhaochun Ren. Is chatgpt good at search? investigating large language models as reranking agent. CoRR, abs/2304.09542, 2023.

  6. Qijiong Liu, Nuo Chen, Tetsuya Sakai, and Xiao-Ming Wu. A first look at llm-powered generative news recommendation. CoRR, abs/2305.06566, 2023.

  7. Lei Wang and Ee-Peng Lim. Zeroshot next-item recommendation using large pretrained language models. CoRR, abs/2304.03153, 2023.

  8. Yunfan Gao, Tao Sheng, Youlin Xiang, Yun Xiong, Haofen Wang, and Jiawei Zhang. Chat-rec: Towards interactive and explainable llms-augmented recommender system. CoRR, abs/2303.14524, 2023.

  9. Wenjie Wang, Xinyu Lin, Fuli Feng, Xiangnan He, and Tat-Seng Chua. Generative recommendation: Towards next-generation recommender paradigm. CoRR, abs/2304.03516, 2023.

  10. Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, and Quoc V. Le. Finetuned language models are zero-shot learners. In ICLR. OpenReview.net, 2022.

  11. Wang-Cheng Kang, Jianmo Ni, Nikhil Mehta, Maheswaran Sathiamoorthy, Lichan Hong, Ed H. Chi, and Derek Zhiyuan Cheng. Do llms understand user preferences? evaluating llms on user rating prediction. CoRR, abs/2305.06474, 2023.

  12. Keqin Bao, Jizhi Zhang, Yang Zhang, Wenjie Wang, Fuli Feng, and Xiangnan He. Tallrec: An effective and efficient tuning framework to align large language model with recommendation. CoRR, abs/2305.00447, 2023.

  13. Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, and Tatsunori B. Hashimoto. Stanford alpaca: An instruction-following llama model. https://github.com/tatsu-lab/stanford_alpaca, 2023.

  14. Yuhui Zhang, Hao Ding, Zeren Shui, Yifei Ma, James Zou, Anoop Deoras, and Hao Wang. Language models as recommender systems: Evaluations and limitations. 2021.

  15. Xiaolei Wang, Kun Zhou, Ji-Rong Wen, and Wayne Xin Zhao. Towards unified conversational recommender systems via knowledge-enhanced prompt learning. In KDD, pages 1929–1937. ACM, 2022.

  16. Lei Li, Yongfeng Zhang, and Li Chen. Personalized prompt learning for explainable recommendation. ACM Transactions on Information Systems, 41(4):1–26, 2023.

  17. Shijie Geng, Shuchang Liu, Zuohui Fu, Yingqiang Ge, and Yongfeng Zhang. Recommendation as language processing (RLP): A unified pretrain, personalized prompt & predict paradigm (P5). In RecSys, pages 299–315. ACM, 2022.

  18. Zeyu Cui, Jianxin Ma, Chang Zhou, Jingren Zhou, and Hongxia Yang. M6-rec: Generative pretrained language models are open-ended recommender systems. CoRR, abs/2205.08084, 2022.

  19. Junjie Zhang, Ruobing Xie, Yupeng Hou, Wayne Xin Zhao, Leyu Lin, and Ji-Rong Wen. Recommendation as instruction following: A large language model empowered recommendation approach. CoRR, abs/2305.07001, 2023.

  20. Yupeng Hou, Shanlei Mu, Wayne Xin Zhao, Yaliang Li, Bolin Ding, and Ji-Rong Wen. Towards universal sequence representation learning for recommender systems. In KDD, pages 585–593. ACM, 2022.

  21. Weiwei Sun, Lingyong Yan, Xinyu Ma, Pengjie Ren, Dawei Yin, and Zhaochun Ren. Is chatgpt good at search? investigating large language models as reranking agent. CoRR, abs/2304.09542, 2023.

  22. Joel Jang, Seonghyeon Ye, Sohee Yang, Joongbo Shin, Janghoon Han, Gyeonghun Kim, Stanley Jungkyu Choi, and Minjoon Seo. Towards continual knowledge learning of language models. In ICLR, 2022.
