Overview of Recommender Systems (Reading Notes on "Recommender Systems: Principles and Practice")

Using the development of a hypothetical question-bank product as a running example, this post introduces the common models, basic methods, and common problems of recommender systems.

Structure of this post

  1. Foreword
    1. Learning problems
  2. Main text
    1. The development process of a system
    2. Recommendation model overview
    3. Common problems
  3. References

Foreword

Drawing on the experience of previous product versions, on the question-bank practice mini program (requirements finished, development pending), and on the training I received at the "Microsoft Vanke Practice Camp", I believe the company's business, especially the question bank, will certainly need a recommender system in the future. That is why I chose this book to study the subject. After a period of study I am eager to share what I have learned. I hope these notes give readers a reference for learning about recommender systems, and also some ideas on how mathematics can improve product quality and development efficiency.

I have read part of the book: the overview and neighborhood-based collaborative filtering, plus a quick skim of the later chapters. My impression is that readers who are not strong in mathematics can pick and choose, focusing on concepts such as how recommendation algorithms are classified, how they are used and optimized, and what results they can achieve. Readers with a solid university math background can read it straight through. The book contains no code, only prose and a modest amount of mathematical derivation; if you follow its line of reasoning and work through the formulas from simple to complex, you can understand it. Still, some of the mathematics is material I did not learn well or never encountered as an undergraduate. After talking with my former university teacher, I suggest that interested readers study it alongside a few other subjects: advanced mathematics, probability theory, linear algebra, matrix analysis, and probabilistic analysis.

Because the book contains no code at all, non-R&D readers with some math background can read it too. I also find that the way it introduces the various recommendation algorithms (recommendation models) closely mirrors the mathematical modeling process I learned in college, as shown in the figure below: simplify and abstract the problem (express the characteristics of the objects mathematically, according to actual needs); build a model (start with the most basic model of the core problem); iterate (keep raising new problems or hypotheses, solve them one by one by adjusting the model, and test); and finally obtain a model and apply it in practice.

Learning problems

Before getting into the content, I want to raise a few problems I ran into and resolved while studying.

Is this book too difficult for an undergraduate whose math is not outstanding?

Before reading it, I did have this worry. After reading Chapter 3 and skimming the later chapters, I found it was unfounded. The book's great strength is that, although it covers recommender systems in depth and detail, it proceeds step by step. I believe even someone who failed college mathematics can follow at least the first few sections of Chapter 1 and of each later chapter. Readers who are good at mathematics will find it even more approachable, because it does not open with a wall of complicated formulas: it almost always starts from the simplest mathematical model, raises new questions, introduces more complex formulas and models, and improves them iteratively. This process closely resembles mathematical modeling.

What if I cannot understand some of the mathematical principles or terminology?

I ran into many such problems while studying. Fortunately, I am still in touch with a university teacher who trained in communications and has a deep understanding of mathematics. His advice is that you must study probability theory and matrix analysis: many of the difficulties in this book are really material from those two courses. Because of other priorities, I have not yet scheduled time to study them myself, so I simply note the advice here as a reference for readers interested in going further.

Main text

The development process of a system

Everything below is hypothetical and has not been implemented; its only purpose is to help you understand several basic recommendation models.

Suppose we have a question bank with a large number of questions, and we want to build a practice product on which students can work through them. We immediately run into the first problem.

Knowledge-based recommendation model

For various historical reasons, the system does not have enough historical data, yet we want it to make effective recommendations for students from the start. This is called the cold start problem. In this situation, a student can explicitly describe the questions he wants (question type, subject, source, keyword tags, etc.), and the system uses its knowledge base (predefined rules and similarity functions over the question bank) to retrieve matching questions and recommend them to the student.

A system that works this way is called a knowledge-based recommender system. Its key feature is that it recommends based on requirements the user states explicitly, i.e., recommendation results = student requirements + question features + knowledge base (predefined rules; similarity functions).

Abstracting one step further: recommendation results = user requirements + item attributes + domain knowledge.
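A minimal Python sketch of this formula, assuming a hypothetical `Question` record: question type, subject, and source act as the predefined rules (hard filters), and Jaccard similarity over keyword tags plays the role of the similarity function. These names and the choice of Jaccard are illustrative assumptions, not the book's method.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    """Hypothetical question record; field names are illustrative."""
    qid: str
    qtype: str    # e.g. "multiple_choice"
    subject: str  # e.g. "physics"
    source: str   # e.g. "mock_exam"
    tags: set = field(default_factory=set)


def jaccard(a, b):
    """Similarity of two tag sets: |A & B| / |A | B| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knowledge_based_recommend(bank, requirements, top_k=3):
    """Filter the bank with hard rules, then rank survivors by a similarity function."""
    # Predefined rules: every attribute the student specified must match exactly.
    rule_keys = [k for k in ("qtype", "subject", "source") if k in requirements]
    candidates = [q for q in bank if all(getattr(q, k) == requirements[k] for k in rule_keys)]
    # Similarity function: overlap between requested keywords and question tags.
    wanted = set(requirements.get("tags", ()))
    ranked = sorted(candidates, key=lambda q: jaccard(q.tags, wanted), reverse=True)
    return [q.qid for q in ranked[:top_k]]


bank = [
    Question("q1", "multiple_choice", "physics", "mock_exam", {"mechanics", "friction"}),
    Question("q2", "multiple_choice", "physics", "mock_exam", {"optics"}),
    Question("q3", "calculation", "physics", "mock_exam", {"mechanics"}),
    Question("q4", "multiple_choice", "physics", "joint_exam", {"mechanics"}),
]
request = {"qtype": "multiple_choice", "subject": "physics", "tags": ["mechanics"]}
print(knowledge_based_recommend(bank, request))  # ['q4', 'q1', 'q2']
```

Here the rules drop the calculation question q3, and the remaining questions are ordered by how well their tags match "mechanics". No behavior history is needed, which is why this style of model suits the cold start situation described above.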

Item-based neighborhood collaborative filtering model

Now some students have been using the practice mini program for a while. There may not be many of them, but each is highly engaged and uses it often, producing many behavior records such as answer records, search records, and question comments. We realize that a student may need different kinds of questions at different times. From his recent behavior we can infer which question characteristics he has been paying attention to (keywords, subject, source such as mock exams, the senior high school entrance examination, or joint exams, question type, etc.); this is a positive evaluation (rating). He may also skip some questions, which tells us he does not need questions with certain characteristics; this is a negative evaluation (rating).

With users' positive and negative ratings of certain questions, we can work out which characteristics (attributes) a user has needed most recently and have the system push questions closer to those preferences, helping the user find the questions he wants more efficiently.

At this point, recommendation results = user ratings + item attributes.
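The idea above can be sketched as a per-user preference profile: each implicit action becomes a rating that is added to every attribute tag of the question, and unseen questions are ranked by the summed preference of their tags. The action names and weights (`answered`, `favorited`, `skipped`) are hypothetical. Note that scoring items by their attributes like this is closer to what is usually called content-based recommendation; classic item-based collaborative filtering instead derives item-to-item similarity from the ratings matrix.

```python
from collections import defaultdict

# Hypothetical mapping from implicit behavior to a rating:
# answering or favoriting is positive, skipping is negative.
ACTION_RATING = {"answered": 1.0, "favorited": 2.0, "skipped": -1.0}


def build_profile(log, attributes):
    """Add each action's rating to every attribute tag of the question it targets."""
    profile = defaultdict(float)
    for qid, action in log:
        rating = ACTION_RATING.get(action, 0.0)
        for tag in attributes[qid]:
            profile[tag] += rating
    return profile


def recommend(log, attributes, top_k=2):
    """Rank questions the user has not touched by the summed preference of their tags."""
    profile = build_profile(log, attributes)
    seen = {qid for qid, _ in log}
    unseen = [qid for qid in attributes if qid not in seen]

    def score(qid):
        return sum(profile.get(tag, 0.0) for tag in attributes[qid])

    return sorted(unseen, key=score, reverse=True)[:top_k]


attributes = {
    "q1": {"mechanics", "mock_exam"},
    "q2": {"optics", "mock_exam"},
    "q3": {"mechanics", "entrance_exam"},
    "q4": {"optics", "entrance_exam"},
    "q5": {"mechanics", "joint_exam"},
}
log = [("q1", "favorited"), ("q2", "skipped")]
print(recommend(log, attributes))  # ['q3', 'q5']
```

Favoriting a mechanics question and skipping an optics one leaves "mechanics" with a positive weight and "optics" with a negative one, so the remaining mechanics questions rise to the top.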

User-based neighborhood collaborative filtering model

Now the mini program's basic features are fairly mature, and after several marketing campaigns it has accumulated a large number of users. We not only have many users' ratings of questions, but can also find, for each user, other users similar to him, for example users with highly similar or identical question-preference tags. A clustering algorithm groups highly similar users together, which makes it easy to find the users most similar to a given user.
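The post does not name a clustering algorithm; as one hypothetical choice, here is plain k-means over per-user preference vectors (for example, how often each user favors mechanics, optics, and electricity questions). Initializing the centroids with the first k users keeps the demo deterministic; real systems usually use smarter initialization such as k-means++.

```python
import math


def kmeans(points, k, iters=20):
    """Plain k-means. Centroids start at the first k points so the demo is deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each user joins the cluster with the nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical preference vectors: (mechanics, optics, electricity) counts per user.
users = {"A": (5, 0, 1), "B": (0, 5, 1), "C": (4, 1, 0), "D": (1, 4, 0), "E": (5, 1, 1)}
labels = kmeans(list(users.values()), k=2)
print(dict(zip(users, labels)))  # {'A': 0, 'B': 1, 'C': 0, 'D': 1, 'E': 0}
```

Once users are clustered, "users similar to A" can be looked up as the other members of A's cluster (here C and E) instead of being compared against every user.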

Suppose student A has done question t and student B has not. Because B is similar to A, we can assume B may also be interested in t, and push t to B. This is called a user-based collaborative filtering model. Here, recommendation results = user ratings + community ratings.
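A minimal sketch of user-based neighborhood collaborative filtering, assuming ratings are stored as a dict of dicts (`ratings[user][question]`): it finds the users most similar to the target user (cosine similarity over co-rated questions) and predicts the target's rating for question t as their similarity-weighted average. The book also covers refinements such as mean-centering the ratings; this sketch omits them.

```python
import math


def cosine(u, v):
    """Cosine similarity between two users over the questions both have rated."""
    common = u.keys() & v.keys()
    dot = sum(u[q] * v[q] for q in common)
    norm_u = math.sqrt(sum(u[q] ** 2 for q in common))
    norm_v = math.sqrt(sum(v[q] ** 2 for q in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict(ratings, user, item, k=2):
    """Similarity-weighted average rating of `item` among the user's k nearest neighbors."""
    neighbors = sorted(
        ((cosine(ratings[user], ratings[other]), other)
         for other in ratings
         if other != user and item in ratings[other]),
        reverse=True,
    )[:k]
    positive = [(sim, other) for sim, other in neighbors if sim > 0]
    den = sum(sim for sim, _ in positive)
    if den == 0:
        return None  # no usable neighbor has rated this item
    return sum(sim * ratings[other][item] for sim, other in positive) / den


# Hypothetical 1-5 ratings; B has not done question "t".
ratings = {
    "A": {"t1": 5, "t2": 4, "t": 5},
    "B": {"t1": 5, "t2": 4},
    "C": {"t1": 1, "t2": 5, "t": 1},
}
print(predict(ratings, "B", "t", k=1))  # about 5.0: only A, whose ratings match B's
print(predict(ratings, "B", "t", k=2))  # roughly 3.27: C's low rating pulls it down
```

A high predicted rating for t is the signal to push t to student B; the choice of k trades off relying on a few very similar users against averaging over more of the community.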

  1. A reminder, just in case: classification and clustering are different. Classification assigns data to known categories according to specific rules; clustering groups data points that are similar to each other, so the data within each cluster are similar (the focus is on similarity).
  2. I have not introduced content-based recommender systems here. The item-based collaborative filtering model is in fact similar to the content-based model. My personal understanding is that the former focuses on the similarity of different items' attribute tags (question type, difficulty, etc.), while the latter focuses on the similarity of their content (question stems and solutions); that is, the latter relies on feature extraction and clustering algorithms to group similar items.
  3. In practice, a recommender system is often composed of several models; this is called a hybrid (ensemble) recommender system.

Recommendation Model Overview

(Still learning; this part will be improved over time.)
[Figure: overview of recommendation models]

Common problems

The solution to each problem is complex, and the book covers each one in its own section. I only mention them here for reference, so that we keep them in mind in future study and practice.

Cold start

When a system first goes live, it may not have enough data to produce effective recommendations; this is the cold start problem. So at launch, choosing a model that is less sensitive to cold start (such as a knowledge-based recommendation model) lets the system provide effective recommendations quickly.
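One way to apply this advice is a switching hybrid: users with little history get the knowledge-based model, and everyone else gets collaborative filtering. The sketch below assumes both models are plain functions from a user ID to a list of question IDs; the threshold `min_history=5` and the stand-in models are arbitrary placeholders.

```python
from typing import Callable, Dict, List

Recommender = Callable[[str], List[str]]


def switching_recommender(history_counts: Dict[str, int],
                          knowledge_based: Recommender,
                          collaborative: Recommender,
                          min_history: int = 5) -> Recommender:
    """Send users with too little behavior data to the cold-start-friendly model."""
    def recommend(user: str) -> List[str]:
        if history_counts.get(user, 0) < min_history:
            return knowledge_based(user)  # cold start: rely on stated requirements
        return collaborative(user)        # enough history for collaborative filtering
    return recommend


# Hypothetical stand-ins for the two real models.
rec = switching_recommender(
    {"veteran": 120, "newcomer": 2},
    knowledge_based=lambda user: ["kb_q1", "kb_q2"],
    collaborative=lambda user: ["cf_q7", "cf_q9"],
)
print(rec("newcomer"))  # ['kb_q1', 'kb_q2']
print(rec("veteran"))   # ['cf_q7', 'cf_q9']
```

Unknown users fall back to the knowledge-based model as well, since their history count defaults to zero.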

Serendipity

Suppose that, through some model and strategy, the system determines that student A likes mechanics problems and from then on recommends only mechanics problems. The system then has high precision, but very high precision is not always a good thing. Now suppose that among M recommended questions there are some "wrong" recommendations, including N optics questions, and the user does not skip them but answers them and even adds them to favorites (a positive rating). We can then say the system has a certain degree of serendipity, measured here as N/M.
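Under the post's simplified definition, precision and serendipity can be computed as below. `relevant` is the set of recommended questions the user engaged with (answered or favorited), and `expected` is the set the system predicted to match the user's known preference (the mechanics questions). Both names are assumptions for illustration, and the N/M ratio is only the post's measure, not a standard formula.

```python
def precision(recommended, relevant):
    """Share of recommended questions the user actually engaged with."""
    return sum(q in relevant for q in recommended) / len(recommended)


def serendipity(recommended, relevant, expected):
    """The post's N/M: engaged-with questions outside the predicted preference (N)
    divided by the number of recommendations (M)."""
    n = sum(q in relevant and q not in expected for q in recommended)
    return n / len(recommended)


recommended = ["m1", "m2", "m3", "o1", "o2"]  # M = 5; o* are optics questions
expected = {"m1", "m2", "m3"}                 # what the model thinks A likes (mechanics)
relevant = {"m1", "m2", "o1"}                 # answered or favorited by student A
print(precision(recommended, relevant))              # 0.6
print(serendipity(recommended, relevant, expected))  # 0.2
```

Only o1 counts toward N: it lies outside the mechanics preference, yet student A still engaged with it.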

Attack resistance

One attack a recommender system may face is the injection of a large number of fake ratings. On Taobao, for example, a merchant might inject many negative reviews into a competitor's products, or many positive reviews into its own. In the neighborhood models above, such an attack would directly distort the recommendation results and degrade the experience of legitimate users. There are strategies for resisting attacks and making models more robust; they are more involved and are also covered in the book.
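A toy illustration of why injected ratings matter: a handful of fake 1-star ratings drags an item's mean rating down sharply, while the median moves much less as long as the fakes are a minority. The median here is only a simple stand-in for robust aggregation, not one of the book's defenses, and it fails once fake ratings outnumber genuine ones.

```python
from statistics import mean, median


def inject_attack(ratings, fake_rating, count):
    """Simulate a shilling attack: `count` fake profiles all give `fake_rating`."""
    return list(ratings) + [fake_rating] * count


def item_score(ratings, robust=False):
    """Aggregate an item's ratings; the median is harder for a minority of fakes to move."""
    return median(ratings) if robust else mean(ratings)


genuine = [4, 5, 4, 4, 5]                 # hypothetical honest ratings
attacked = inject_attack(genuine, 1, 3)   # a competitor adds three 1-star ratings
print(item_score(genuine), item_score(attacked))                            # 4.4 3.125
print(item_score(genuine, robust=True), item_score(attacked, robust=True))  # 4 4.0
```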

Privacy

The sections on neighborhood-based collaborative filtering show that user behavior is recorded and treated as the user's positive or negative evaluation of an item. This is true not only of neighborhood-based collaborative filtering but of other models as well: a recommender system records as much user behavior as possible, and the results of analyzing it may include the user's personal preferences and opinions, which can be highly sensitive.

Privacy and security risks in a model can be addressed not only through technical architecture, policies, and regulations, but also through algorithms designed specifically to protect privacy. For example, at data collection time, distributed protocols and perturbation techniques can reduce the risk of data leakage; at data publication time, methods such as k-anonymity, condensation, and t-closeness can be used to transform data records so that attackers cannot link them with other publicly available data.
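Of the methods listed, k-anonymity is the easiest to show: a released table is k-anonymous if every combination of quasi-identifier values (here, hypothetically, age and city) appears in at least k records, so no record can be singled out by joining on those fields. A common way to get there is generalization, such as replacing exact ages with ranges. The field names and records are illustrative assumptions.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


def generalize_age(record, width=10):
    """Generalization: replace an exact age with a range such as '10-19'."""
    low = record["age"] // width * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Hypothetical released records; "favorite" is the sensitive attribute.
records = [
    {"age": 15, "city": "CityA", "favorite": "mechanics"},
    {"age": 16, "city": "CityA", "favorite": "optics"},
    {"age": 21, "city": "CityA", "favorite": "optics"},
    {"age": 24, "city": "CityA", "favorite": "electricity"},
]
qi = ("age", "city")
print(is_k_anonymous(records, qi, k=2))                               # False
print(is_k_anonymous([generalize_age(r) for r in records], qi, k=2))  # True
```

k-anonymity alone does not stop an attacker from learning the sensitive value when a whole group shares it, which is the gap refinements like t-closeness aim to close.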

References

  1. Charu C. Aggarwal. Recommender Systems: Principles and Practice [M]. Translated by Li Lingli et al. Beijing: Machinery Industry Press.
  2. Zhihu - "Recommendation System from Getting Started to Getting Started"


Source: blog.csdn.net/qq_23937195/article/details/103804716