Latent Semantic Model (LFM)

The latent semantic model (LFM), together with LSI, LDA, and topic models, belongs to the family of latent semantic analysis techniques. They are essentially the same idea: they all uncover latent topics or categories in the data. These techniques were first proposed in the field of text mining and have since been applied to other fields with good results. In a recommender system, for example, they can automatically cluster items based on user behavior, that is, divide items into different categories/topics, and these topics/categories can be understood as user interests.

A single user may have several different interests. Take the Douban book-list example cited by the author: user A pays attention to books on mathematics, history, and computers; user B likes books on machine learning, programming languages, and discrete mathematics; user C likes the writings of masters such as Knuth and Jiawei Han. When we make recommendations, we should recommend books in the categories each user is interested in, which presupposes that we have classified all items (books). But how should they be classified? Notice that the classification standard varies from person to person, and every user has a different idea. Take user B: the three categories he likes could all be counted as computer books, which means B's classification granularity is finer than A's; discrete mathematics could be counted either as mathematics or as computer science, which means some items cannot simply be assigned to a single category; user C cares about the authors of books and only reads a few specific authors, a classification perspective completely different from A's and B's.

Obviously, we cannot standardize user preferences across the whole platform with a taxonomy built from the subjective judgment of a single person (an editor) or a single team.

In addition, we also need to pay attention to two issues:

  1. The 3 categories we summed up from the user's visible book list do not mean the user likes only these 3 categories and has no interest in any others. In other words, we need to understand the user's interest in all categories.
  2. For a given class, we need to determine the weight with which each book belongs to that class. These weights help us decide which books in the class to recommend to the user.

Let's take a look at how LFM solves the above problems. For a given user behavior dataset (containing all users, all items, and the list of items each user has acted on), modeling it with LFM yields the model shown in the figure below (assume the dataset has 3 users and 4 items, and the number of classes modeled by LFM is 4):

[Figure: the user-item matrix R expressed as the product of the user-class matrix P and the class-item matrix Q]

The R matrix is the user-item matrix: its entry R_UI represents the degree of interest of user U in item I, which is exactly the value we want. For a user, once his interest in every item has been computed, the items can be sorted and the top ones recommended. The LFM algorithm extracts several topics (classes) from the dataset to act as a bridge between users and items, and expresses the R matrix as the product of a P matrix and a Q matrix. The P matrix is the user-class matrix, whose entry P_UK represents the degree of interest of user U in class K; the Q matrix is the class-item matrix, whose entry Q_KI represents the weight of item I in class K, and the higher the weight, the more representative the item is of the class. So LFM calculates user U's interest in item I according to the following formula:

R_UI = Σ_{K=1..F} P_UK · Q_KI        (*)
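To make formula (*) concrete, here is a minimal Python sketch of what the Predict helper used in the pseudocode later in this post might look like (my own illustration; it assumes P and Q are stored as nested dictionaries indexed as P[user][f] and Q[f][item], and passes them explicitly, whereas the book's pseudocode calls Predict(user, item) with P and Q implicitly available):

def Predict(user, item, P, Q):
    # User U's interest in item I is the inner product of the user's class-interest
    # vector P[user] and the item's class-weight column Q[.][item], i.e. formula (*).
    return sum(P[user][f] * Q[f][item] for f in P[user])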

We can see that after using LFM:

  1. We no longer need to worry about the classification perspective: the classes are clustered automatically from user behavior statistics, and the data makes the final decision.
  2. We no longer need to worry about the classification granularity: it is controlled by the number of classes set for LFM; the more classes, the finer the granularity.
  3. An item is not hard-assigned to a single class; instead we obtain the probability that it belongs to each class, which is standard soft classification.
  4. For a user, we obtain his interest in every class, not just the classes visible in his list.
  5. For each class, we obtain the weight of every item in that class; the more representative an item is of the class, the higher its weight.

 

Then the next question is how to compute the parameter values in the P and Q matrices. The usual approach is to learn the parameters by optimizing a loss function. Before defining the loss function, we need to prepare the dataset and explain the interest values.


The dataset should contain all users and the items they have acted on (i.e., liked). All these items constitute the full item set. For each user, we call the items he has acted on positive samples and set the interest R_UI = 1. In addition, we randomly sample from the full item set roughly as many items (that the user has not acted on) as there are positive samples and use them as negative samples, setting the interest R_UI = 0. The interest values in the training set are therefore either 0 or 1.
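As an illustration of this sampling step, here is a minimal sketch of what RandSelectNegativeSamples (the helper called in the pseudocode below) might look like; it is my own assumption that it also receives the full item set, and it uses plain uniform sampling:

import random

def RandSelectNegativeSamples(items, all_items):
    # Positive samples: items the user has acted on, with interest 1.
    samples = {item: 1 for item in items}
    # Negative samples: items the user has not acted on, with interest 0,
    # roughly as many as there are positive samples.
    candidates = [i for i in all_items if i not in items]
    for item in random.sample(candidates, min(len(items), len(candidates))):
        samples[item] = 0
    return samples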


After sampling, the original dataset is expanded into a new user-item set K = {(U, I)}, where R_UI = 1 if (U, I) is a positive sample and R_UI = 0 otherwise. The loss function looks like this:

C = Σ_{(U,I)∈K} ( R_UI - Σ_{K=1..F} P_UK · Q_KI )² + λ·||P_U||² + λ·||Q_I||²

In the formula above, the λ terms are regularization terms used to prevent overfitting; λ has to be tuned by repeated experiments for the specific application scenario. The loss function is optimized with the stochastic gradient descent algorithm:

  1. Determine the direction of steepest descent by taking the partial derivatives of the loss with respect to the parameters P_UK and Q_KI:

∂C/∂P_UK = -2 · Q_KI · ( R_UI - Σ_{K=1..F} P_UK · Q_KI ) + 2λ · P_UK

∂C/∂Q_KI = -2 · P_UK · ( R_UI - Σ_{K=1..F} P_UK · Q_KI ) + 2λ · Q_KI

  2. Iterate, stepping each parameter a small amount against its gradient, until the parameters converge (the number of iterations is set in advance):

P_UK = P_UK + α · ( e_UI · Q_KI - λ · P_UK )

Q_KI = Q_KI + α · ( e_UI · P_UK - λ · Q_KI )

where e_UI = R_UI - Σ_{K=1..F} P_UK · Q_KI is the prediction error on the sample (U, I).



Here α is the learning rate: the larger α is, the faster the parameters move at each step. Like λ, α also has to be tuned by repeated experiments for the actual application scenario. In this book the author runs experiments on the MovieLens dataset, taking the number of classes F = 100, α = 0.02, and λ = 0.01.
[Note]: The above four formulas are missing in the book.
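To check that the optimization is actually driving the loss down, C can be evaluated on the sampled set after each iteration. A small sketch of such a check (my own, assuming the sampled data is a dict of the form {user: {item: rui}} and the same nested-dictionary P and Q used throughout this post):

def training_loss(samples_by_user, P, Q, lamb):
    # Squared prediction error plus L2 regularization, summed over all samples in K.
    total = 0.0
    for user, samples in samples_by_user.items():
        for item, rui in samples.items():
            pred = sum(P[user][f] * Q[f][item] for f in P[user])
            total += (rui - pred) ** 2
            total += lamb * sum(p * p for p in P[user].values())
            total += lamb * sum(Q[f][item] ** 2 for f in Q)
    return total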


To sum up, the implementation of LFM requires:

  1. Initialize the P and Q matrices from the dataset (I have not yet fully understood exactly how this initialization should be carried out; a common approach is sketched after this list, and corrections are welcome).
  2. Determine four parameters: the number of classes F, the number of iterations N, the learning rate α, and the regularization parameter λ.
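Regarding point 1 above, a common choice (my assumption; the book does not spell it out) is to fill P and Q with small random numbers scaled by 1/sqrt(F), so that the initial predictions are neither all zero nor too large:

import math
import random

def InitModel(user_items, F):
    # P: user -> {class f: interest}, Q: class f -> {item: weight},
    # both initialized with small random values.
    all_items = {item for items in user_items.values() for item in items}
    P = {user: {f: random.random() / math.sqrt(F) for f in range(F)}
         for user in user_items}
    Q = {f: {item: random.random() / math.sqrt(F) for item in all_items}
         for f in range(F)}
    return P, Q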

The pseudocode of LFM can be expressed as follows:

 

def LFM(user_items, F, N, alpha, lamb):
    # Initialize the P, Q matrices
    P, Q = InitModel(user_items, F)
    # Start iterating
    for step in range(0, N):
        # Take each user, and the set of items that user likes, from the dataset
        for user, items in user_items.items():
            # Randomly sample as many negative samples for the user as there are
            # positive items, and merge positive and negative samples for the optimization
            samples = RandSelectNegativeSamples(items)
            # Go through each item and the user's interest rui in that item
            for item, rui in samples.items():
                # Compute the error with the current parameters
                eui = rui - Predict(user, item)
                # Update the parameters
                for f in range(0, F):
                    P[user][f] += alpha * (eui * Q[f][item] - lamb * P[user][f])
                    Q[f][item] += alpha * (eui * P[user][f] - lamb * Q[f][item])
        # After each iteration, reduce the learning rate. At the beginning it drops quickly
        # because we are far from the optimal value; as the optimization proceeds, the
        # learning rate is lowered so that we approach the optimum slowly.
        alpha *= 0.9


I have added comments to the pseudocode from the book; please correct me if anything is wrong.

 


After the P and Q matrices have been estimated, we can use formula (*) to compute user U's interest in every item and recommend to the user the N items with the highest interest values (i.e., the Top N).
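A minimal sketch of this last step (again using the nested-dictionary P and Q and formula (*); the function name and signature are my own, not from the book):

def Recommend(user, all_items, user_items, P, Q, N):
    # Score every item the user has not acted on with formula (*)
    # and return the N items with the highest predicted interest.
    seen = set(user_items.get(user, []))
    scores = {item: sum(P[user][f] * Q[f][item] for f in P[user])
              for item in all_items if item not in seen}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:N]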

To sum up, LFM rests on a mature theoretical foundation: it is a genuine learning algorithm that uses optimization theory to learn the specified parameters and build the best model.
