Matrix Factorization Models in Recommender Systems

1. SVD

When matrix factorization is mentioned, the first thing that comes to mind is the classical singular value decomposition (SVD) from linear algebra. The formula is:
$$
M_{m \times n} = U_{m \times k} \Sigma_{k \times k} V_{k \times n}^{T}
$$

  • Principle and process
  • SVD factorizes the matrix into a product of three matrices
  • The two outer matrices represent the latent factors of users / items
  • The middle matrix is the singular value matrix: a diagonal matrix whose elements are non-negative and gradually decrease
  • Purpose
  • Make good recommendations
  • Compute the three matrices
  • Advantages
  • In many cases, the top 1% or even 10% of the singular values account for more than 99% of the sum of all singular values.
  • That is, we can approximate the original matrix with the k largest singular values and the corresponding left and right singular vectors (a minimal sketch follows this list)
  • Disadvantages
  • To use SVD at all, the matrix must be dense, i.e., every element must be filled in; otherwise the decomposition cannot be applied. Our rating matrix is clearly not dense, so the usual practice is to fill it with the mean or other statistics first and then apply SVD for dimensionality reduction.
  • Requires a dense matrix
  • Heavy computation and long running time when there are many users and items
  • Does not solve the cold-start problem
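To make the truncated-SVD idea concrete, here is a minimal sketch using numpy's SVD routine; the toy ratings, the choice of k, and the mean-filling of missing entries are illustrative assumptions, not values from any paper.

```python
import numpy as np

# Toy user-item rating matrix; 0 marks a missing rating (values made up).
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

# Classical SVD needs a dense matrix, so fill missing entries with the mean of observed ratings.
mask = R > 0
R_filled = np.where(mask, R, R[mask].mean())

# Full SVD, then keep only the k largest singular values (rank-k approximation).
U, s, Vt = np.linalg.svd(R_filled, full_matrices=False)
k = 2
R_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.round(R_approx, 2))  # approximate reconstruction used for prediction
```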

2. FunkSVD

The SVD just mentioned requires the matrix to be filled in first and then decomposed for dimensionality reduction, and because it involves matrix inversion (complexity O(n^3)), its computational cost is high. Simon Funk therefore proposed the FunkSVD method, which does not decompose the matrix into three matrices but into two low-rank user and item matrices, reducing the computational complexity. The formula is:
$$
\min_{q^{*}, p^{*}} \sum_{(u, i) \in \mathcal{K}}\left(r_{ui}-q_{i}^{T} p_{u}\right)^{2}
$$
It borrows the idea of linear regression, finding the optimal latent vector representations of users and items by minimizing the squared error on the observed data. At the same time, to avoid overfitting the observations, FunkSVD with L2 regularization was also proposed:
$$
\min_{q^{*}, p^{*}} \sum_{(u, i) \in \kappa}\left(r_{ui}-q_{i}^{T} p_{u}\right)^{2}+\lambda\left(\left\|q_{i}\right\|^{2}+\left\|p_{u}\right\|^{2}\right)
$$
Both objective functions can be optimized with gradient descent or stochastic gradient descent to find the optimal solution.
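Below is a minimal sketch of FunkSVD trained with stochastic gradient descent on the regularized squared error above. The toy triples, dimensionality, learning rate, and regularization strength are arbitrary choices for illustration, not values from the original write-up.

```python
import numpy as np

def funk_svd(ratings, n_users, n_items, k=10, lr=0.01, reg=0.05, epochs=100, seed=0):
    """ratings: iterable of (user, item, rating) triples."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, k))   # user latent factors p_u
    Q = 0.1 * rng.standard_normal((n_items, k))   # item latent factors q_i
    for _ in range(epochs):
        for u, i, r in ratings:
            pu, qi = P[u].copy(), Q[i].copy()
            err = r - qi @ pu                      # r_ui - q_i^T p_u
            P[u] += lr * (err * qi - reg * pu)     # SGD step on p_u
            Q[i] += lr * (err * pu - reg * qi)     # SGD step on q_i
    return P, Q

# Tiny example: 3 users, 3 items.
data = [(0, 0, 5.0), (0, 1, 3.0), (1, 1, 4.0), (2, 0, 1.0), (2, 2, 5.0)]
P, Q = funk_svd(data, n_users=3, n_items=3, k=2)
print(Q[2] @ P[0])  # predicted rating of user 0 on item 2
```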

2.1 FunkSVD Summary

  • Purpose
  • Make good recommendations
  • Factorize the matrix into two low-rank matrices
  • Input
  • (user, item, rating) triples, no extra preprocessing needed
  • Advantages
  • Simple idea
  • Easy to apply
  • The model is also very extensible and can be applied to many scenarios
  • Lower computational complexity than SVD
  • Disadvantages
  • Does not solve the cold-start problem well
  • Data sparsity

3. PMF

From the paper: Salakhutdinov et al. Probabilistic matrix factorization. NIPS (2008): 1257-1264.

PMF is the probabilistic interpretation of FunkSVD. It assumes that each element $R_{ij}$ of the rating matrix is determined by the inner product of the user's latent preference vector $U_i$ and the item's latent attribute vector $V_j$, and follows a normal distribution with mean $\mathbf{U}_{i}^{T} \mathbf{V}_{j}$ and variance $\sigma^{2}$:
$$
R_{ij} \sim \mathcal{N}\left(U_{i}^{T} V_{j}, \sigma^{2}\right)
$$
The conditional probability of the observed rating matrix is then:
$$
p\left(\mathbf{R} | U, V, \sigma^{2}\right) \sim \prod_{i=1}^{N} \prod_{j=1}^{M} \mathcal{N}\left(U_{i}^{T} V_{j}, \sigma^{2}\right)^{I_{ij}}
$$
Meanwhile, the user preference vectors and item attribute vectors are assumed to follow zero-mean normal distributions with variances $\sigma_{U}^{2} \mathbf{I}$ and $\sigma_{V}^{2} \mathbf{I}$, respectively:
$$
\begin{aligned} p\left(U | \sigma_{U}^{2}\right) &= \prod_{i=1}^{N} \mathcal{N}\left(U_{i} | 0, \sigma_{U}^{2} \mathbf{I}\right) \\ p\left(V | \sigma_{V}^{2}\right) &= \prod_{j=1}^{M} \mathcal{N}\left(V_{j} | 0, \sigma_{V}^{2} \mathbf{I}\right) \end{aligned}
$$
By Bayes' rule, the posterior probability of the latent variables U and V can be derived as:
$$
\begin{array}{l}{p\left(U, V | R, \sigma^{2}, \sigma_{V}^{2}, \sigma_{U}^{2}\right)=\frac{p\left(U, V, R, \sigma^{2}, \sigma_{V}^{2}, \sigma_{U}^{2}\right)}{p\left(R, \sigma^{2}, \sigma_{V}^{2}, \sigma_{U}^{2}\right)}=\frac{p\left(R | U, V, \sigma^{2}\right) \times p\left(U, V | \sigma_{V}^{2}, \sigma_{U}^{2}\right)}{p\left(R, \sigma^{2}, \sigma_{V}^{2}, \sigma_{U}^{2}\right)}} \\ {\sim p\left(R | U, V, \sigma^{2}\right) \times p\left(U, V | \sigma_{V}^{2}, \sigma_{U}^{2}\right)} \\ {=p\left(R | U, V, \sigma^{2}\right) \times p\left(U | \sigma_{U}^{2}\right) \times p\left(V | \sigma_{V}^{2}\right)} \\ {=\prod_{i=1}^{N} \prod_{j=1}^{M}\left[N\left(R_{i j} | U_{i}^{T} V_{j}, \sigma^{2}\right)\right]^{I_{i j}} \times \prod_{i=1}^{N}\left[N\left(U_{i} | 0, \sigma_{U}^{2} I\right)\right] \times \prod_{j=1}^{M}\left[N\left(V_{j} | 0, \sigma_{V}^{2} I\right)\right]}\end{array}
$$
Then, taking the logarithm $\ln$ of both sides gives:
$$
\begin{array}{l}{\ln p\left(U, V | R, \sigma^{2}, \sigma_{V}^{2}, \sigma_{U}^{2}\right)=-\frac{1}{2 \sigma^{2}} \sum_{i=1}^{N} \sum_{j=1}^{M} I_{ij}\left(R_{ij}-U_{i}^{T} V_{j}\right)^{2}-\frac{1}{2 \sigma_{U}^{2}} \sum_{i=1}^{N} U_{i}^{T} U_{i}-\frac{1}{2 \sigma_{V}^{2}} \sum_{j=1}^{M} V_{j}^{T} V_{j}} \\ {-\frac{1}{2}\left(\left(\sum_{i=1}^{N} \sum_{j=1}^{M} I_{ij}\right) \ln \sigma^{2}+N D \ln \sigma_{U}^{2}+M D \ln \sigma_{V}^{2}\right)+C}\end{array}
$$
Finally, after the derivation we can see that PMF is indeed the probabilistic interpretation of FunkSVD; the two forms are essentially the same.

NOTE: To aid understanding, a key intermediate step of the derivation is expanding
$$
N\left(U_{i} | 0, \sigma_{U}^{2} \mathbf{I}\right)
$$
as a multivariate normal distribution, which gives
$$
-\frac{D}{2} \ln \left(\sigma_{U}^{2}\right)-\frac{U_{i}^{T} U_{i}}{2 \sigma_{U}^{2}}+C
$$
The derivation is as follows:

$$
\begin{aligned} N\left(U_{i} | 0, \sigma_{U}^{2} I\right) &= \frac{1}{(2 \pi)^{D / 2}\left|\sigma_{U}^{2} I\right|^{1 / 2}} e^{-\frac{1}{2} U_{i}^{T}\left(\sigma_{U}^{2} I\right)^{-1} U_{i}} \\ \ln N\left(U_{i} | 0, \sigma_{U}^{2} I\right) &= \ln \left(\frac{1}{(2 \pi)^{D / 2}\left|\sigma_{U}^{2} I\right|^{1 / 2}}\right)-\frac{U_{i}^{T} U_{i}}{2 \sigma_{U}^{2}} \\ &= -\ln \left(\left|\sigma_{U}^{2} I\right|^{1 / 2}\right)-\frac{U_{i}^{T} U_{i}}{2 \sigma_{U}^{2}}+C \\ &= -\frac{1}{2} \ln \left(\sigma_{U}^{2 D}\right)-\frac{U_{i}^{T} U_{i}}{2 \sigma_{U}^{2}}+C \\ &= -\frac{D}{2} \ln \left(\sigma_{U}^{2}\right)-\frac{U_{i}^{T} U_{i}}{2 \sigma_{U}^{2}}+C \end{aligned}
$$
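To make the equivalence with FunkSVD concrete, the following small sketch (with assumed array shapes) computes the negative log-posterior above up to its constant term; multiplying it by $2\sigma^2$ recovers the FunkSVD objective with $\lambda_U = \sigma^2/\sigma_U^2$ and $\lambda_V = \sigma^2/\sigma_V^2$.

```python
import numpy as np

def pmf_neg_log_posterior(R, I, U, V, sigma2, sigma2_U, sigma2_V):
    """-ln p(U, V | R, ...) up to an additive constant.
    R: N x M ratings, I: 0/1 observation mask, U: N x D user factors, V: M x D item factors."""
    sq_err = np.sum(I * (R - U @ V.T) ** 2)
    return (sq_err / (2 * sigma2)
            + np.sum(U * U) / (2 * sigma2_U)
            + np.sum(V * V) / (2 * sigma2_V))

# Multiplying this by 2*sigma2 gives the FunkSVD objective with
# lambda_U = sigma2 / sigma2_U and lambda_V = sigma2 / sigma2_V.
```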

3.1 PMF Summary

  • Purpose
  • Good rating prediction
  • Input
  • (user, item, rating) triples, no extra preprocessing needed
  • Advantages
  • Simple idea
  • Easy to apply
  • The model is also very extensible and can be applied to many scenarios
  • Lower computational complexity than SVD
  • PMF is the probabilistic interpretation of FunkSVD
  • Disadvantages
  • Does not solve the cold-start problem well
  • Data sparsity

4. BiasSVD

From the paper: Koren et al. Matrix factorization techniques for recommender systems. Computer 42.8 (2009).

After FunkSVD was proposed, a number of variants appeared; a relatively popular one is BiasSVD. It is based on the following assumption: some users have their own rating tendencies, for example being naturally generous and lenient, while others are demanding and never give more than 3 points (out of 5); likewise, some items have their own characteristics, since an item's quality is largely fixed once it is produced, and some items are widely liked while others are widely disliked. This is exactly the motivation for introducing user and item bias terms. Put plainly: a rating system has some intrinsic properties that have nothing to do with users or items, users have some attributes that have nothing to do with items, and items have some attributes that have nothing to do with users. The specific prediction formula is as follows:
$$
\hat{r}_{ui}=\mu+b_{u}+b_{i}+q_{i}^{T} p_{u}
$$
where $\mu$ is the average rating of the whole site, reflecting the overall tone of the website; $b_u$ is the user rating bias, representing a user's rating tendency; and $b_i$ is the item rating bias, representing the tendency of an item to receive high or low ratings.
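A minimal training sketch for BiasSVD, extending the FunkSVD SGD loop above with the bias terms; all hyperparameters and the triple format are illustrative assumptions.

```python
import numpy as np

def bias_svd(ratings, n_users, n_items, k=10, lr=0.01, reg=0.05, epochs=100, seed=0):
    """ratings: iterable of (user, item, rating) triples."""
    rng = np.random.default_rng(seed)
    mu = np.mean([r for _, _, r in ratings])       # global average rating
    b_u = np.zeros(n_users)                        # user biases
    b_i = np.zeros(n_items)                        # item biases
    P = 0.1 * rng.standard_normal((n_users, k))
    Q = 0.1 * rng.standard_normal((n_items, k))
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = mu + b_u[u] + b_i[i] + Q[i] @ P[u]   # hat r_ui
            err = r - pred
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            pu, qi = P[u].copy(), Q[i].copy()
            P[u] += lr * (err * qi - reg * pu)
            Q[i] += lr * (err * pu - reg * qi)
    return mu, b_u, b_i, P, Q
```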

4.1 BiasSVD Summary

  • Purpose:
  • Good rating prediction
  • Input:
  • (user, item, rating) triples, no extra preprocessing needed
  • BiasSVD assumes the rating system includes three kinds of bias factors:
  • Some factors of the rating system have nothing to do with either users or items
  • Users have some rating factors unrelated to items, called the user bias term
  • Items also have some rating factors unrelated to users, called the item bias term
  • Advantages:
  • BiasSVD takes some extra factors into account, so in some scenarios it performs better than FunkSVD.
  • It may be better suited to e-commerce or movie platforms
  • Disadvantages:
  • Does not solve the cold-start problem well
  • Data sparsity

5. SVD++

From the paper: Koren Y. Factor in the neighbors: Scalable and accurate collaborative filtering[J]. ACM Transactions on Knowledge Discovery from Data (TKDD), 2010, 4(1): 1.

Besides explicit ratings, which model user preferences explicitly, implicit feedback is also helpful for modeling users, so SVD++ was subsequently proposed. It is based on the assumption that, in addition to explicit rating records, implicit feedback such as browsing history or favorites lists can also reflect the user's preferences to some extent. For example, the fact that a user has bookmarked an item may indicate, indirectly, that the user is interested in it. This is reflected in the prediction formula:
$$
\hat{r}_{ui}=\mu+b_{u}+b_{i}+q_{i}^{T}\left(p_{u}+|N(u)|^{-\frac{1}{2}} \sum_{s \in N(u)} y_{s}\right)
$$
where $N(u)$ is the set of items on which user $u$ has given implicit feedback; $y_s$ is a latent personal-preference bias toward item $s$ and is a parameter we want to learn; and $|N(u)|^{-\frac{1}{2}}$ is an empirical normalization factor.
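A sketch of just the SVD++ prediction step (training would follow the same SGD pattern as BiasSVD); the array names and the `N_u` item list are assumptions made for the example.

```python
import numpy as np

def svdpp_predict(u, i, mu, b_u, b_i, P, Q, Y, N_u):
    """hat r_ui = mu + b_u + b_i + q_i^T (p_u + |N(u)|^(-1/2) * sum_{s in N(u)} y_s).
    P, Q, Y: learned user, item, and implicit-feedback factor matrices;
    N_u: list of item indices with implicit feedback from user u."""
    if len(N_u) > 0:
        implicit = Y[N_u].sum(axis=0) / np.sqrt(len(N_u))
    else:
        implicit = np.zeros_like(P[u])
    return mu + b_u[u] + b_i[i] + Q[i] @ (P[u] + implicit)
```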

5.1 SVD++ Summary

  • Purpose:

  • Make good recommendations

  • Input:

  • User ratings
  • User clicks

  • Advantages:

  • SVD++ takes some extra factors into account, e.g., not only the user's ratings but also implicit feedback (clicks, etc.).
  • Suitable for video platforms and article/media platforms, and also for e-commerce platforms

  • Disadvantages:

  • Does not solve the cold-start problem well

  • Data sparsity

6. timeSVD

From the paper: Koren et al. Collaborative filtering with temporal dynamics. Communications of the ACM 53.4 (2010): 89-97.

It is based on the assumption that a user's interests and preferences are not static, but evolve dynamically over time. timeSVD was therefore proposed, in which the user and item biases vary with time and the user latent factors also change dynamically over time, while the item latent factors are assumed not to change with time (item properties are assumed to be stable).
$$
\hat{r}_{ui}=\mu+b_{i}\left(t_{ui}\right)+b_{u}\left(t_{ui}\right)+q_{i}^{T} p_{u}\left(t_{ui}\right)
$$
where $t_{ui}$ is the time factor, indicating the state at different points in time.
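The paper parameterizes the time-dependent terms in several ways (e.g., drift and day-specific components); the sketch below uses a simplified binned-time variant purely to illustrate the idea, so the parameter layout is an assumption rather than the paper's exact model.

```python
import numpy as np

def timesvd_predict(u, i, t_bin, mu, b_u, b_i, b_ut, b_it, P, P_t, Q):
    """Simplified binned-time prediction: static terms plus per-time-bin offsets.
    b_ut: (n_users x n_bins) user bias offsets, b_it: (n_items x n_bins) item bias offsets,
    P_t: (n_users x n_bins x k) user factor offsets; item factors Q stay time-independent."""
    bu_t = b_u[u] + b_ut[u, t_bin]      # b_u(t)
    bi_t = b_i[i] + b_it[i, t_bin]      # b_i(t)
    pu_t = P[u] + P_t[u, t_bin]         # p_u(t)
    return mu + bi_t + bu_t + Q[i] @ pu_t
```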

6.1 timeSVD Summary

  • Purpose:
  • Make good recommendations
  • Input:
  • User ratings
  • Rating timestamps
  • Advantages:
  • timeSVD takes an extra factor into account: it assumes that users' interests change as time goes by
  • Disadvantages:
  • Does not solve the cold-start problem well
  • Data sparsity

7. NMF

From the paper: Lee et al. Learning the parts of objects by non-negative matrix factorization. Nature 401.6755 (1999): 788.

This is a classic paper published in Nature, with nearly 9k citations on Google Scholar. It proposes the assumption that the small matrices obtained from the decomposition should satisfy non-negativity constraints.

In most matrix factorization methods, the original matrix $\mathbf{R}$ is approximately decomposed into the product of two low-rank matrices
$$
\mathbf{R}=\mathbf{P}^{T} \mathbf{Q}
$$
A common problem with these methods is that even if the elements of the original matrix are non-negative, there is no guarantee that the decomposed small matrices are non-negative. As a result, the classic matrix factorization methods in recommender systems can achieve very good prediction performance, but they cannot explain a recommendation the way user-based CF can, in a form that matches people's intuition (i.e., "people with tastes similar to yours also bought this item"). In a mathematical sense it does not matter whether the decomposition results are positive or negative, as long as the reconstruction error on the non-negative matrix elements is as small as possible, but negative elements often have no meaning in the real world. For example, image data cannot contain negative pixel values, since pixels range from 0 to 255; and in document word-frequency statistics, negative values cannot be interpreted either. Non-negative matrix factorization was therefore proposed, adding non-negativity constraints to traditional matrix factorization in an attempt to give it a sound interpretation. Its formula is as follows:
$$
\begin{array}{c}{\mathbf{R} \approx \mathbf{P}^{T} \mathbf{Q}} \\ {\text { s.t. } \mathbf{P} \geq 0, \quad \mathbf{Q} \geq 0}\end{array}
$$
where the elements of the two matrices $\mathbf{P}$ and $\mathbf{Q}$ both satisfy the non-negativity constraint.
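A minimal NMF sketch using the classic multiplicative update rules for the squared Frobenius error (one of the objectives discussed in the NMF literature, not necessarily the exact algorithm of the Nature paper); matrix values, rank, and iteration count are illustrative.

```python
import numpy as np

def nmf(R, k=2, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix R (m x n) as W @ H with W, H >= 0,
    using the classic multiplicative updates for the squared Frobenius error.
    (In the text's notation, W plays the role of P^T and H the role of Q.)"""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(n_iter):
        H *= (W.T @ R) / (W.T @ W @ H + eps)   # update one factor...
        W *= (R @ H.T) / (W @ H @ H.T + eps)   # ...then the other; both stay non-negative
    return W, H

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5]], dtype=float)
W, H = nmf(R, k=2)
print(np.round(W @ H, 2))
```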

7.1 NMF Summary

  • Purpose:
  • Good rating prediction
  • Input:
  • (user, item, rating) triples, no extra preprocessing needed
  • Advantages:
  • The elements of the factor matrices are non-negative
  • Many application scenarios, for example:
  • NMF can be used to discover image features in a database, for fast automatic recognition
  • It can discover semantic relations among documents, for automatic indexing and information retrieval
  • It can identify genes in DNA microarray analysis, etc.
  • The most effective application is image processing: NMF is an effective method for feature extraction and dimensionality reduction of image data.
  • Disadvantages:
  • Does not solve the cold-start problem well
  • Data sparsity

8. WMF

From the papers: Pan et al. One-class collaborative filtering. ICDM, 2008.
Hu et al. Collaborative filtering for implicit feedback datasets. ICDM, 2008.

Matrix factorization is usually applied to the rating-prediction task in recommender systems, but it can equally be used for top-N recommendation, i.e., predicting whether a user will click on an item based on implicit feedback. You can think of this as a binary classification problem: click or no click. But it is not an ordinary binary classification problem, because the negative samples used in training are not true negatives: a user may simply never have seen the item, so we cannot say whether they like it or not, and they might well like it once they see it. In other words, the positive samples tell us what the user likes, but the "negative" samples do not tell us what the user dislikes. Because only positive feedback is available, this is defined as the one-class problem. For the one-class problem, the authors propose two strategies: one is weighted matrix factorization, the other is negative sampling. Although it only adds a weight and looks naive, against the research background of the time this small step was actually a big step for recommender systems.
$$
\mathcal{L}(\boldsymbol{X})=\sum_{ij} W_{ij}\left(R_{ij}-X_{ij}\right)^{2}
$$
Research on the one-class problem has not stopped there. Although negative sampling is heuristic, i.e., it does not handle the missing data by explicitly modeling it, it works very well in practice. In recent years, model-based approaches to the one-class problem have also been proposed, i.e., modeling the missing data directly; see the two papers [Hernández-Lobato et al. 2014, Liang et al. 2016] for details.
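One common way to implement weighted matrix factorization for implicit feedback is alternating least squares in the style of Hu et al.; the sketch below shows only the closed-form update of a single user's factors under the weighted loss above, with the confidence weights $c_{ui} = 1 + \alpha r_{ui}$ taken as an assumed weighting scheme rather than the only possible choice.

```python
import numpy as np

def wals_user_update(Y, c_u, p_u, reg):
    """Closed-form ALS update of one user's factors under the weighted loss:
    x_u = (Y^T C_u Y + reg*I)^{-1} Y^T C_u p_u,
    where Y is the item factor matrix (n_items x k), c_u the per-item confidence
    weights for this user, and p_u the 0/1 preference vector."""
    k = Y.shape[1]
    A = Y.T @ (c_u[:, None] * Y) + reg * np.eye(k)
    b = Y.T @ (c_u * p_u)
    return np.linalg.solve(A, b)

# Toy example with an assumed weighting c_ui = 1 + alpha * r_ui and p_ui = 1[r_ui > 0].
alpha, reg = 40.0, 0.1
r_u = np.array([3.0, 0.0, 1.0, 0.0])              # one user's implicit counts over 4 items
Y = np.random.default_rng(0).random((4, 2))       # item factors (already initialized/learned)
x_u = wals_user_update(Y, 1.0 + alpha * r_u, (r_u > 0).astype(float), reg)
print(x_u)
```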

8.1 WMF Summary

  • Purpose:
  • Make good recommendations
  • Input:
  • User ratings, clicks, comments, and other actions, converted to 0/1 data
  • Unobserved entries are sampled as negative examples
  • Advantages:
  • Suitable for top-N recommendation
  • Makes use of implicit user data
  • Disadvantages:
  • Does not solve the cold-start problem well
  • Data sparsity

9. LLORMA

From the paper: Lee et al. Local low-rank matrix approximation. ICML, 2013.

The classical matrix factorization model assumes that the entire user-item matrix (i.e., the UI matrix) is low-rank (the global low-rank assumption): across the whole system, users and items are assumed to follow a similar pattern, i.e., like attracts like and birds of a feather flock together.

This assumption is certainly reasonable, but in today's big-data era a low-rank assumption over the entire matrix seems too strong, especially when the amount of data is huge (i.e., systems with very many users and items). This paper therefore abandons the classical global low-rank assumption, arguing that the world contains many different niches and that we should instead look for local low-rank structure (the local low-rank assumption). First, the whole matrix is divided into many small sub-matrices according to some similarity measure, each sub-matrix satisfying a certain similarity threshold; then the low-rank assumption is imposed locally within each sub-matrix. The overall large matrix can then be recovered as a weighted combination of the many local small matrices; see the paper for details, and the sketch of the combination step below.
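A minimal sketch of that final combination step: given the predictions of the local low-rank models and their kernel weights (both assumed to be computed already), the global prediction is their normalized weighted sum.

```python
import numpy as np

def llorma_combine(local_preds, kernel_weights):
    """local_preds: list of (m x n) predictions from the q local low-rank models;
    kernel_weights: list of (m x n) weights K_t(u, i) of entry (u, i) w.r.t. model t.
    Returns the weighted combination sum_t K_t * R_t_hat / sum_t K_t."""
    num = sum(K * P for K, P in zip(kernel_weights, local_preds))
    den = sum(kernel_weights) + 1e-12   # avoid division by zero where all weights vanish
    return num / den
```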


9.1 LLORMA Summary

  • Purpose:
  • Make good recommendations
  • Input:
  • The rating matrix $M \in \mathbb{R}^{m \times n}$
  • The number of local models $q$, the rank $r$ of the local matrices, the learning rate $\nu$, and the regularization parameter $\lambda$
  • Advantages:
  • The computational complexity of LLORMA is kept under control, ensuring computational efficiency
  • The LLORMA algorithm can be applied to large-scale recommendation scenarios
  • Disadvantages:
  • None noted for now

10. SRui

From the paper: Ma Hao. An experimental study on implicit social recommendation. SIGIR, 2013.

Although classical matrix factorization can already achieve fairly good prediction performance, it still cannot escape its inherent shortcomings, namely data sparsity and the cold-start problem. To alleviate data sparsity, we can introduce rich social information: if two users are friends, we assume they have similar preferences, so the latent user representations learned for them should be close in the vector space. That is the user dimension; by the same reasoning, the item dimension can use the same idea to constrain the latent item representations: if two items are closely related, their distance in the low-dimensional vector space should also be small. The item relations here are extracted from the UI matrix, which the paper calls implicit item social relations (in fact they have nothing to do with anything social). The formula is:
$$
\begin{aligned} \mathcal{L}=\min _{U, V} & \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{n} I_{ij}\left(R_{ij}-\mathbf{U}_{i}^{T} \mathbf{V}_{j}\right)^{2} \\ &+\frac{\alpha}{2} \sum_{i=1}^{m} \sum_{f \in \mathcal{F}^{+}(i)} s_{if}\left\|\mathbf{U}_{i}-\mathbf{U}_{f}\right\|_{F}^{2} \\ &+\frac{\beta}{2} \sum_{j=1}^{n} \sum_{q \in \mathcal{Q}^{+}(j)} s_{jq}\left\|\mathbf{V}_{j}-\mathbf{V}_{q}\right\|_{F}^{2} \\ &+\frac{\lambda_{1}}{2}\|U\|_{F}^{2}+\frac{\lambda_{2}}{2}\|V\|_{F}^{2} \end{aligned}
$$
where $s_{if}$ denotes the social similarity between user $i$ and user $f$, and $s_{jq}$ denotes the implicit social similarity between item $j$ and item $q$. Adding these smoothing constraints on the user and item dimensions makes the learned latent representations more realistic and meaningful.
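A straightforward sketch of evaluating the SRui objective above in numpy; the neighbor lists and similarity values are placeholder inputs assumed to have been extracted beforehand.

```python
import numpy as np

def srui_loss(R, I, U, V, user_neighbors, item_neighbors, alpha, beta, lam1, lam2):
    """R, I: ratings and 0/1 mask (m x n); U: m x k user factors; V: n x k item factors.
    user_neighbors[i]: list of (f, s_if) pairs; item_neighbors[j]: list of (q, s_jq) pairs."""
    loss = 0.5 * np.sum(I * (R - U @ V.T) ** 2)
    for i, neigh in enumerate(user_neighbors):        # social smoothing on users
        for f, s_if in neigh:
            loss += 0.5 * alpha * s_if * np.sum((U[i] - U[f]) ** 2)
    for j, neigh in enumerate(item_neighbors):        # implicit smoothing on items
        for q, s_jq in neigh:
            loss += 0.5 * beta * s_jq * np.sum((V[j] - V[q]) ** 2)
    return loss + 0.5 * lam1 * np.sum(U ** 2) + 0.5 * lam2 * np.sum(V ** 2)
```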

10.1 SRui Summary

  • Purpose:
  • Make good recommendations
  • Input:
  • The rating matrix $M \in \mathbb{R}^{m \times n}$
  • Advantages:
  • Exploits the interactions between users
  • Also exploits the relations between items
  • Enriches the data and alleviates data sparsity
  • Disadvantages:
  • Data sparsity
  • Cold-start problem

11. ConvMF

From the paper: Kim et al. Convolutional matrix factorization for document context-aware recommendation. RecSys 2016.

As mentioned, one advantage of matrix factorization is its extensibility, and that is no empty claim. This 2016 paper, for example, elegantly combines matrix factorization (MF) with the convolutional neural network (CNN), which was then all the rage in image processing.

As a classic collaborative filtering model, matrix factorization needs no further praise for its performance. But data sparsity and cold start have always been its pain points, so combining it with rich external information has become an effective way to alleviate these problems. Among external sources, text has become the mainstream and preferred form of data on the web, yet most text processing is based on one-hot / bag-of-words representations, which cannot capture the contextual information in a document; hence the combination of the two. See the paper for details; the objective is as follows:
$$
\begin{array}{c}{\mathcal{L}(U, V, W)=\sum_{i}^{N} \sum_{j}^{M} \frac{I_{ij}}{2}\left(R_{ij}-u_{i}^{T} v_{j}\right)^{2}+\frac{\lambda_{U}}{2} \sum_{i}^{N}\left\|u_{i}\right\|^{2}} \\ {\quad+\frac{\lambda_{V}}{2} \sum_{j}^{M}\left\|v_{j}-\operatorname{cnn}\left(W, X_{j}\right)\right\|^{2}+\frac{\lambda_{W}}{2} \sum_{k}^{\left|w_{k}\right|}\left\|w_{k}\right\|_{2}}\end{array}
$$
Here, the user latent vector and item latent vector are trained so that their inner product approximates the true rating as closely as possible, while an additional constraint on the item latent vector makes it as close as possible to the document feature learned by the CNN from the item's text.
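A sketch of the ConvMF objective with the CNN document encoder replaced by precomputed document vectors (standing in for $\operatorname{cnn}(W, X_j)$); the CNN itself and its weight penalty $\lambda_W$ term are omitted, so this is only an illustration of how the item factors are tied to the text features, not the paper's full model.

```python
import numpy as np

def conv_mf_loss(R, I, U, V, doc_vectors, lam_u, lam_v):
    """R, I: ratings and 0/1 mask (N x M); U: N x k user factors; V: M x k item factors;
    doc_vectors: M x k outputs of the document encoder, standing in for cnn(W, X_j).
    The CNN weight penalty (the lambda_W term) is omitted from this sketch."""
    loss = 0.5 * np.sum(I * (R - U @ V.T) ** 2)
    loss += 0.5 * lam_u * np.sum(U ** 2)
    loss += 0.5 * lam_v * np.sum((V - doc_vectors) ** 2)   # ties item factors to text features
    return loss
```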

11.1 ConvMF Summary

  • Purpose:
  • Good rating prediction
  • Input:
  • The rating matrix $M \in \mathbb{R}^{m \times n}$
  • Text data, such as reviews
  • Advantages:
  • Enriches the data and alleviates the data sparsity problem
  • Makes the inner product of the user and item latent vectors as close as possible to the true rating
  • Puts an additional constraint on the item latent vector so that it stays close to the document feature learned by the CNN
  • Disadvantages:
  • Cold-start problem

12. NCRPD-MF

From the paper: Hu et al. Your neighbors affect your ratings: on geographical neighborhood influence to rating prediction. SIGIR 2014.

As just said, MF is highly extensible: on the one hand it can be integrated seamlessly with mainstream models, and on the other hand it can fuse features from a variety of information sources. This 2014 paper, for example, integrates review text, geographical neighborhood information, item category, and popularity information. The specific prediction formula is as follows:
$$
\begin{aligned} \hat{r}_{ui}=& \mu+b_{u}+b_{i} \\ &+\mathbf{p}_{u}^{\top}\left(\frac{1}{\left|R_{i}\right|} \sum_{w \in R_{i}} \mathbf{q}_{w}+\frac{\alpha_{1}}{\left|N_{i}\right|} \sum_{n \in N_{i}} \mathbf{v}_{n}+\frac{\alpha_{2}}{\left|C_{i}\right|} \sum_{c \in C_{i}} \mathbf{d}_{c}\right) \end{aligned}
$$

where $\mathbf{q}_{w}$ is the low-dimensional vector representation of a word (text feature), $\mathbf{v}_{n}$ is the low-dimensional vector representation of a geographical neighbor, and $\mathbf{d}_{c}$ is the low-dimensional representation of an item category.
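A sketch of the prediction formula above, where the word, geographical-neighbor, and category embedding tables are assumed to be learned parameters; names and shapes are illustrative.

```python
import numpy as np

def ncrpd_predict(u, i, mu, b_u, b_i, P, Q_word, V_geo, D_cat,
                  words_i, neighbors_i, categories_i, alpha1, alpha2):
    """hat r_ui = mu + b_u + b_i + p_u^T (mean of word vectors for item i
                 + alpha1 * mean of geo-neighbor vectors + alpha2 * mean of category vectors).
    Q_word, V_geo, D_cat: learned embedding tables; words_i, neighbors_i, categories_i:
    index lists for item i (words_i assumed non-empty)."""
    item_vec = Q_word[words_i].mean(axis=0)
    if len(neighbors_i) > 0:
        item_vec = item_vec + alpha1 * V_geo[neighbors_i].mean(axis=0)
    if len(categories_i) > 0:
        item_vec = item_vec + alpha2 * D_cat[categories_i].mean(axis=0)
    return mu + b_u[u] + b_i[i] + P[u] @ item_vec
```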

12.1 NCRPD-MF Summary

  • Purpose:
  • Make good recommendations
  • Input:
  • The rating matrix $M \in \mathbb{R}^{m \times n}$
  • Text data, such as reviews
  • Geographical data
  • Advantages:
  • Enriches the data and alleviates the data sparsity problem
  • Suited to platforms with rich interaction and reviews, such as Dianping-style local review sites
  • These platforms have abundant text and geographical information
  • Disadvantages:
  • None noted for now
