Recommendation system study

Current recommendation systems fall into three categories:

1. Non-personalized recommendation systems

Features: recommendations come from statistical analysis (for example, product sales rankings), from editors' picks, or from average rating scores, so every user sees the same recommendations.

2. Semi-personalized recommendation systems

Features: recommendations are generated from the user's current browsing behavior or the current contents of the user's shopping cart.

3. Fully personalized recommendation systems

Features: recommendations combine the user's historical information with the user's current behavior to provide a fully personalized recommendation service.

 

The input to a recommendation system falls into several types:

1) implicit browsing input; 2) explicit browsing input; 3) keyword / product attribute input; 4) user rating input; 5) user text review input; 6) editor input; 7) user purchase history input.

 

The output takes these forms:

a) related product output; b) individual text review output; c) individual rating output; d) average rating output; e) email output; f) editor output.

E-commerce recommendation algorithms include:

1. Memory-based recommendation algorithms:

User-based collaborative filtering, item-based collaborative filtering, and collaborative filtering based on the Horting graph technique.

2. Model-based recommendation algorithms:

Cluster-based collaborative filtering, collaborative filtering based on dimensionality reduction, recommendation based on association rules, and Bayesian network techniques.

Shortcoming of memory-based algorithms:

When the user database is very large, it is difficult to guarantee real-time performance.

Shortcoming of model-based algorithms:

The model is built from earlier user data, so it lags behind the users' current behavior. The model must be retrained periodically to stay valid.

Several typical algorithms are described below.

1. User-based collaborative filtering

Assumption: if users rate some items quite similarly, they will also rate other items similarly.

User-based collaborative filtering proceeds in three stages:

Data representation ------> nearest-neighbor query (user similarity measure) ------> recommendation generation

Methods for measuring the similarity between users:

1) Cosine similarity

Represent the ratings of user i and user j over the n-dimensional item space as vectors \vec{i} and \vec{j}. The similarity between user i and user j is:

sim(i,j) = \cos(\vec{i},\vec{j}) = \frac{\vec{i}\cdot\vec{j}}{\|\vec{i}\|\times\|\vec{j}\|}

The numerator is the inner product of the two users' rating vectors. The denominator is the product of the two vectors' norms.
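The following is a minimal sketch of the cosine measure. It assumes that each user's ratings are stored as a sparse dict {item: rating}, and that an item the user has not rated counts as 0. The names `cosine_similarity`, `alice` and `bob` are illustrative only.

```python
import math

def cosine_similarity(ri: dict, rj: dict) -> float:
    """Cosine of the angle between two users' rating vectors.

    Unrated items count as 0, so only items rated by both users
    contribute to the dot product; the norms use all of each user's ratings.
    """
    dot = sum(r * rj[item] for item, r in ri.items() if item in rj)
    norm_i = math.sqrt(sum(r * r for r in ri.values()))
    norm_j = math.sqrt(sum(r * r for r in rj.values()))
    if norm_i == 0 or norm_j == 0:
        return 0.0  # a user with no ratings is treated as dissimilar to everyone
    return dot / (norm_i * norm_j)

alice = {"A": 5, "B": 3, "C": 4}
bob = {"A": 4, "B": 3, "D": 5}
print(round(cosine_similarity(alice, bob), 3))  # 29 / 50 -> 0.58
```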

2) Correlation similarity

Let I_{ij} denote the set of items rated by both user i and user j. Similarity is measured with the Pearson correlation coefficient:

sim(i,j) = \frac{\sum_{c\in I_{ij}}(R_{i,c}-\bar{R}_i)(R_{j,c}-\bar{R}_j)}{\sqrt{\sum_{c\in I_{ij}}(R_{i,c}-\bar{R}_i)^2}\sqrt{\sum_{c\in I_{ij}}(R_{j,c}-\bar{R}_j)^2}}

Here R_{i,c} is user i's rating of item c, and \bar{R}_i and \bar{R}_j are the average ratings of user i and user j.
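The following is a minimal sketch of the Pearson measure over co-rated items. The text does not say which items the averages \bar{R}_i and \bar{R}_j are taken over. This sketch assumes that each is the mean of all ratings that user has given. The function names are hypothetical.

```python
import math

def mean_rating(r: dict) -> float:
    """Average of all ratings a user has given (assumed definition of \\bar{R})."""
    return sum(r.values()) / len(r)

def pearson_similarity(ri: dict, rj: dict) -> float:
    common = ri.keys() & rj.keys()  # I_ij: items rated by both users
    if not common:
        return 0.0
    mi, mj = mean_rating(ri), mean_rating(rj)
    num = sum((ri[c] - mi) * (rj[c] - mj) for c in common)
    den_i = math.sqrt(sum((ri[c] - mi) ** 2 for c in common))
    den_j = math.sqrt(sum((rj[c] - mj) ** 2 for c in common))
    if den_i == 0 or den_j == 0:
        return 0.0  # no variation on the co-rated items
    return num / (den_i * den_j)

alice = {"A": 5, "B": 3, "C": 4}
bob = {"A": 4, "B": 3, "D": 5}
print(pearson_similarity(alice, bob))  # 1 / sqrt(2)
```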

3) Modified cosine similarity

The plain cosine similarity ignores the fact that different users use the rating scale differently. The modified cosine similarity corrects this by subtracting each user's average rating from that user's item ratings.

Let I_{ij} denote the set of items rated by both user i and user j. Let I_i and I_j denote the sets of items rated by user i and by user j, respectively.

sim(i,j) = \frac{\sum_{c\in I_{ij}}(R_{i,c}-\bar{R}_i)(R_{j,c}-\bar{R}_j)}{\sqrt{\sum_{c\in I_i}(R_{i,c}-\bar{R}_i)^2}\sqrt{\sum_{c\in I_j}(R_{j,c}-\bar{R}_j)^2}}

Here R_{i,c} is user i's rating of item c, and \bar{R}_i and \bar{R}_j are the average ratings of user i and user j.
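The following sketch implements the modified cosine formula as written above. The numerator runs over the co-rated items I_{ij}. Each denominator term runs over all items that user rated (I_i or I_j). The dict layout and the function name are assumptions made for illustration.

```python
import math

def adjusted_cosine_users(ri: dict, rj: dict) -> float:
    mi = sum(ri.values()) / len(ri)  # \bar{R}_i
    mj = sum(rj.values()) / len(rj)  # \bar{R}_j
    common = ri.keys() & rj.keys()   # I_ij
    num = sum((ri[c] - mi) * (rj[c] - mj) for c in common)
    den_i = math.sqrt(sum((r - mi) ** 2 for r in ri.values()))  # sum over I_i
    den_j = math.sqrt(sum((r - mj) ** 2 for r in rj.values()))  # sum over I_j
    if den_i == 0 or den_j == 0:
        return 0.0
    return num / (den_i * den_j)

alice = {"A": 5, "B": 3, "C": 4}
bob = {"A": 4, "B": 3, "D": 5}
print(adjusted_cosine_users(alice, bob))  # approximately 0.5
```

Compared with the Pearson version, the larger denominator penalizes pairs of users who share only a few rated items.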

 

Recommendation generation:

P_{u,i} = \bar{R}_u + \frac{\sum_{n\in NN_u} sim(u,n)\times(R_{n,i}-\bar{R}_n)}{\sum_{n\in NN_u}|sim(u,n)|}

NN_u is the nearest-neighbor set of user u. User u's predicted rating P_{u,i} for item i is computed from the ratings that the users in NN_u gave to item i.

sim(u,n) is the similarity between user u and user n, and R_{n,i} is user n's rating of item i. \bar{R}_u and \bar{R}_n are the average ratings of user u and user n.

Use this method to predict user u's ratings for all unrated items. Then return the top-N items with the highest predicted scores to the current user as recommendations.
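The following sketch covers prediction and top-N selection. It makes two assumptions that the text does not spell out. First, NN_u is taken per item as the k most similar users who rated that item. Second, plain cosine is used as sim(u,n). The names `predict`, `recommend` and the sample `ratings` are hypothetical.

```python
import math

def cosine(ri: dict, rj: dict) -> float:
    dot = sum(r * rj[c] for c, r in ri.items() if c in rj)
    ni = math.sqrt(sum(r * r for r in ri.values()))
    nj = math.sqrt(sum(r * r for r in rj.values()))
    return dot / (ni * nj) if ni and nj else 0.0

def mean(r: dict) -> float:
    return sum(r.values()) / len(r)

def predict(ratings: dict, u: str, item: str, k: int = 2) -> float:
    """P_{u,i}: user mean plus similarity-weighted neighbour deviations."""
    candidates = [n for n in ratings if n != u and item in ratings[n]]
    # NN_u for this item: the k most similar users who rated it (assumption)
    sims = sorted(((cosine(ratings[u], ratings[n]), n) for n in candidates), reverse=True)[:k]
    den = sum(abs(s) for s, _ in sims)
    if den == 0:
        return mean(ratings[u])  # no usable neighbours: fall back to the user mean
    num = sum(s * (ratings[n][item] - mean(ratings[n])) for s, n in sims)
    return mean(ratings[u]) + num / den

def recommend(ratings: dict, u: str, n_items: int = 2, k: int = 2) -> list:
    all_items = {c for r in ratings.values() for c in r}
    unrated = all_items - ratings[u].keys()
    scores = {c: predict(ratings, u, c, k) for c in unrated}
    return sorted(scores, key=scores.get, reverse=True)[:n_items]

ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 3, "D": 5},
    "carol": {"A": 5, "C": 5, "D": 2, "E": 4},
}
print(recommend(ratings, "alice"))  # ['E', 'D']
```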

When transaction (purchase) data serves as the user input, there are no ratings to predict. The following methods are used instead:

1) Most-frequent-item recommendation: go through the recent purchase records of each of the current user's neighbors and count how often each product was bought. Then recommend the most frequently purchased products that the current user has not yet bought.

2) Association rule recommendation
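The following is a minimal sketch of most-frequent-item recommendation. It assumes that each user's recent purchases are already available as a set and that the neighbor list has already been computed. The function name and data are illustrative.

```python
from collections import Counter

def most_frequent_items(purchases: dict, user: str, neighbours: list, top_n: int = 2) -> list:
    """Count the neighbours' purchases and return the most frequent items the user lacks."""
    owned = purchases[user]
    counts = Counter(item for n in neighbours for item in purchases[n] if item not in owned)
    return [item for item, _ in counts.most_common(top_n)]

purchases = {
    "u": {"A"},
    "n1": {"A", "B", "C"},
    "n2": {"B", "D"},
    "n3": {"B", "C"},
}
print(most_frequent_items(purchases, "u", ["n1", "n2", "n3"]))  # ['B', 'C']
```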

 

Item-based collaborative filtering

Assumption: if most users rate certain items similarly, the current user will also rate those items similarly.

Core stage: the nearest-neighbor query, which measures the similarity between items.

1) Cosine similarity

Treat each item's ratings as a vector in the m-dimensional user space. If a user has not rated an item, that user's rating is set to 0. Represent the ratings of item i and item j in this space as vectors \vec{i} and \vec{j}. Then:

sim(i,j) = \cos(\vec{i},\vec{j}) = \frac{\vec{i}\cdot\vec{j}}{\|\vec{i}\|\times\|\vec{j}\|}
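In the item-based case the vectors are the columns of the user-item matrix. The following sketch therefore computes all item-item cosines at once with NumPy. It assumes a dense users x items array in which 0 means "not rated", and the function name is hypothetical.

```python
import numpy as np

def item_cosine_matrix(R: np.ndarray) -> np.ndarray:
    """R: users x items, 0 = not rated. Returns the items x items cosine matrix."""
    norms = np.linalg.norm(R, axis=0)  # ||\vec{i}|| for each item column
    norms[norms == 0] = 1.0            # leave never-rated items as zero vectors
    X = R / norms                      # unit-length item columns
    return X.T @ X                     # entry (i, j) is cos(\vec{i}, \vec{j})

R = np.array([[5, 3, 0],
              [4, 0, 4],
              [0, 3, 5]], dtype=float)
S = item_cosine_matrix(R)
print(S.round(3))
```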

2) Correlation similarity

sim(i,j) = \frac{\sum_{c\in U_{ij}}(R_{c,i}-\bar{R}_i)(R_{c,j}-\bar{R}_j)}{\sqrt{\sum_{c\in U_{ij}}(R_{c,i}-\bar{R}_i)^2}\sqrt{\sum_{c\in U_{ij}}(R_{c,j}-\bar{R}_j)^2}}

Here U_{ij} is the set of users who rated both item i and item j, and R_{c,i} is user c's rating of item i. \bar{R}_i and \bar{R}_j are the average ratings of item i and item j.
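The following sketch computes the item-based Pearson correlation. It uses the same 0-means-unrated matrix convention as before, which is an assumption. The item averages \bar{R}_i and \bar{R}_j are assumed to be taken over all users who rated each item.

```python
import numpy as np

def item_pearson(R: np.ndarray, i: int, j: int) -> float:
    rated = R > 0
    common = rated[:, i] & rated[:, j]  # U_ij: users who rated both items
    if not common.any():
        return 0.0
    mean_i = R[rated[:, i], i].mean()   # \bar{R}_i over users who rated item i
    mean_j = R[rated[:, j], j].mean()   # \bar{R}_j over users who rated item j
    di = R[common, i] - mean_i
    dj = R[common, j] - mean_j
    den = np.sqrt((di ** 2).sum()) * np.sqrt((dj ** 2).sum())
    return float((di * dj).sum() / den) if den else 0.0

R = np.array([[5, 3, 0],
              [4, 0, 4],
              [0, 3, 5],
              [2, 1, 1]], dtype=float)
print(round(item_pearson(R, 0, 1), 4))
```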

3) Modified cosine similarity

Let U_{ij} denote the set of users who rated both item i and item j. Let U_i and U_j denote the sets of users who rated item i and item j, respectively.

sim(i,j) = \frac{\sum_{c\in U_{ij}}(R_{c,i}-\bar{R}_c)(R_{c,j}-\bar{R}_c)}{\sqrt{\sum_{c\in U_{ij}}(R_{c,i}-\bar{R}_c)^2}\sqrt{\sum_{c\in U_{ij}}(R_{c,j}-\bar{R}_c)^2}}

Here R_{c,i} is user c's rating of item i, and \bar{R}_c is user c's average rating.
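The following sketch implements the formula above with all sums over U_{ij}. The key difference from the correlation version is that each rating is centered on the rater's own average \bar{R}_c rather than on the item's average. The matrix convention (0 = unrated) and the function name are assumptions.

```python
import numpy as np

def item_adjusted_cosine(R: np.ndarray, i: int, j: int) -> float:
    rated = R > 0
    # \bar{R}_c: each user's mean over the items that user actually rated
    user_means = np.array([row[row > 0].mean() if (row > 0).any() else 0.0 for row in R])
    common = rated[:, i] & rated[:, j]  # U_ij
    if not common.any():
        return 0.0
    di = R[common, i] - user_means[common]
    dj = R[common, j] - user_means[common]
    den = np.sqrt((di ** 2).sum()) * np.sqrt((dj ** 2).sum())
    return float((di * dj).sum() / den) if den else 0.0

R = np.array([[5, 3, 0],
              [4, 0, 4],
              [0, 3, 5],
              [2, 1, 1]], dtype=float)
print(round(item_adjusted_cosine(R, 0, 1), 4))
```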

 

Item similarity can also be measured by the conditional probability that item u is purchased, given that item v has been purchased:

P(u|v)=\frac{Freq(uv)}{Freq(v)}

This measure has a problem. Sometimes u and v are not actually similar, yet the similarity comes out very high simply because u is purchased very frequently.

Solution: normalize each row of the user-item matrix R to the same length.
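The following sketch computes P(u|v) on a binary users x items purchase matrix and shows the row normalization step. The text does not say how the normalized matrix then enters the similarity. `weighted_conditional` is therefore a hypothetical variant that is only one plausible choice: it sums the normalized entries R[q,u] over the users q who bought v and divides by Freq(v).

```python
import numpy as np

def conditional_prob(R: np.ndarray, u: int, v: int) -> float:
    """P(u|v) = Freq(uv) / Freq(v) on a binary purchase matrix."""
    bought = R > 0
    freq_v = bought[:, v].sum()
    if freq_v == 0:
        return 0.0
    freq_uv = (bought[:, u] & bought[:, v]).sum()
    return float(freq_uv / freq_v)

def normalize_rows(R: np.ndarray) -> np.ndarray:
    """Scale every user row to unit length (empty rows are left as zeros)."""
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return R / norms

def weighted_conditional(R: np.ndarray, u: int, v: int) -> float:
    """Hypothetical variant: average normalized R[q, u] over users q who bought v."""
    Rn = normalize_rows(R.astype(float))
    bought_v = R[:, v] > 0
    freq_v = bought_v.sum()
    return float(Rn[bought_v, u].sum() / freq_v) if freq_v else 0.0

R = np.array([[1, 1, 0],
              [1, 0, 1],
              [1, 1, 1],
              [1, 0, 0]])
print(conditional_prob(R, 0, 1), conditional_prob(R, 1, 0))  # 1.0 0.5
```

In this toy data item 0 is bought by everyone, so P(0|1) = 1.0 even though that says little about any real link between items 0 and 1.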

 

Recommendation generation:

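The nearest-neighbor set of the target item TI is NN_{TI} = \{N_1, N_2, \ldots, N_k\}. User u's predicted rating for TI is:

P_{u,TI} = \bar{R}_{TI} + \frac{\sum_{n\in NN_{TI}} sim(TI,n)\times(R_{u,n}-\bar{R}_n)}{\sum_{n\in NN_{TI}}|sim(TI,n)|}

Here R_{u,n} is user u's rating of neighbor item n, and \bar{R}_{TI} and \bar{R}_n are the average ratings of item TI and item n.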
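The following sketch implements the item-based prediction. It takes a precomputed item-item similarity matrix `S`; the hand-made values below are placeholders, not results. It also makes two assumptions. NN_{TI} is taken as the k items most similar to TI among those user u has rated. Item averages are taken over the users who rated each item.

```python
import numpy as np

def predict_item_based(R: np.ndarray, S: np.ndarray, u: int, target: int, k: int = 2) -> float:
    rated = R > 0
    # \bar{R}_c for every item c, over the users who rated it
    item_means = np.array([R[rated[:, c], c].mean() if rated[:, c].any() else 0.0
                           for c in range(R.shape[1])])
    candidates = [n for n in np.flatnonzero(rated[u]) if n != target]
    neighbours = sorted(candidates, key=lambda n: S[target, n], reverse=True)[:k]  # NN_TI
    den = sum(abs(S[target, n]) for n in neighbours)
    if den == 0:
        return float(item_means[target])
    num = sum(S[target, n] * (R[u, n] - item_means[n]) for n in neighbours)
    return float(item_means[target] + num / den)

R = np.array([[5, 3, 0],
              [4, 0, 4],
              [0, 3, 5],
              [2, 1, 1]], dtype=float)
S = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.1],
              [0.2, 0.1, 1.0]])  # placeholder similarities for illustration
print(round(predict_item_based(R, S, u=0, target=2), 4))
```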

 

Collaborative filtering based on dimensionality reduction. Disadvantage: accuracy may decrease.

Advantages: it alleviates data sparsity and reduces computational cost.
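The text does not name a specific reduction technique. The following sketch assumes truncated SVD, a common choice. Missing ratings (0) are first filled with the user's mean, which is also an assumption, and the matrix is then approximated at rank k. The filled-in entries of the low-rank matrix can serve as predictions.

```python
import numpy as np

def fill_with_user_means(R: np.ndarray) -> np.ndarray:
    """Replace unrated (0) entries with that user's mean rating."""
    filled = R.astype(float).copy()
    for row in filled:  # each row is a view, so edits land in `filled`
        rated = row > 0
        if rated.any():
            row[~rated] = row[rated].mean()
    return filled

def low_rank_approx(R: np.ndarray, k: int) -> np.ndarray:
    """Rank-k reconstruction via truncated SVD."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

R = np.array([[5, 3, 0],
              [4, 0, 4],
              [0, 3, 5],
              [2, 1, 1]])
F = fill_with_user_means(R)
print(low_rank_approx(F, 2).round(2))
```

Keeping only k singular values is what cuts both the noise from sparse entries and the cost of later similarity computations. The price is the loss of accuracy noted above.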

Cluster-based collaborative filtering

The entire user space is partitioned into clusters according to users' buying habits and rating characteristics. Users within the same cluster should rate items as similarly as possible, while users in different clusters should rate items as differently as possible.

Main steps for clustering the entire user space with the K-means algorithm:

1) Randomly select K users as seeds, and use their item ratings as the initial cluster centers.

2) For each remaining user, compute the similarity to each of the K cluster centers, and assign the user to the most similar cluster.

3) For each newly formed cluster, compute the average rating of each item over all users in the cluster to produce the new cluster center.

4) Repeat steps 2 and 3 until the cluster assignments no longer change.
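The following sketch implements the four steps above. It makes three assumptions. Cosine similarity serves as the similarity measure. Unrated items are stored as 0 and included in the center averages. Every user, seeds included, is reassigned in each pass. The function name and parameters are hypothetical.

```python
import numpy as np

def kmeans_users(R: np.ndarray, k: int, max_iter: int = 100, seed: int = 0):
    rng = np.random.default_rng(seed)
    # step 1: k random users as seeds; their ratings are the initial centers
    centers = R[rng.choice(len(R), size=k, replace=False)].astype(float)
    labels = None
    Rn = R / np.maximum(np.linalg.norm(R, axis=1, keepdims=True), 1e-12)
    for _ in range(max_iter):
        # step 2: assign each user to the center with the highest cosine similarity
        Cn = centers / np.maximum(np.linalg.norm(centers, axis=1, keepdims=True), 1e-12)
        new_labels = np.argmax(Rn @ Cn.T, axis=1)
        # step 4: stop once no assignment changes
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # step 3: each new center is the mean rating vector of its members
        for c in range(k):
            members = R[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return labels, centers

R = np.array([[5, 5, 0, 0],
              [4, 5, 0, 0],
              [0, 0, 5, 4],
              [0, 0, 4, 5]])
labels, centers = kmeans_users(R, k=2)
print(labels)
```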


Origin blog.csdn.net/qq_27575895/article/details/87928179