Collaborative Filtering

Collaborative filtering is built on the intuition that "birds of a feather flock together": users with similar tastes like similar items, so recommendations can be made by measuring similarity between users or between items.

 

Similarity

1. Jaccard similarity

The Jaccard similarity of two sets is defined as the ratio of the size of their intersection to the size of their union: J(A, B) = |A ∩ B| / |A ∪ B|.

 

The Jaccard distance, defined as 1 - J(A, B), measures how dissimilar two sets are.

Why is Jaccard similarity not suitable for collaborative filtering? It only considers whether a user has seen an item or not, and ignores the magnitude of the ratings.
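As a minimal sketch (my example, not from the original post), Jaccard similarity computed over the sets of items two users have rated might look like this:

```python
def jaccard_similarity(items_a, items_b):
    """Ratio of the intersection to the union of two sets of item ids."""
    a, b = set(items_a), set(items_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Only which items were rated matters; the rating values themselves are ignored.
print(jaccard_similarity({"HP1", "HP2", "HP3"}, {"HP2", "HP3", "Twilight"}))  # 0.5
```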

 

2. Cosine similarity

The cosine of the angle between two rating vectors is used to measure their similarity: cos(u, v) = (u · v) / (‖u‖ ‖v‖).

 

Why is plain cosine similarity not suitable for collaborative filtering? Different users rate on different scales (some rate everything high, others everything low), so without correcting for this bias the computed similarity can even contradict the users' actual preferences.
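A small illustrative sketch of the problem (my example, not the post's): because raw ratings are all positive, cosine similarity stays positive even for users with opposite tastes, which the centered version below corrects.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two raw rating vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

# Two users with opposite tastes: raw cosine is still clearly positive,
# because all ratings are positive numbers on the same 1-5 scale.
print(round(cosine_similarity([5, 1], [1, 5]), 3))  # 0.385
```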

 

3. Pearson similarity

Pearson similarity resolves the rating-scale problem of cosine similarity and is also known as centered cosine. Each user's ratings are first centered by subtracting that user's mean rating, and then the cosine similarity is computed on the centered vectors; a positive value indicates positive correlation, a negative value indicates negative correlation.
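A minimal sketch of centered cosine, assuming 0 marks an unrated item and centering is done over the items both users have rated:

```python
import numpy as np

def pearson_similarity(u, v):
    """Centered cosine: subtract each user's mean over the co-rated items,
    then take the cosine of the centered vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    mask = (u > 0) & (v > 0)              # items rated by both users
    if mask.sum() < 2:
        return 0.0
    uc = u[mask] - u[mask].mean()
    vc = v[mask] - v[mask].mean()
    denom = np.linalg.norm(uc) * np.linalg.norm(vc)
    return float(uc @ vc / denom) if denom else 0.0

# Opposite tastes now come out negative, and different rating scales no longer matter.
print(round(pearson_similarity([5, 1, 3], [1, 5, 3]), 3))  # -1.0
```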

 

 

 

User-based collaborative filtering

Ratings are used to measure how much each user likes each item. Users who give similar ratings to the same items are considered similar, and items liked by a user's similar neighbors are recommended to that user.

In the illustration, each row vector holds one user's ratings of all the movies.

 

First, center the data by subtracting each user's mean rating from their row.

 

Then compute the Pearson correlation coefficient between user A and every other user:

We find that user A and user B have similar preferences, so a movie B likes but A has not seen (Chamber of Secrets) can be recommended to A, and a movie A likes but B has not seen (Goblet of Fire) can be recommended to B.
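Putting the pieces together, a toy sketch of the user-based approach (the ratings matrix below is hypothetical, not the one from the post's figure):

```python
import numpy as np

# Hypothetical ratings matrix: rows = users A, B, C; columns = movies; 0 = unrated.
ratings = np.array([
    [5, 4, 0, 1, 0],   # user A
    [4, 5, 5, 2, 0],   # user B
    [1, 2, 1, 5, 4],   # user C
], dtype=float)

def pearson(u, v):
    mask = (u > 0) & (v > 0)
    if mask.sum() < 2:
        return 0.0
    uc, vc = u[mask] - u[mask].mean(), v[mask] - v[mask].mean()
    denom = np.linalg.norm(uc) * np.linalg.norm(vc)
    return float(uc @ vc / denom) if denom else 0.0

# Similarity of user A (row 0) to users B and C.
print([round(pearson(ratings[0], ratings[i]), 2) for i in (1, 2)])  # ≈ [0.84, -1.0]
# B is similar to A, so B's highly rated movie that A has not seen (column 2)
# would be recommended to A.
```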

 

Problems with the user-based approach:

  1. Data sparsity. With a very large number of items, the items bought or rated by different users overlap very little, which makes it hard to find users with similar preferences.

  2. Scalability. The cost of the nearest-neighbor computation grows with the number of users and the number of items, so the method does not scale well to large datasets.

 

 

Item-based collaborative filtering

 

By looking at how different users rate different items, we obtain relationships between items, and then recommend to a user items that are similar to the items that user already likes.

In the illustration, each row vector holds one item's ratings from all users; the rows are centered first.

 

How do we predict how much user E will like Harry Potter? Compute the Pearson correlation coefficient between Harry Potter and each of the other movies.

 

Take the movies most strongly correlated with it, look up user E's ratings of those movies, and combine them in a weighted sum using the Pearson correlation coefficients as weights: predicted rating = Σ sim(Harry Potter, j) · r(E, j) / Σ sim(Harry Potter, j), summing over the similar movies j that E has rated.
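A small numerical sketch with made-up similarities and ratings (not the post's actual numbers):

```python
import numpy as np

# Hypothetical item-item Pearson similarities between "Harry Potter" and
# three other movies, and user E's ratings of those same three movies.
sims      = np.array([0.9, 0.7, 0.4])
e_ratings = np.array([5.0, 4.0, 2.0])

# Similarity-weighted average of E's ratings, used as the predicted rating.
predicted = float(sims @ e_ratings / sims.sum())
print(round(predicted, 2))  # 4.05
```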

 

In principle either the user-based or the item-based approach works, but in practice the item-based approach tends to perform better and can make predictions from relatively little data, whereas the user-based method needs a large amount of data.

 

 

SVD-based collaborative filtering

 

The singular values are sorted from largest to smallest and decrease rapidly, so a large matrix can be approximated by the product of three much smaller matrices. This performs dimensionality reduction and denoising, and in collaborative filtering it reduces the amount of computation.

Doing collaborative filtering with a rank-K SVD decomposition amounts to finding a set of latent variables: U describes the relationship between items and the latent variables, and V describes the relationship between users and the latent variables. Both items and users can then be represented in the latent space.
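A minimal sketch of the truncation itself, with an arbitrary items-by-users matrix chosen only to show the shapes involved (my example, not the post's data):

```python
import numpy as np

# R: hypothetical ratings matrix, items as rows and users as columns.
rng = np.random.default_rng(0)
R = rng.integers(0, 6, size=(100, 40)).astype(float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 10
# Rank-k approximation: the large matrix described by three much smaller ones.
R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(U[:, :k].shape, s[:k].shape, Vt[:k, :].shape)           # (100, 10) (10,) (10, 40)
print(round(np.linalg.norm(R - R_k) / np.linalg.norm(R), 3))  # relative approximation error
```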

 

As an example, here are 4 users' ratings of 6 movies:

 

Do a two-dimensional (rank-2) SVD decomposition:

 

Representing both users and movies in the latent space, we can measure centered cosine similarity between movies, between users, and even between a movie and a user.
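A sketch of what this looks like in code, using a hypothetical 6-movie by 4-user matrix (the post's actual figure values are not reproduced here; for brevity the similarity shown is plain cosine of the latent vectors):

```python
import numpy as np

# Hypothetical ratings matrix: 6 movies (rows) x 4 users (columns), 0 = unrated.
R = np.array([
    [5, 5, 0, 1],
    [4, 5, 1, 0],
    [5, 4, 0, 1],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
    [0, 1, 5, 4],
], dtype=float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
movie_vecs = U[:, :k]       # each movie as a point in the 2-D latent space
user_vecs  = Vt[:k, :].T    # each user as a point in the same latent space

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Movies 0 and 2 were rated almost identically, so their latent vectors align.
print(round(cos(movie_vecs[0], movie_vecs[2]), 2))  # ≈ 1.0
```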

 

If a new user appears, how do we recommend movies they have not seen?

 

Project the new user into the latent space.

 

Find the existing users most similar to the new user, then recommend the movies that those similar users rated highly but the new user has not seen, in descending order of the similar users' ratings.
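One common way to do this projection is the standard SVD fold-in, u_new = Σₖ⁻¹ Uₖᵀ r_new; the post does not spell it out, so the sketch below is an assumption, reusing the same hypothetical matrix as above:

```python
import numpy as np

# Same hypothetical 6-movie x 4-user ratings matrix as in the previous sketch.
R = np.array([
    [5, 5, 0, 1],
    [4, 5, 1, 0],
    [5, 4, 0, 1],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
    [0, 1, 5, 4],
], dtype=float)
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
user_vecs = Vt[:k, :].T                               # existing users in latent space

new_user = np.array([5, 5, 4, 0, 0, 1], dtype=float)  # new user's ratings, 0 = unseen

# Fold-in: project the new rating column into the k-dimensional latent space.
new_user_latent = np.diag(1.0 / s[:k]) @ U[:, :k].T @ new_user

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sims = [cos(new_user_latent, uv) for uv in user_vecs]
best = int(np.argmax(sims))                           # most similar existing user

# Recommend the movies that user rated highly but the new user has not seen,
# ordered by that user's rating.
unseen = np.where(new_user == 0)[0]
print(best, unseen[np.argsort(-R[unseen, best])])
```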

 

 
