Mahout: distributed item-based algorithm 1

  • co-occurrence matrix

Instead of computing the similarity between every pair of items, the distributed algorithm counts the number of times each pair of items occurs together in some user’s list of preferences, and uses those counts to fill out the matrix.

Co-occurrence is like similarity; the more two items turn up together, the more related or similar they probably are. The co-occurrence matrix plays a role like that of ItemSimilarity in the nondistributed item-based algorithm.
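The counting step described above can be sketched in a few lines of plain Python. This is only an illustration, not Mahout's distributed implementation: the `preferences` data and the `cooccurrence_counts` helper are hypothetical names invented here, and the diagonal (an item paired with itself) is simply left out.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data (not from the original post):
# user ID -> set of item IDs the user has expressed a preference for.
preferences = {
    1: {101, 102, 103},
    2: {101, 103, 104},
    3: {102, 103},
}

def cooccurrence_counts(prefs):
    """Count, for each pair of distinct items, how many users prefer both."""
    counts = defaultdict(int)
    for items in prefs.values():
        # Every unordered pair in one user's list co-occurs once.
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1  # the matrix is symmetric
    return dict(counts)

counts = cooccurrence_counts(preferences)
```

Pairs that never appear together, such as (102, 104) in this toy data, are simply absent from the dictionary, which corresponds to a 0 entry in the matrix.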

  • user vectors

In a data model with n items, a user’s preferences can be treated as a vector over n dimensions, with one dimension for each item. The user’s preference values for items are the values in the vector. Items that the user expresses no preference for map to a 0 value in the vector. Such a vector is typically quite sparse, consisting mostly of zeroes, because users typically express a preference for only a small subset of all items.
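As a rough illustration of this representation, the sketch below expands one user's preferences into a vector with one slot per item. The item IDs, preference values, and the `dense_user_vector` helper are all assumptions made up for the example; they do not come from Mahout.

```python
# Hypothetical item catalogue; position i in the vector refers to item_ids[i].
item_ids = [101, 102, 103, 104, 105]

# Hypothetical sparse form: only the items the user has a preference for.
user_prefs = {101: 2.0, 104: 4.5}

def dense_user_vector(prefs, items):
    """Expand sparse preferences into one value per item, 0.0 where none exists."""
    return [prefs.get(item, 0.0) for item in items]

vector = dense_user_vector(user_prefs, item_ids)
```

Because most entries are 0, keeping only the non-zero values (as `user_prefs` does) is usually far cheaper than storing the full n-dimensional list.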

  • producing the recommendations

The product of the co-occurrence matrix and a user vector is itself a vector whose dimension is equal to the number of items. The values in this resulting vector, R, lead us directly to recommendations: the highest values in R correspond to the best recommendations.


The third row of the co-occurrence matrix contains the co-occurrences between item 103 and all other items, so the third entry of R is the dot product of that row with user 3’s vector. Intuitively, if item 103 co-occurs with many items that user 3 expresses a preference for, then it’s probably something that user 3 would like.
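To make the multiplication concrete, here is a minimal sketch in plain Python. The matrix, the user vector, and the helper names `multiply` and `rank_items` are hypothetical and chosen only for illustration; a real Mahout job distributes this computation rather than running it in one process.

```python
# Hypothetical item IDs; row i and column i of the matrix refer to item_ids[i].
item_ids = [101, 102, 103, 104]

# Hypothetical symmetric co-occurrence matrix (diagonal left at 0).
cooccurrence = [
    [0, 1, 2, 1],  # item 101
    [1, 0, 2, 0],  # item 102
    [2, 2, 0, 1],  # item 103
    [1, 0, 1, 0],  # item 104
]

# Hypothetical user vector: preferences for 101 and 102, none for 103 and 104.
user_vector = [4.0, 3.0, 0.0, 0.0]

def multiply(matrix, vector):
    """Matrix-vector product: each entry is the dot product of one row with the vector."""
    return [sum(c * v for c, v in zip(row, vector)) for row in matrix]

def rank_items(scores, items):
    """Pair each item with its score and sort from highest to lowest score."""
    return sorted(zip(items, scores), key=lambda pair: pair[1], reverse=True)

r = multiply(cooccurrence, user_vector)  # the vector R
ranked = rank_items(r, item_ids)
```

In this toy data the third row (item 103) co-occurs heavily with both items the user prefers, so its entry in R is the largest and it is ranked first.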
 



Reposted from ylzhj02.iteye.com/blog/2059507