Recommendation algorithms: neighborhood-based collaborative filtering

Outline

  1. user-based CF
  2. item-based CF
  3. item-based CF in code
  4. user-based vs. item-based

1. user-based CF

The idea behind user-based collaborative filtering is to find users whose interests are similar to the target user's, and then recommend to the target user the items those similar users are interested in. The algorithm can therefore be divided into two steps:

  • Find the set of users whose interests are similar to the target user's.
    Similarity of interest between two users is usually computed from their ratings of items, using Jaccard or cosine similarity. Writing N(u) for the set of items user u has acted on:
    Jaccard similarity:

        w_{uv} = \frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|}

    cosine similarity:

        w_{uv} = \frac{|N(u) \cap N(v)|}{\sqrt{|N(u)| \, |N(v)|}}

  • Find the items that the users in this set like and that the target user has not yet come across, and recommend them to the target user.
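The two steps above can be sketched roughly as follows. This is a minimal illustration, not the original author's code: it assumes binary interactions (each user is just a set of items), and the names `jaccard`, `cosine`, `recommend_user_based` and the toy `user_items` data are hypothetical.

```python
import math


def jaccard(a, b):
    """Jaccard similarity of two item sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity of two item sets (binary interactions)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend_user_based(target, user_items, k=2, sim=cosine):
    """Score items the target has not seen, using the k most similar users."""
    seen = user_items[target]
    # Step 1: rank the other users by similarity to the target user.
    neighbours = sorted(
        ((sim(seen, items), user)
         for user, items in user_items.items() if user != target),
        reverse=True,
    )[:k]
    # Step 2: accumulate similarity-weighted votes for unseen items.
    scores = {}
    for weight, user in neighbours:
        for item in user_items[user] - seen:
            scores[item] = scores.get(item, 0.0) + weight
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy data: user -> set of items the user interacted with.
user_items = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b", "d"},
    "u3": {"c", "e"},
    "u4": {"f"},
}
recommendations = recommend_user_based("u1", user_items, k=2)
```

Here `u2` is the closest neighbour of `u1`, so the item only `u2` has seen is ranked ahead of the item contributed by `u3`.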

1.1 Improving user similarity

Cosine similarity is too coarse a measure of how similar two users' interests are. Two users both acting on the same popular item says little about whether their interests are alike, whereas both acting on the same unpopular item is much stronger evidence of similar interests.
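One common way to act on this observation, which the text does not spell out and is therefore an assumption here, is to let each shared item contribute 1 / log(1 + popularity) instead of 1, so that popular items count for less. A minimal sketch with hypothetical names and toy data:

```python
import math


def penalised_similarity(a, b, item_popularity):
    """Cosine-style similarity where each shared item counts 1 / log(1 + popularity)."""
    if not a or not b:
        return 0.0
    shared = sum(1.0 / math.log(1.0 + item_popularity[i]) for i in a & b)
    return shared / math.sqrt(len(a) * len(b))


# Hypothetical toy data: "hit" is popular, "niche" is not.
user_items = {
    "u1": {"hit", "niche"},
    "u2": {"hit", "x"},
    "u3": {"niche", "y"},
    "u4": {"hit", "z"},
    "u5": {"hit", "w"},
}

# Popularity of an item = number of users who interacted with it.
item_popularity = {}
for items in user_items.values():
    for item in items:
        item_popularity[item] = item_popularity.get(item, 0) + 1

via_hit = penalised_similarity(user_items["u1"], user_items["u2"], item_popularity)
via_niche = penalised_similarity(user_items["u1"], user_items["u3"], item_popularity)
```

Plain cosine similarity would score both pairs identically (one shared item out of two each); with the penalty, the pair that shares the niche item comes out as more similar.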

2. item-based CF

Item-based collaborative filtering looks directly at the items the target user is interested in and recommends items similar to them. Similarity between items can likewise be measured with Jaccard or cosine similarity, so the formulas are not repeated here. Item-based recommendation therefore has two steps:

  • Compute the similarity between items
  • Generate a recommendation list from the item similarities and the user's historical behavior
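As a rough illustration of these two steps, the sketch below computes cosine similarity between items from the sets of users who touched them, then scores unseen items by summing their similarity to the items in the user's history. The function names and toy data are hypothetical, and summing similarities is only one simple scoring choice.

```python
import math
from collections import defaultdict


def item_similarity(user_items):
    """Step 1: cosine similarity between items, from the users who touched them."""
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)
    sim = defaultdict(dict)
    items = sorted(item_users)
    for idx, i in enumerate(items):
        for j in items[idx + 1:]:
            common = len(item_users[i] & item_users[j])
            if common:
                w = common / math.sqrt(len(item_users[i]) * len(item_users[j]))
                sim[i][j] = w
                sim[j][i] = w
    return sim


def recommend_item_based(target, user_items, sim, top_n=3):
    """Step 2: rank unseen items by summed similarity to the user's history."""
    seen = user_items[target]
    scores = defaultdict(float)
    for item in seen:
        for other, w in sim.get(item, {}).items():
            if other not in seen:
                scores[other] += w
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical toy data: user -> set of items the user interacted with.
user_items = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"b", "c"},
    "u4": {"c", "d"},
}
sim = item_similarity(user_items)
recs = recommend_item_based("u1", user_items, sim)
```

For `u1`, only item `c` is similar to something in the history (`a` and `b`), so it is the single recommendation; `d` shares no users with `a` or `b` and gets no score.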

3. item-based CF in code

In some places the code uses NumPy arrays instead of dicts, which reduces memory usage. One problem remains: the similarity relation between items may be sparse, but a NumPy array is dense, so space is still wasted.
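One possible way around the wasted space, offered here as an assumption rather than something the original code does, is to store only the non-zero similarities in nested dicts and to visit only item pairs that actually share a user. The sketch below takes data in the same item -> {user: count} shape that the listing builds; `sparse_item_similarity` and the sample ids are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def sparse_item_similarity(item_users):
    """Return {item: {other_item: cosine}} holding only non-zero pairs."""
    # Invert item -> users into user -> items so only co-occurring pairs are visited.
    user_items = defaultdict(set)
    for item, users in item_users.items():
        for user in users:
            user_items[user].add(item)
    # Count, for each pair of items, how many users touched both.
    co_counts = defaultdict(int)
    for items in user_items.values():
        for i, j in combinations(sorted(items), 2):
            co_counts[(i, j)] += 1
    sim = defaultdict(dict)
    for (i, j), common in co_counts.items():
        w = common / math.sqrt(len(item_users[i]) * len(item_users[j]))
        sim[i][j] = w
        sim[j][i] = w
    return sim


# Hypothetical data: item -> {user: count}, counts kept as strings as read from a file.
item_users = {
    "r1": {"d1": "3", "d2": "1"},
    "r2": {"d1": "2"},
    "r3": {"d3": "5"},
    "r4": {"d3": "1", "d4": "1"},
}
sim = sparse_item_similarity(item_users)
stored = sum(len(row) for row in sim.values())   # entries actually kept
dense_cells = len(item_users) ** 2               # cells a dense matrix would need
```

With these four items only two pairs co-occur, so the nested dicts keep 4 entries where a dense 4 x 4 array would allocate 16 cells. The trade-off is slower lookups and no vectorised operations.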
Because user-based and item-based CF rest on the same principle, only the item-based code is shown below.

item-based CF

# -*- coding:utf-8 -*-
import math

import numpy as np


class ItemCF(object):
    """Item-based collaborative filtering over a comma-separated interaction log."""

    def __init__(self, fname):
        self.fname = fname
        self._read_data(fname)

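    def _read_data(self, fname):
        # Build item -> {user: count} from the log file.
        self.item_users = {}
        with open(fname, "r") as fr:
            for line in fr:
                # fields[1], fields[2], fields[3] hold the user id, item id and count.
                fields = line.strip().split(",")
                device_uid = fields[1]
                resblock_id = fields[2]
                cnt = fields[3]  # kept as a string; only its presence is used later
                self.item_users.setdefault(resblock_id, {})
                self.item_users[resblock_id][device_uid] = cnt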
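        # Assign an integer index to every item.
        # list(...) is needed on Python 3, where keys() returns a view, not a list.
        all_items = list(self.item_users.keys())
        self.item_size = len(all_items)
        self.item_vocab = dict([(item, ind) for ind, item in enumerate(all_items)])
        self.item_reverse_vocab = np.array(all_items)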
    
    def similarity(self):
        self.item_popularity = np.zeros(self.item_size)
        # Count the users who viewed each item
        for item, user_info in self.item_users.items():
            _item_index = self.item_vocab[item]
            self.item_popularity[_item_index] = len(user_info)
        # Compute item-item cosine similarity: shared users / sqrt(popularity_i * popularity_j)
        self.item_similarity = np.zeros((self.item_size, self.item_size))
        for i in range(self.item_size-1):
            _former_item = self.item_reverse_vocab[i]
            _former_item_popularity = self.item_popularity[i]
            _former_item_users = self.item_users[_former_item].keys()
            for j in range(i+1, self.item_size):
                _latter_item = self.item_reverse_vocab[j]
                _latter_item_popularity = self.item_popularity[j]
                _latter_item_users = self.item_users[_latter_item].keys()
                common_size = len(set(_former_item_users) & set(_latter_item_users))
                sim = float(common_size) / math.sqrt(_former_item_popularity * _latter_item_popularity)
                self.item_similarity[i][j] = sim
                self.item_similarity[j][i] = sim
                
    def save_model(self, vocab_reverse_fname, sim_fname):
        np.save(vocab_reverse_fname, self.item_reverse_vocab)
        np.save(sim_fname, self.item_similarity)
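
The saved arrays could later be loaded and queried along the following lines. This is only a sketch under stated assumptions: the small `vocab` and `sim` arrays stand in for `item_reverse_vocab` and `item_similarity`, the file names are made up, and `top_k_similar` is a hypothetical helper that is not part of the class above.

```python
import os
import tempfile

import numpy as np


def top_k_similar(item, vocab, sim, k=2):
    """Return up to k (item, similarity) pairs most similar to `item`."""
    index = int(np.where(vocab == item)[0][0])
    # Sort the item's row by descending similarity, then skip the item itself.
    order = np.argsort(-sim[index], kind="stable")
    return [(str(vocab[j]), float(sim[index, j])) for j in order if j != index][:k]


# Hypothetical stand-ins for item_reverse_vocab and item_similarity.
vocab = np.array(["r1", "r2", "r3"])
sim = np.array([[0.0, 0.8, 0.1],
                [0.8, 0.0, 0.5],
                [0.1, 0.5, 0.0]])

with tempfile.TemporaryDirectory() as tmp:
    vocab_path = os.path.join(tmp, "vocab.npy")
    sim_path = os.path.join(tmp, "sim.npy")
    # Same calls as save_model: one file per array.
    np.save(vocab_path, vocab)
    np.save(sim_path, sim)
    loaded_vocab = np.load(vocab_path)
    loaded_sim = np.load(sim_path)

neighbours = top_k_similar("r2", loaded_vocab, loaded_sim)
```

Note that `np.save` appends `.npy` to a file name that does not already end with it, so the name passed to `np.load` has to include the extension.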

4. user-based vs. item-based

| Comparison item | user-based | item-based |
|---|---|---|
| Performance | Suited to scenarios with a relatively small number of users | Suited to scenarios where the number of items is significantly smaller than the number of users |
| Personalization | Domains where timeliness matters and users' personalized interests are less pronounced | Domains rich in long-tail items where users have a strong demand for personalization |
| Real-time behavior | A user's new behavior does not necessarily change the recommendation results immediately | A user's new behavior changes the recommendation results in real time |

references

Reproduced from: https://www.jianshu.com/p/9501d377c2a1

Origin blog.csdn.net/weixin_33804582/article/details/91086265