Outline
- user-based cf
- item-based cf
- item_based coding practices
- user_based vs. item_based
1.user_based cf
The idea behind user-based collaborative filtering is to find users whose interests are similar to the target user's, then recommend to the target user the items those similar users are interested in. The user-based collaborative filtering algorithm can therefore be divided into two steps:
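- Find the set of users whose interests are similar to the target user's.
  Similarity is generally computed from users' ratings of items, using Jaccard or cosine similarity to measure how close two users' interests are. Let N(u) denote the set of items that user u has rated or acted on.
  Jaccard similarity:
  $$w_{uv} = \frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|}$$
  cosine similarity:
  $$w_{uv} = \frac{|N(u) \cap N(v)|}{\sqrt{|N(u)|\,|N(v)|}}$$
- Find the items that the users in this set like and that the target user has not heard of, and recommend them to the target user.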
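The two steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the author's implementation: the toy `user_items` data, the function names, and the choice of `k` nearest users are all assumptions made for the example. Here N(u) is taken to be the set of items user u has interacted with.

```python
import math
from collections import defaultdict

# Hypothetical toy data: user -> set of items the user has interacted with.
user_items = {
    "A": {"a", "b", "d"},
    "B": {"a", "c"},
    "C": {"b", "e"},
    "D": {"c", "d", "e"},
}


def jaccard(u, v):
    """|N(u) & N(v)| / |N(u) | N(v)|"""
    union = user_items[u] | user_items[v]
    return len(user_items[u] & user_items[v]) / len(union) if union else 0.0


def cosine(u, v):
    """|N(u) & N(v)| / sqrt(|N(u)| * |N(v)|)"""
    denom = math.sqrt(len(user_items[u]) * len(user_items[v]))
    return len(user_items[u] & user_items[v]) / denom if denom else 0.0


def recommend(target, k=2, sim=cosine):
    """Step 1: pick the k most similar users; step 2: score their unseen items."""
    neighbours = sorted(
        ((sim(target, v), v) for v in user_items if v != target),
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for w, v in neighbours:
        # Only items the target user has not interacted with are candidates.
        for item in user_items[v] - user_items[target]:
            scores[item] += w
    # Highest score first; ties broken by item name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Calling `recommend("A")` scores items that A's two nearest neighbours have interacted with and A has not. Passing `sim=jaccard` swaps the similarity measure without changing the rest.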
1.1 user similarity improvements
Cosine similarity is too coarse a measure of how similar two users' interests are. Two users having acted on the same popular item says little about whether their interests are alike, whereas having acted on the same unpopular item is much stronger evidence of similar interests.
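One common way to act on this observation, shown here as an assumption rather than as the formula the original post used, is to weight each shared item by 1 / log(1 + |N(i)|), where |N(i)| is the number of users who acted on item i. The sketch below uses hypothetical toy data and a hypothetical function name, `penalised_similarity`.

```python
import math
from collections import defaultdict

# Hypothetical toy data: user -> set of items the user has interacted with.
user_items = {
    "A": {"hit", "niche"},
    "B": {"hit", "niche"},
    "C": {"hit"},
    "D": {"hit"},
}

# Inverted index: item -> set of users, so len(item_users[i]) is item popularity.
item_users = defaultdict(set)
for user, items in user_items.items():
    for item in items:
        item_users[item].add(user)


def penalised_similarity(u, v):
    """Cosine-style similarity where each shared item counts 1 / log(1 + |N(i)|)."""
    shared = user_items[u] & user_items[v]
    weight = sum(1.0 / math.log(1 + len(item_users[i])) for i in shared)
    return weight / math.sqrt(len(user_items[u]) * len(user_items[v]))
```

With plain cosine similarity the pairs (A, B) and (C, D) would both score 1.0. With the penalty, the pair that also shares the less popular item scores higher, because the popular item contributes less per co-occurrence.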
2.item_based cf
Item-based collaborative filtering directly finds items similar to those the target user is interested in, and recommends these similar items to the target user. Similarity between items can also be measured with Jaccard or cosine similarity, which is not repeated here. Item-based recommendation therefore has two steps:
- Compute the similarity between items
- Generate a recommendation list from the item similarities and the user's historical behavior
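The second step is not covered by the code later in this post, so here is a minimal sketch of one way it could look. The similarity values, the user history, and the parameters `k` (neighbours per item) and `n` (list length) are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical inputs: precomputed item-item similarities and one user's history.
item_similarity = {
    "a": {"b": 0.8, "c": 0.3},
    "b": {"a": 0.8, "d": 0.5},
    "c": {"a": 0.3, "d": 0.6},
    "d": {"b": 0.5, "c": 0.6},
}
user_history = {"a": 1.0, "c": 1.0}  # item -> interest weight


def recommend(history, similarity, k=2, n=3):
    """Score unseen items by summing similarity * interest over the k nearest
    neighbours of each item in the user's history."""
    scores = defaultdict(float)
    for item, interest in history.items():
        neighbours = sorted(similarity.get(item, {}).items(),
                            key=lambda kv: -kv[1])[:k]
        for other, sim in neighbours:
            if other not in history:  # never recommend what the user already has
                scores[other] += sim * interest
    # Highest score first; ties broken by item name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

For the toy inputs, `recommend(user_history, item_similarity)` ranks `b` ahead of `d`, since `b` is the closest neighbour of an item in the history.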
3. item_based coding practices
In some places the code uses numpy arrays instead of dicts, which reduces memory use. One problem remains: the similarity relation between items may be sparse, but numpy arrays are dense, so space is wasted.
Because user-based and item-based CF work on the same principle, only the item-based code is shown below.
item-based cf
# -*- coding:utf-8 -*-
import math

import numpy as np


class ItemCF(object):
    def __init__(self, fname):
        self.fname = fname
        self._read_data(fname)

    def _read_data(self, fname):
        # Build the inverted index: item -> {user: count}
        self.item_users = {}
        with open(fname, "r") as fr:
            for line in fr:
                fields = line.strip().split(",")
                device_uid = fields[1]
                resblock_id = fields[2]
                cnt = fields[3]  # kept as a string; not used by similarity()
                self.item_users.setdefault(resblock_id, {})
                self.item_users[resblock_id][device_uid] = cnt
        # Assign an integer index to each item
        all_items = list(self.item_users.keys())
        self.item_size = len(all_items)
        self.item_vocab = dict([(item, ind) for ind, item in enumerate(all_items)])
        self.item_reverse_vocab = np.array(all_items)

    def similarity(self):
        self.item_popularity = np.zeros(self.item_size)
        # Count the number of users who viewed each item
        for item, user_info in self.item_users.items():
            _item_index = self.item_vocab[item]
            self.item_popularity[_item_index] = len(user_info)
        # Compute item-item similarity (dense, symmetric matrix)
        self.item_similarity = np.zeros((self.item_size, self.item_size))
        for i in range(self.item_size - 1):
            _former_item = self.item_reverse_vocab[i]
            _former_item_popularity = self.item_popularity[i]
            _former_item_users = set(self.item_users[_former_item].keys())
            for j in range(i + 1, self.item_size):
                _latter_item = self.item_reverse_vocab[j]
                _latter_item_popularity = self.item_popularity[j]
                _latter_item_users = set(self.item_users[_latter_item].keys())
                common_size = len(_former_item_users & _latter_item_users)
                sim = float(common_size) / math.sqrt(_former_item_popularity * _latter_item_popularity)
                self.item_similarity[i][j] = sim
                self.item_similarity[j][i] = sim

    def save_model(self, vocab_reverse_fname, sim_fname):
        np.save(vocab_reverse_fname, self.item_reverse_vocab)
        np.save(sim_fname, self.item_similarity)
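
As a possible way around the wasted space of a dense matrix, the similarities can be kept in a dict of dicts that stores only non-zero entries. The sketch below is an assumption about how that might look, not part of the original code. It walks a user -> items index (the toy `user_items` data and the function name are hypothetical) so that only item pairs sharing at least one user are ever touched, and it uses the same formula as `similarity()`: shared users divided by the square root of the product of the two popularities.

```python
import math
from collections import defaultdict

# Hypothetical toy data: user -> set of items the user has viewed.
user_items = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"d"},
}


def sparse_item_similarity(user_items):
    """Return {item: {other_item: similarity}} holding non-zero entries only."""
    popularity = defaultdict(int)                      # item -> number of users
    co_counts = defaultdict(lambda: defaultdict(int))  # item -> item -> shared users
    for items in user_items.values():
        for i in items:
            popularity[i] += 1
            for j in items:
                if i != j:
                    co_counts[i][j] += 1
    return {
        i: {j: c / math.sqrt(popularity[i] * popularity[j]) for j, c in row.items()}
        for i, row in co_counts.items()
    }


similarity = sparse_item_similarity(user_items)
```

Item `d` shares no user with any other item, so it takes up no space in the result. The trade-off is that per-entry overhead of Python dicts is higher than that of a numpy array, so this only pays off when the similarity relation really is sparse.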
4.user_based vs. item_based
Comparison items | user_based | item_based |
---|---|---|
performance | Suitable for scenarios with a relatively small number of users | Suitable for scenarios where the number of items is significantly smaller than the number of users |
personalization | Domains where timeliness matters and users' personalized interests are less pronounced | Domains rich in long-tail items, with strong user demand for personalization |
real-time | A user's new behavior does not necessarily change the recommendation results immediately | A user's new behavior changes the recommendation results in real time |
references
Reproduced from: https://www.jianshu.com/p/9501d377c2a1