User-based collaborative filtering recommendation algorithm (a UserCF implementation)

Collaborative filtering is widely used in recommender systems and comes in two main types: user-based (UserCF) and item-based (ItemCF).

  • User-based recommendation algorithm: finds users with similar interests. Suppose you are building a learning-resource sharing platform whose users have fairly stable professions and relatively fixed reading and study habits. When user A (the logged-in user) needs personalized recommendations, the algorithm first finds a group G of users whose interests are similar to A's, then predicts A's preference for the items that users in G have but A does not, and finally recommends items to A based on those predictions.
  • Item-based recommendation algorithm: recommends items similar to the ones a user already has. For example, a user who previously bought flowers is recommended flower pots, because many other users bought flowers and flower pots together. The algorithm first computes the similarity between items, then generates a recommendation list from those item similarities and the user's historical behavior.
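
As a rough illustration of the item-based idea, the sketch below counts how often two items are bought by the same user, which is the raw material for an item-to-item similarity. The function name coOccurrence and the representation of each user's purchases as a std::set of item ids are assumptions made for this example; the article itself does not implement ItemCF.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical helper: for every pair of items (a, b) with a < b,
// count how many users bought both of them.
std::map<std::pair<int, int>, int> coOccurrence(
        const std::vector<std::set<int>>& baskets) {
    std::map<std::pair<int, int>, int> counts;
    for (const std::set<int>& basket : baskets) {
        for (auto a = basket.begin(); a != basket.end(); ++a) {
            for (auto b = std::next(a); b != basket.end(); ++b) {
                assert(*a < *b);  // std::set iterates in ascending order
                ++counts[{*a, *b}];
            }
        }
    }
    return counts;
}
```

If item 1 stands for flowers and item 2 for flower pots, a high count for the pair (1, 2) is what would lead an item-based recommender to suggest flower pots to someone who bought flowers.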

This article introduces an implementation of the first type, the user-based collaborative filtering algorithm (UserCF). As described above, it breaks down into three steps:

1. Find a group of users whose interests are similar to user A's

  •   First, we need to calculate the similarity between users. This article uses cosine similarity:

       $$w_{uv} = \frac{|N(u) \cap N(v)|}{\sqrt{|N(u)|\,|N(v)|}}$$

       (where N(u) is the set of items purchased by user u, |N(u)| is the number of items in that set, and N(v) is the corresponding set for user v)

        Calculating the numerator: we build a matrix sparseMatrix in which sparseMatrix[u][v] is the number of items purchased by both user u and user v.

  •   Then we sort the users by similarity in descending order and keep only the most similar ones, up to a fixed group size (two users in the code below).
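
The two bullets above can be sketched with standard containers as follows. This is a minimal illustration rather than the article's implementation: the names cosineSimilarity and topKSimilar, the use of std::set for each user's purchases, and identifying users by their index in the vector are all assumptions of this example. With the sample data used later in the article, users 3 and 4 share 3 items and bought 5 and 4 items respectively, so their similarity is 3 / sqrt(5 * 4), roughly 0.671.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <set>
#include <utility>
#include <vector>

// Cosine similarity of two purchase sets:
// |N(u) ∩ N(v)| / sqrt(|N(u)| * |N(v)|).
double cosineSimilarity(const std::set<int>& nu, const std::set<int>& nv) {
    if (nu.empty() || nv.empty()) return 0.0;  // avoid dividing by zero
    int common = 0;
    for (int item : nu) {
        if (nv.count(item)) ++common;
    }
    return common / std::sqrt(static_cast<double>(nu.size()) * nv.size());
}

// Returns (similarity, user index) pairs for the k users most similar to
// `target`, sorted by descending similarity.
std::vector<std::pair<double, int>> topKSimilar(
        const std::vector<std::set<int>>& purchases, int target, int k) {
    assert(target >= 0 && target < static_cast<int>(purchases.size()));
    assert(k >= 0);
    std::vector<std::pair<double, int>> sims;
    for (int v = 0; v < static_cast<int>(purchases.size()); ++v) {
        if (v == target) continue;  // a user is not their own neighbour
        sims.push_back({cosineSimilarity(purchases[target], purchases[v]), v});
    }
    std::sort(sims.begin(), sims.end(),
              [](const std::pair<double, int>& a,
                 const std::pair<double, int>& b) {
                  return a.first > b.first;
              });
    if (static_cast<int>(sims.size()) > k) sims.resize(k);
    return sims;
}
```

Calling topKSimilar(purchases, 2, 2) on the article's five users (index 2 is user 3) would return the entries for index 3 and index 0, that is, users 4 and 1.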

2. Predict A's preference for the items that users in G have but A does not

        From the group of similar users, we collect the items that they purchased but the logged-in user has not; these are the candidate items to predict. Because users who share my interests bought them, I might like them too. This is the principle of UserCF. For each candidate item we compute a predicted score with the formula:

$$p(u, i) = \sum_{v \in S(u, k) \cap N(i)} w_{uv}\, r_{vi}$$

Where u is the user, i is the item, S(u, k) is the group of the k users most similar to u, N(i) is the set of users who purchased item i, w_{uv} is the similarity between users u and v, p(u, i) is user u's predicted preference for item i, and r_{vi} is user v's rating of item i. In this example r_{vi} is always 1; in a system with real ratings it acts as a weight that makes the result more accurate.
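
A minimal sketch of this scoring step, assuming every rating is 1 as in this article, might look like the following. The name predictScores and the parameter layout (purchases indexed by user, neighbours given as (similarity, user index) pairs) are assumptions made for illustration.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

// For each item that some neighbour bought but `target` did not, sum the
// similarities of the neighbours who bought it (every rating is taken as 1).
std::map<int, double> predictScores(
        const std::vector<std::set<int>>& purchases, int target,
        const std::vector<std::pair<double, int>>& neighbours) {
    assert(target >= 0 && target < static_cast<int>(purchases.size()));
    std::map<int, double> scores;
    for (const std::pair<double, int>& nb : neighbours) {
        assert(nb.second >= 0 &&
               nb.second < static_cast<int>(purchases.size()));
        for (int item : purchases[nb.second]) {
            if (purchases[target].count(item)) continue;  // already owned
            const double rating = 1.0;  // assumed constant rating
            scores[item] += nb.first * rating;
        }
    }
    return scores;
}
```

With the article's data and users 4 and 1 as neighbours of user 3 (similarities of about 0.671 and 0.6), item 6 was bought by both neighbours and scores about 1.271, while item 3 was bought only by user 1 and scores 0.6.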

3. Recommend items to A based on the predicted scores

     Sorting the candidate items by their predicted scores produces a ranked recommendation list. To keep this example short, it recommends only the single best-scoring item.
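
If a ranked list is wanted instead of a single item, one possible sketch is shown below. The name topNItems and the choice of a std::map from item id to score as input are assumptions of this example, and ties are broken in favour of the smaller item id simply because a stable sort keeps the map's ascending order.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Returns up to n item ids ordered by descending predicted score.
std::vector<int> topNItems(const std::map<int, double>& scores, int n) {
    assert(n >= 0);
    std::vector<std::pair<int, double>> ranked(scores.begin(), scores.end());
    // stable_sort keeps items with equal scores in ascending id order
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const std::pair<int, double>& a,
                        const std::pair<int, double>& b) {
                         return a.second > b.second;
                     });
    std::vector<int> result;
    for (int i = 0; i < n && i < static_cast<int>(ranked.size()); ++i) {
        result.push_back(ranked[i].first);
    }
    return result;
}
```

Calling it with n = 1 corresponds to the single-item recommendation made by the program in this article.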

The code is as follows:

#include<cstdio>
#include<cstring>
#include<cmath>
#include<iostream>
#include<algorithm>
#include<set>
using namespace std;
const int maxn = 100;
struct similarty{
	double weight;
	int user;
}similartys[maxn];
// Print each user's purchased items.
void myprint(int userItem[][maxn],int n){
	for(int i=0;i<n;i++){
		cout<<"User "<<userItem[i][0]<<" bought: ";
		int j=1; 
		while(userItem[i][j]!=0){  // column 0 is the user id; a 0 entry ends the row
			cout<<userItem[i][j]<<" ";
			j++;
		}
		cout<<endl;
	}
	cout<<"-----------"<<endl;
}
// Number of items in a row (column 0 is the user id; a 0 entry ends the row).
int length(int *array){
	int num=0;
	int p=1;
	while(array[p]!=0){
		num++;
		p++; 
	} 
	return num;
}
bool myfind(int userItem[][maxn],int user,int n){
	for(int i=1;i<=length(userItem[user-1]);i++){
		if(userItem[user-1][i]==n) return true;
	}
	return false;
}
bool cmp(similarty a,similarty b){
	return a.weight>b.weight;
}
int main(){
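	const int n=5;  // number of users (a constant, so the arrays below have a fixed size)
	int userId=3;  // id of the logged-in user who receives the recommendation
	int num=2; // number of most-similar users to use
	cout<<"Number of users: "<<n<<"  "<<"Login id: "<<userId<<endl<<"Details:"<<endl; 
	// Data set: each row is {user id, purchased item ids...}; unused cells are 0
	int userItem[n][maxn]={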
		{1,1,2,3,6,7},
		{2,2,3},
		{3,1,2,4,5,7},
		{4,1,2,4,6},
		{5,3,4}
	};
	myprint(userItem,n);
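	int sparseMatrix[n+1][n+1];  // sparseMatrix[u][v]: number of items bought by both user u and user v (indexed by user id)
	memset(sparseMatrix,0,sizeof(sparseMatrix));
	// Count the items that each pair of users has in common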
	for(int i=0;i<n;i++){
		for(int j=i+1;j<n;j++){
			int p=1;
			while(userItem[i][p]!='\0'){
				int q=1;
				while(userItem[j][q]!='\0'){
					if(userItem[i][p] == userItem[j][q]){
						sparseMatrix[userItem[j][0]][userItem[i][0]]++;
						sparseMatrix[userItem[i][0]][userItem[j][0]]++;
						break;
					}
					q++;
				}
				p++;
			}
			
		} 
	} 
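	// Compute the similarity between the logged-in user and every other user (cosine similarity)
	int userIdLength = length(userItem[userId-1]);
	int cnt=0;
	for(int i=1;i<=n;i++){
		if(i!=userId){
			int iLength = length(userItem[i-1]);
			similartys[cnt].weight = sparseMatrix[userId][i]/sqrt((double)userIdLength*iLength);
			similartys[cnt].user = i;
			cout<<"Similarity between user "<<userId<<" and user "<<i<<": "<<similartys[cnt].weight<<endl;
			cnt++;
		}else{
			// The logged-in user gets weight -1 so that it sorts to the end
			similartys[cnt].user = i;
			similartys[cnt++].weight=-1;
		}
	}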
	sort(similartys,similartys+n,cmp);
	cout<<"-----------"<<endl<<"The "<<num<<" most similar users are: "; 
	for(int i=0;i<num;i++){
		cout<<similartys[i].user<<" ";
	}
	set<int> items;   // candidate items: owned by similar users but not by the logged-in user
	for(int i=0;i<num;i++){
		for(int j=1;j<=length(userItem[similartys[i].user-1]);j++){
			if(!myfind(userItem,userId,userItem[similartys[i].user-1][j])){
				items.insert(userItem[similartys[i].user-1][j]);
			}
		}		
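	} 
	cout<<endl<<"-----------"<<endl<<"Candidate items:"<<endl;
	set<int>::iterator it;
	for(it=items.begin();it!=items.end();it++){
		cout<<*it<<" ";
	}
	cout<<endl<<"Their predicted scores:"<<endl;
	// Compute the predicted score of each candidate item
	double maxpick = 0; 
	int recommendation = -1;  // stays -1 if there is no candidate item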
	for(it=items.begin();it!=items.end();it++){
		double pick = 0;
		for(int i=0;i<num;i++){
			if(myfind(userItem,similartys[i].user,*it)){
				pick+=similartys[i].weight;
			}
		}
		if(pick>maxpick){
			maxpick = pick;
			recommendation = *it;
		}
		cout<<"Item "<<*it<<": "<<pick<<endl;
	}	
	cout<<"-----------"<<endl<<"Recommended item: "<<recommendation<<endl;
}

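Running result: with the sample data in the code, the two users most similar to user 3 are users 4 and 1, the candidate items are 3 and 6, and the recommended item is 6.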

Summary: The principle of the CF algorithm is fairly simple; implementing the basic functionality mainly requires mastering the formulas used in each step.


Origin blog.csdn.net/qq_37414463/article/details/105410992