Neighbor-based collaborative filtering in Python: machine learning [the most humorous, easiest-to-understand machine learning]


      For more machine learning content, please follow me ~ ~. To reach the blogger, send a private message or contact:
      QQ: 3327908431
      WeChat: ZDSL1542334210

        For those of you reading this: machine learning is a byword for the coming century, the creator of future technology, the origin of artificial intelligence. Why does it represent an advanced era? Huawei's boss Ren Zhengfei once said that the era of big data is momentous, and in the era of big data, artificial intelligence means machine learning, which is built on statistics! Hey, I won't tell you I studied statistics; let's just not mention that I'm a statistics major.
        So what is collaborative filtering? Summed up in one sentence: "birds of a feather flock together." That's why you chose to read this blogger's article! After all, you know this blogger could recite three thousand characters at age three and poems at age five, a child prodigy famous for ten villages around! But time flies (sob...), and sadly I'm already 18 (crying...), so how do I keep that up...

1. What is neighbor-based collaborative filtering

        It finds the similarity between items or between users, and then makes recommendations based on that similarity. It comes in two flavors: item-based collaborative filtering and user-based collaborative filtering. Bluntly put, both are about getting more people to buy more different items, so in two words they are aimed at making money (which is a good thing; who doesn't like money? If you don't, you can give it to me! Or better, give a little help to children in poor mountain areas! My WeChat contact details are above, and QQ works too).

1.1 Item-based collaborative filtering

        The simplest explanation: you watch a movie, say "Wolf Warrior 2", and the system then recommends films similar to "Wolf Warrior 2", such as "Wolf Warrior 1" and "Operation Red Sea". That is item-based collaborative filtering. In short: getting one person to buy more of the same kind of thing.

1.2 User-based collaborative filtering

        For example, Bear and Brother Liang are good friends. Bear likes watching a certain movie, so the system recommends that same movie to Brother Liang. That is user-based collaborative filtering. In short: recommending the same things to people with similar tastes (viewers with your taste would probably like what you like).

2. How is similarity computed

        It is computed with distances: based on historical data we measure the distance between items or between users, and the closest ones belong to the same class, meaning they are similar. First the data is turned into a two-dimensional table. For user-based filtering this is converted into a user-to-user table; for item-based filtering, into an item-to-item table. Suppose six users rated six books as follows:

[Figure: ratings of six books by six users]
        As the table shows, not every user has rated every book, so our job is to fill in the vacant values and then sort them in descending order to make recommendations. How do we fill those empty cells? Distance comes to mind. The first distance we think of is the Euclidean distance, a direct straight-line measure. But in fact each user has more than one dimension of data, so users are really directions in a multi-dimensional space; what matters is not their straight-line distance but their direction, and two users pointing the same way are similar. So we use the cosine similarity:
cos(A, B) = (A · B) / (‖A‖ ‖B‖) = Σᵢ aᵢbᵢ / ( √(Σᵢ aᵢ²) · √(Σᵢ bᵢ²) )
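To make the direction-versus-distance point concrete, here is a minimal sketch (the vectors and the `cosine_sim` name are mine, not from the post) comparing cosine similarity with Euclidean distance:

```python
import numpy as np

def cosine_sim(a, b):
    # cosine of the angle between two rating vectors
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Two users who rate in the same proportions but on different scales
u1 = np.array([5, 3, 0, 1])
u2 = np.array([10, 6, 0, 2])  # exactly twice u1

print(cosine_sim(u1, u2))       # 1.0: same direction, maximally similar
print(np.linalg.norm(u1 - u2))  # yet Euclidean distance says they are far apart
```

Cosine similarity treats these two users as identical in taste even though their rating scales differ, which is exactly why direction matters more than raw distance here.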
        Fortunately Python has a handy library for this, so I don't have to code it by hand. The two-dimensional similarity table obtained from the calculation is shown below.

2.1 Illustration of user similarity in user-based collaborative filtering

        The two-dimensional table of user-to-user similarities:
[Figure: user-to-user similarity table]
        This way, we can find exactly which users are most similar to each user, and recommend the items one user has bought to the users most similar to them.
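As a small sketch of that idea (the toy ratings below are made up for illustration), we can find a user's nearest neighbor and then pick out the neighbor's items that the user has not rated yet:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy ratings: rows are users, columns are books, 0 = unrated
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
])
sim = cosine_similarity(ratings)

# The user most similar to user 0 (skip index 0, the user itself)
nearest = np.argsort(sim[0])[::-1][1]

# Books the neighbor rated that user 0 has not: recommendation candidates
candidates = np.where((ratings[0] == 0) & (ratings[nearest] > 0))[0]
print(nearest, candidates)
```

Here user 1 rates almost the same books the same way as user 0, so user 1's book 2 becomes the candidate recommendation.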

2.2 Illustration of item similarity in item-based collaborative filtering

        This table also comes from the users' ratings of the books, because we want to judge the items' similarity from the users' perspective, not from the items themselves; our goal and our eyes are always on the user.
[Figure: item-to-item similarity table]
        This way, from the users' perspective, we can find which few items are most similar to each item, and recommend those similar items to the user.

4. Doing user-based neighbor collaborative filtering on the data

The data can be downloaded from a Baidu netdisk ---- extraction code: vzqi

# Imports
import pandas as pd
import numpy as np

dat = 'example.txt'  # read the data; this is the table from the first screenshot
df = pd.read_csv(dat, header=None)
df.columns = ['用户id', '物品id', '喜好程度']  # rename the columns: user id, item id, rating
# Build the matrix shown in the second screenshot
df_pivot = df.pivot(index="用户id", columns="物品id", values="喜好程度")
freq = df_pivot.fillna(0)  # fill missing values with 0
freq_matrix = freq.values  # NumPy array form, used by all the calculations below
# sklearn ships a built-in cosine similarity function
from sklearn.metrics.pairwise import cosine_similarity
user_similar = cosine_similarity(freq_matrix)
## Draw a heat map to take a look; it's prettier
import matplotlib.pyplot as plt
import seaborn as sns
sns.heatmap(user_similar, annot=True, cbar=False)
plt.show()  # heat map

Heat map
        The lighter the color, the lower the similarity.

Suppose we now want to compute the third user's predicted score for the fifth item:

user_id_action = freq_matrix[2,:]  # the third user's rating vector over all items
item_id_action = freq_matrix[:,4]  # every user's rating of the fifth item
# Suppose we want to find the k = 3 users most similar to this user:
# take the three largest values in the user_similar matrix and the users they correspond to
k = 3
score = 0   # numerator of the predicted score
weight = 0  # denominator; the final score is score/weight, so similarity acts as a weight
user_id = 2  # the third user
item_id = 4  # the fifth item
similar_index = np.argsort(user_similar[user_id])[::-1][1:k+1]  # indices of the three users most similar to the third user (excluding itself)
similar_index  # result: 1, 0, 5 -- users 1, 0 and 5 are the most similar to the third user

Compute scores relative to each user's average rating
        Because the raw ratings vary over a wide range while the similarity weights lie between 0 and 1, we standardize the ratings by subtracting each user's mean before applying the weights:
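A tiny numeric sketch of this mean-centered weighting (the numbers are invented for illustration): each neighbor contributes how far their rating sits above or below their own average, weighted by similarity, and the result is added back onto the target user's average:

```python
import numpy as np

target_mean = 3.0                      # the target user's own average rating
sims = np.array([0.9, 0.6])            # similarity of two neighbors to the target
ratings = np.array([5.0, 2.0])         # the neighbors' ratings of the item
neighbor_means = np.array([4.0, 3.0])  # the neighbors' average ratings

# Similarity-weighted average of the neighbors' mean-centered ratings
offset = np.sum(sims * (ratings - neighbor_means)) / np.sum(np.abs(sims))
prediction = target_mean + offset
print(prediction)  # 3.2
```

The first neighbor rates the item one point above their norm, the second one point below; weighting by similarity leaves a small positive offset on top of the target user's mean.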

# Build a user-based recommendation score for one (user, item) pair
def Recommendation_mean(user_id, item_id, similar, k=10):
    """Mean-centered prediction.
    user_id: the user's row index
    item_id: the item's column index
    similar: the precomputed cosine similarity matrix
    k: how many nearest users to use (default 10)"""
    score = 0
    weight = 0
    user_id_action = freq_matrix[user_id,:]      # user_id's ratings of all items
    item_id_action = freq_matrix[:,item_id]      # all users' ratings of item_id

    user_id_similar = similar[user_id,:]      # user_id's similarity to every user
    similar_index = np.argsort(user_id_similar)[-(k+1):-1]  # indices of the k most similar users (excluding the user itself)
    user_id_i_mean = np.sum(user_id_action)/user_id_action[user_id_action!=0].size  # user_id's mean rating
    for j in similar_index:
        if item_id_action[j]!=0:  # only neighbors who actually rated this item count
            user_id_j_action = freq_matrix[j,:]
            user_id_j_mean = np.sum(user_id_j_action)/user_id_j_action[user_id_j_action!=0].size
            score += user_id_similar[j]*(item_id_action[j]-user_id_j_mean)  # similarity-weighted, mean-centered rating
            weight += abs(user_id_similar[j])  # accumulate the weights

    if weight==0:
        return 0
    else:
        return user_id_i_mean + score/float(weight)  # add the weighted offset back onto the target user's own mean

Build a prediction function that fills in every user's score for every item

# Build the prediction function
def predict_mean(user_similar):
    """Take the similarity matrix, loop over every user and every item, and compute a full recommendation matrix."""
    user_count = freq_matrix.shape[0]  # number of users
    item_count = freq_matrix.shape[1]  # number of items
    predic_matrix = np.zeros((user_count,item_count))
    for user_id in range(user_count):
        print(user_id)  # progress indicator
        for item_id in range(item_count):
            if freq_matrix[user_id,item_id] == 0:
                # call the scoring function for every empty cell; on large data this gets slow
                predic_matrix[user_id,item_id] = Recommendation_mean(user_id,item_id,user_similar)
    return predic_matrix  # the score table with its empty cells filled in

Get the score matrix

user_prediction_matrix = predict_mean(user_similar)  # each user's score for each item
user_prediction_matrix

[Figure: the filled-in prediction matrix]
Extract the top few recommended items for each user

def get_topk(group,n):
    # return the top n rows after sorting by the recommendation score
    return group.sort_values("推荐指数",ascending=False)[:n]

def get_recommendation(user_prediction_matrix,n=5):
    # wrap the prediction matrix in a DataFrame with the original labels
    recommendation_df = pd.DataFrame(user_prediction_matrix,columns=freq.columns,index=freq.index)
    # reshape: stack turns the column index into a row level, reset_index flattens it
    recommendation_df = recommendation_df.stack().reset_index()
    # rename the score column
    recommendation_df.rename(columns={0:"推荐指数"},inplace=True)
    # group by the user ID column
    grouped = recommendation_df.groupby("用户id")
    # take the top few rows of each group (apply returns whatever get_topk returns)
    topk = grouped.apply(get_topk,n=n)
    # drop the duplicated user ID column
    topk = topk.drop(["用户id"],axis=1)
    # drop the redundant inner index level
    topk.index = topk.index.droplevel(1)
    # rebuild a flat index
    topk.reset_index(inplace=True)
    return topk

Call the function

n = 5  # take the top 5; change this number to get more or fewer
get_recommendations = get_recommendation(user_prediction_matrix,n)
get_recommendations  # each user's top 5 recommended book IDs

        Note: if you want item-based neighbor collaborative filtering instead, simply transpose df_pivot in the code above; everything else stays the same. That's it for this part.
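A minimal sketch of that transpose trick, with a made-up stand-in matrix (in the real code you would pass `df_pivot.fillna(0).values`):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in for the filled pivot table: 3 users (rows) x 4 items (columns)
freq_matrix = np.array([
    [5, 3, 0, 1],
    [4, 0, 1, 5],
    [1, 5, 4, 0],
])
user_similar = cosine_similarity(freq_matrix)    # compares rows: user-to-user
item_similar = cosine_similarity(freq_matrix.T)  # transpose first: item-to-item

print(user_similar.shape)  # (3, 3)
print(item_similar.shape)  # (4, 4)
```

After the transpose, items become the rows, so the very same cosine function and the same scoring code yield item-to-item similarities.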

5. End-of-post Easter egg: a relaxing moment

        Well, as you all know, my good friend Brother Liang got into a quarrel with his father. Just before the National Day holiday, I had nothing to do and went to his house to play. We had barely watched two movies when his father came out, saw him playing, and said: "Almost 25 years old and still nothing to show for it, all you know is playing all day. Look at other people like Jack Ma, already running a company." Brother Liang wasn't happy and shot back (which terrified me): "You're as old as Jack Ma's father, look at him..." His father suddenly went silent, stared at him, and said: "The reason Ma is so impressive is that he has a good son... and some good, positive-energy friends." So it really is like father, like son; the two of them can bicker like that all day.

       That's all for today ~ every article ends with an Easter egg, a relaxed moment! Follow me to learn more machine learning! Thanks for reading, I'm Jetuser-data.

Links: [https://blog.csdn.net/L1542334210]
CSND: L1542334210
I wish you all success! Family fun!



Origin blog.csdn.net/L1542334210/article/details/102468125