Content-based recommendation algorithm in Python machine learning (with source code)

Most readers are already familiar with recommendation algorithms: many everyday apps recommend content to you based on your preferences and characteristics. This post introduces the content-based recommendation algorithm in detail.

The content-based model originated in the field of information retrieval. It works from the content of the items themselves: the system analyzes its historical data to extract the content features of each item and the interests and preferences of each user.

The key step is computing the similarity between the content features of a candidate item and the interest features in the user model. Content-based recommendation algorithms do not require a large amount of user data, and they are widely used where the items carry a lot of text information.
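As a minimal sketch of that similarity step, assume each dish is described by a numeric feature vector and the user model is the average of the vectors of the dishes the user liked. The dish names, feature axes, and values below are made up for illustration; they are not taken from the data set used later in this post.

```python
import math

# Hypothetical taste features per dish: (spicy, numbing, sweet, crunchy).
dishes = {
    "celery": (0.1, 0.0, 0.2, 0.9),
    "lotus root": (0.1, 0.0, 0.3, 0.8),
    "beef slices": (0.8, 0.7, 0.0, 0.1),
    "tofu skin": (0.6, 0.5, 0.1, 0.2),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def build_profile(liked):
    """User interest vector: the element-wise mean of the liked dishes."""
    vectors = [dishes[name] for name in liked]
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))

def recommend(liked, top_n=2):
    """Rank dishes the user has not tried by similarity to the profile."""
    profile = build_profile(liked)
    candidates = [name for name in dishes if name not in liked]
    candidates.sort(
        key=lambda n: cosine_similarity(profile, dishes[n]), reverse=True
    )
    return candidates[:top_n]

print(recommend(["celery"]))  # ['lotus root', 'tofu skin']
```

Averaging the liked vectors is only one simple way to form a user model; weighting each dish by the user's rating would be a natural variation.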

Problem description: You often eat Mala Xiangguo at a restaurant. The owner has developed a dish recommendation program. The owner first catalogs the taste of each dish in the store and records it in a data file. When you order, the program analyzes your order history, looks at the dishes you liked, and recommends dishes you may like accordingly.

To get the data set, please like, follow, and bookmark this post, then send a private message to the blogger.

Problem analysis: The recommendation algorithm treats the taste features of each dish as text. We can build a TF-IDF matrix of the taste features to vectorize the text, use a distance measure to compute the similarity between dishes, and then recommend the closest ones.
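To make these two steps concrete, here is a small pure-Python sketch of TF-IDF weighting followed by cosine distance. The taste strings are invented for illustration. The smoothed IDF formula, ln((1 + n) / (1 + df)) + 1 with L2-normalized rows, is chosen to mirror the default settings of scikit-learn's TfidfVectorizer; tokenization here is a plain whitespace split, which is simpler than what the library does.

```python
import math
from collections import Counter

# Hypothetical taste descriptions, one per dish.
tastes = {
    "celery": "fresh crisp light",
    "lotus root": "crisp sweet light",
    "beef slices": "spicy numbing rich",
}

def tfidf_vectors(docs):
    """Return (vocabulary, {name: L2-normalized TF-IDF vector})."""
    tokens = {name: text.split() for name, text in docs.items()}
    vocab = sorted({t for words in tokens.values() for t in words})
    n_docs = len(docs)
    # Document frequency: number of descriptions containing each term.
    df = Counter(t for words in tokens.values() for t in set(words))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = {}
    for name, words in tokens.items():
        counts = Counter(words)
        raw = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0
        vectors[name] = [v / norm for v in raw]
    return vocab, vectors

def cosine_distance(a, b):
    """1 - cosine similarity; the inputs are already unit length."""
    return 1.0 - sum(x * y for x, y in zip(a, b))

vocab, vectors = tfidf_vectors(tastes)
query = "celery"
ranked = sorted(
    (name for name in vectors if name != query),
    key=lambda name: cosine_distance(vectors[query], vectors[name]),
)
print(ranked)  # smallest distance first: ['lotus root', 'beef slices']
```

Dishes that share no taste words with the query end up at distance 1.0, so they fall to the bottom of the ranking.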

Data are as follows

The result is as follows

As the result shows, for celery, a dish with a high score, the system recommends dishes whose taste is highly similar to it.

The source code is as follows

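import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import pairwise_distances

# Load the dish data; each row has a dish 'name' and a 'taste' description.
food = pd.read_csv(r'hot-spicy pot.csv')
print(food.head())
print(food['taste'].head())

# Vectorize the taste descriptions as a TF-IDF matrix (one row per dish).
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(food['taste'])
print(tfidf_matrix.shape)

# Pairwise cosine distances (1 - cosine similarity): smaller means more similar.
cosine_dist = pairwise_distances(tfidf_matrix, metric='cosine')

# Map each dish name to its row position; keep the first row for a repeated name.
indices = pd.Series(food.index, index=food['name'])
indices = indices[~indices.index.duplicated(keep='first')]

def content_based_recommendation(name, cosine_dist=cosine_dist):
    idx = indices[name]
    # Sort all dishes by ascending distance to the queried dish.
    dist_scores = sorted(enumerate(cosine_dist[idx]), key=lambda x: x[1])
    # Drop the queried dish itself and keep the 10 nearest dishes.
    food_indices = [i for i, _ in dist_scores if i != idx][:10]
    return food['name'].iloc[food_indices]

result = content_based_recommendation("celery")
print("Recommended dishes:")
print(result)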
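
Since the CSV file is not reproduced here, the same pipeline can be tried on a few in-memory rows. This is only a sketch: the dish names and taste strings are invented, and it uses cosine_similarity (higher means more similar) rather than pairwise_distances, so the sort order is reversed relative to the listing above.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up stand-in for the CSV: one row per dish.
food = pd.DataFrame({
    "name": ["celery", "lotus root", "beef slices", "tofu skin"],
    "taste": ["fresh crisp light", "crisp sweet light",
              "spicy numbing rich", "spicy savory soft"],
})

tfidf_matrix = TfidfVectorizer().fit_transform(food["taste"])
similarity = cosine_similarity(tfidf_matrix)  # shape: (n_dishes, n_dishes)

def recommend(name, top_n=2):
    """Return the top_n dishes most similar to `name`, excluding itself."""
    idx = food.loc[food["name"] == name].index[0]
    scores = pd.Series(similarity[idx], index=food["name"]).drop(name)
    return scores.sort_values(ascending=False).head(top_n)

print(recommend("celery"))
```

With these rows, "lotus root" shares two taste words with "celery" and should rank first; the remaining dishes share none, so their scores are zero and their relative order is arbitrary.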

Origin blog.csdn.net/jiebaoshayebuhui/article/details/126961023