How to build a recommendation engine step by step (Part 2)

Now that we have transformed the data set, we can use it to find similar movies for particular users. There are many different similarity measures, both pure and hybrid:


  • Cosine similarity

  • Euclidean distance

  • Jaccard index

  • Pearson correlation


We will look at only some of these measures.
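For comparison, two of the measures listed above that this tutorial does not develop further, Euclidean distance and the Jaccard index, can be sketched briefly. This is a minimal illustration, not the author's code: the function names, the `1 / (1 + distance)` conversion from a distance to a similarity score, and the sample ratings are all assumptions made for the example.

```python
from math import sqrt

def euclidean_similarity(ratings_a, ratings_b):
    """Map the Euclidean distance over shared keys to a score in (0, 1]."""
    shared = [k for k in ratings_a if k in ratings_b]
    if not shared:
        return 0.0
    distance = sqrt(sum((ratings_a[k] - ratings_b[k]) ** 2 for k in shared))
    return 1 / (1 + distance)

def jaccard_index(ratings_a, ratings_b):
    """Size of the key intersection divided by the size of the key union."""
    keys_a, keys_b = set(ratings_a), set(ratings_b)
    union = keys_a | keys_b
    if not union:
        return 0.0
    return len(keys_a & keys_b) / len(union)

# Hypothetical ratings (rater -> score) for two movies
movie_x = {"Ann": 3.0, "Bob": 4.0, "Cy": 2.0}
movie_y = {"Ann": 3.0, "Bob": 1.0, "Dee": 5.0}

print(euclidean_similarity(movie_x, movie_y))  # shared raters Ann, Bob: distance 3 -> 0.25
print(jaccard_index(movie_x, movie_y))         # 2 shared raters out of 4 -> 0.5
```

Note that the Jaccard index ignores the rating values and only compares who rated each movie, while the Euclidean version compares the values themselves.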


4.1 Cosine similarity


Cosine similarity evaluates how similar two vectors are by calculating the cosine of the angle between them. The vectors are drawn in a vector space according to their coordinate values; the easiest case to picture is two-dimensional space.


When we want to measure the similarity of two or more elements in a given data set, cosine similarity is a good choice. It works well and is widely used in machine learning. If this is the first time you have heard the term, or if you have forgotten what you learned before, it is worth watching an introductory video on cosine similarity.


The mathematical formula of cosine similarity:


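$$\cos(A, B) = \frac{\sum_i A_i B_i}{\sqrt{\sum_i A_i^2}\,\sqrt{\sum_i B_i^2}}$$


Here A_i and B_i are the ratings that user A and user B gave to movie i. In words: sum the products of A's and B's ratings for each movie i, then divide by the square root of the sum of A's squared ratings multiplied by the square root of the sum of B's squared ratings.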
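To make the formula concrete, here is a small worked calculation. The two rating vectors are made-up values chosen so the arithmetic is easy to follow by hand; they do not come from the tutorial's data set.

```python
from math import sqrt

# Hypothetical ratings from users A and B for the same two movies
a = [3.0, 4.0]
b = [4.0, 3.0]

dot = sum(x * y for x, y in zip(a, b))   # 3*4 + 4*3 = 24
norm_a = sqrt(sum(x * x for x in a))     # sqrt(9 + 16) = 5
norm_b = sqrt(sum(y * y for y in b))     # sqrt(16 + 9) = 5
cosine = dot / (norm_a * norm_b)         # 24 / 25 = 0.96

print(round(cosine, 2))  # 0.96
```

A value close to 1 means the two rating vectors point in nearly the same direction, regardless of their lengths.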


4.2 Pearson correlation


Pearson correlation gives results very similar to cosine similarity; it is equivalent to cosine similarity computed on mean-centered ratings. We won't discuss it in detail here. You can find explanations of the related terms on Wikipedia.


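$$r(A, B) = \frac{\sum_i (A_i - \bar{A})(B_i - \bar{B})}{\sqrt{\sum_i (A_i - \bar{A})^2}\,\sqrt{\sum_i (B_i - \bar{B})^2}}$$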
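Although the tutorial does not implement Pearson correlation, a minimal sketch in the same style as its cosine function might look like the following. The function name `pearson_similarity`, its parameter names, and the sample data are assumptions for illustration; the nested-dictionary layout (movie, then rater, then rating) is assumed to match the transformed data set.

```python
from math import sqrt

def pearson_similarity(data, key1, key2):
    """Pearson correlation over the raters shared by data[key1] and data[key2]."""
    shared = [r for r in data[key1] if r in data[key2]]
    n = len(shared)
    if n == 0:
        return 0
    x = [data[key1][r] for r in shared]
    y = [data[key2][r] for r in shared]
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance-like numerator and the two variance-like terms
    num = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    den_x = sum((xi - mean_x) ** 2 for xi in x)
    den_y = sum((yi - mean_y) ** 2 for yi in y)
    if den_x == 0 or den_y == 0:
        return 0  # a constant rating vector has no defined correlation
    return round(num / (sqrt(den_x) * sqrt(den_y)), 2)

# Hypothetical data: movie -> {rater: rating}
ratings = {
    "Movie A": {"Ann": 1.0, "Bob": 2.0, "Cy": 3.0},
    "Movie B": {"Ann": 2.0, "Bob": 4.0, "Cy": 6.0},
    "Movie C": {"Ann": 3.0, "Bob": 2.0, "Cy": 1.0},
}
print(pearson_similarity(ratings, "Movie A", "Movie B"))  # 1.0
print(pearson_similarity(ratings, "Movie A", "Movie C"))  # -1.0
```

Unlike cosine similarity on non-negative ratings, Pearson correlation can be negative, which signals that two movies tend to be rated in opposite ways.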


A function that measures the similarity of user interests with cosine similarity is most of what we need. With it, we can build a recommendation system that suggests movies of interest based on what users have watched before.


The similarity function can be implemented in Python as follows.


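from math import sqrt

def cos_similarity(people, movie1, movie2):
    # Collect the raters that both movies have in common
    si = {}
    for item in people[movie1]:
        if item in people[movie2]:
            si[item] = 1
    # No common raters: treat the movies as unrelated
    if len(si) == 0:
        return 0
    sum1 = 0   # dot product of the two rating vectors
    sum21 = 0  # sum of squared ratings for movie1
    sum22 = 0  # sum of squared ratings for movie2
    for item in si:
        sum1 += people[movie1][item] * people[movie2][item]
        sum21 += pow(people[movie1][item], 2)
        sum22 += pow(people[movie2][item], 2)
    # Guard against division by zero
    if sum21 == 0 or sum22 == 0:
        return 0

    return round(sum1 / (sqrt(sum21) * sqrt(sum22)), 2)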
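The sketch below shows one way the function could be called to score every other movie against a given title. The cosine logic is repeated in compact form so the block runs on its own; the `similar_movies` helper and the tiny data set are hypothetical and are not taken from the author's repository.

```python
from math import sqrt

def cos_similarity(people, movie1, movie2):
    # Same logic as the function above, repeated so this sketch is self-contained
    shared = [u for u in people[movie1] if u in people[movie2]]
    if not shared:
        return 0
    dot = sum(people[movie1][u] * people[movie2][u] for u in shared)
    sq1 = sum(people[movie1][u] ** 2 for u in shared)
    sq2 = sum(people[movie2][u] ** 2 for u in shared)
    if sq1 == 0 or sq2 == 0:
        return 0
    return round(dot / (sqrt(sq1) * sqrt(sq2)), 2)

# Hypothetical transformed data: movie -> {user: rating}
movies = {
    "Movie A": {"Ann": 3.0, "Bob": 4.0},
    "Movie B": {"Ann": 4.0, "Bob": 3.0, "Cy": 5.0},
    "Movie C": {"Cy": 2.0},
}

def similar_movies(data, target):
    """Score every other movie in data against target."""
    return {other: cos_similarity(data, target, other)
            for other in sorted(data) if other != target}

for title in ["Movie A"]:
    print("-" * 30)
    print("| " + title)
    print("-" * 30)
    for other, score in similar_movies(movies, title).items():
        print(other, score)
```

One property worth noticing: when two movies share only a single rater with non-zero ratings, the score is always 1.0 (here, "Movie B" against "Movie C"), so very high scores on small data sets should be read with some caution.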


5. Output


Finally, let's look at the output.


First, we need a collection of movies that have already been watched:


movies_watched=["You, Me and Dupree","Catch Me If You Can","Snitch"]

The system can now recommend movies we are likely to enjoy. The output shows the similarity scores computed earlier for each watched movie.


------------------------------
| You, Me and Dupree          |
-------------------------------
Catch Me If You Can 0.97
Just My Luck 0.85
Lady in the Water 0.96
Snakes on a Plane 0.97
Snitch 1.0
Superman Returns 0.98
The Night Listener 0.96
------------------------------
| Catch Me If You Can        |
------------------------------
Just My Luck 1.0
Lady in the Water 0.98
Snakes on a Plane 0.99
Snitch 1.0
Superman Returns 1.0
The Night Listener 0.92
You, Me and Dupree 0.97
------------------------------
| Snitch                     |
------------------------------
Catch Me If You Can 1.0
Just My Luck 1.0
Lady in the Water 0.91
Snakes on a Plane 0.99
Superman Returns 0.99
The Night Listener 0.88
You, Me and Dupree 1.0
------------------------------


The output shows what the system suggests. The movies most similar to "Snitch" include "Catch Me If You Can" and "Superman Returns", as indicated by the similarity score (the number after each movie title). If we want only the most similar movies, for instance the top 3, one simple approach is to add a threshold on the similarity score. For example, with a threshold of 0.98, only movies whose score is at least 0.98 appear on screen.


------------------------------
| You, Me and Dupree          |
-------------------------------
Snitch 1.0
Superman Returns 0.98
------------------------------
| Catch Me If You Can        |
------------------------------
Just My Luck 1.0
Lady in the Water 0.98
Snakes on a Plane 0.99
Snitch 1.0
Superman Returns 1.0
------------------------------
| Snitch                     |
------------------------------
Catch Me If You Can 1.0
Just My Luck 1.0
Snakes on a Plane 0.99
Superman Returns 0.99
You, Me and Dupree 1.0
------------------------------
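The thresholding step described above can be sketched as follows. The scores for "Snitch" are copied from the unfiltered output earlier in this section; the helper names `above_threshold` and `top_n` are hypothetical, and the top-N variant is an alternative that is assumed rather than taken from the author's code.

```python
# Similarity scores for "Snitch", copied from the unfiltered output above
snitch_scores = {
    "Catch Me If You Can": 1.0,
    "Just My Luck": 1.0,
    "Lady in the Water": 0.91,
    "Snakes on a Plane": 0.99,
    "Superman Returns": 0.99,
    "The Night Listener": 0.88,
    "You, Me and Dupree": 1.0,
}

def above_threshold(scores, threshold):
    """Keep entries whose score is at least the threshold."""
    return {title: s for title, s in scores.items() if s >= threshold}

def top_n(scores, n):
    """Return the n best (title, score) pairs; ties are broken alphabetically."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

print(above_threshold(snitch_scores, 0.98))
print(top_n(snitch_scores, 3))
```

With a threshold of 0.98 this keeps the same five titles shown for "Snitch" in the filtered output. A threshold returns a variable number of results, whereas `top_n` always returns at most `n`.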

The complete code for this tutorial is available on GitHub:

https://github.com/Mitko06/Recommender-System


Conclusion


Congratulations, we now know the basics of how to build a recommendation system.

Of course, real-world recommendation systems need to be more complex. Most of them, however, are still built on measures such as cosine similarity, Euclidean distance, and Pearson correlation.

Building a recommendation system takes time and accumulated data. This tutorial has covered the basic steps, and you can develop different personalized recommendation engines on top of them.


Leave a comment to share your thoughts or to ask any other questions about this tutorial.



Origin blog.51cto.com/15127566/2666757