Recommendation recall algorithm: collaborative filtering

1. Collaborative filtering: recall based on similar behavior

Implementing collaborative filtering requires the following steps:

  • Collect user preferences
  • Find similar users or items
  • Calculate and recommend
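As a minimal sketch of the first step, assume preferences are implicit "likes" logged as hypothetical (user, item) pairs. The collected data can then be kept as two maps: user to items, and item to users. The second map is the set $N(i)$ that the item similarity formulas use. The function name `build_preferences` and the sample log are illustrative assumptions, not part of the original article.

```python
from collections import defaultdict


def build_preferences(log):
    """Group (user, item) 'like' records into two lookup maps.

    Returns (user_items, item_users):
      user_items[u] = set of items user u likes
      item_users[i] = set of users who like item i, i.e. N(i)
    """
    user_items = defaultdict(set)
    item_users = defaultdict(set)
    for user, item in log:
        user_items[user].add(item)
        item_users[item].add(user)
    return dict(user_items), dict(item_users)


# Hypothetical behavior log (not from the original article)
log = [
    ("User 1", "A"), ("User 1", "C"),
    ("User 2", "B"),
    ("User 3", "A"), ("User 3", "C"), ("User 3", "D"),
]
user_items, item_users = build_preferences(log)
print(sorted(user_items["User 3"]))  # ['A', 'C', 'D']
print(sorted(item_users["A"]))       # ['User 1', 'User 3']
```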

1.1 Collaborative filtering algorithm

1.1.1 Similarity calculation

After analyzing user behavior to obtain each user's preferences, we can use those preferences to compute similar users or similar items, and then make recommendations based on them. These two variants are user-based collaborative filtering (UserCF) and item-based collaborative filtering (ItemCF).

The basic similarity measures below all operate on vectors. Each user (or item) is represented as a vector of preferences, and similarity is derived from how close two vectors are: the smaller the distance between them, the greater the similarity.
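To make the vector view concrete, here is a sketch that turns one user's explicit ratings into a vector aligned to a fixed item order. Several details are assumptions for illustration: the ratings, the helper name `to_vector`, and encoding unrated items as 0. Other encodings, such as skipping unrated items, change the resulting similarities.

```python
def to_vector(ratings, items):
    """Map a {item: rating} dict onto a vector in a fixed item order.

    Assumption: an unrated item is encoded as 0.0.
    """
    return [float(ratings.get(item, 0.0)) for item in items]


items = ["A", "B", "C", "D"]  # fixed item order shared by all vectors
ratings = {  # hypothetical explicit ratings on a 1-5 scale
    "User 1": {"A": 5, "C": 3},
    "User 3": {"A": 4, "C": 3, "D": 5},
}
u1 = to_vector(ratings["User 1"], items)
u3 = to_vector(ratings["User 3"], items)
print(u1)  # [5.0, 0.0, 3.0, 0.0]
print(u3)  # [4.0, 0.0, 3.0, 5.0]
```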

  • 1. Co-occurrence similarity
    The co-occurrence similarity of item A and item B is defined as:

$$w_{A,B} = \frac{|N(A) \cap N(B)|}{|N(A)|}$$
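Here $N(A)$ is the set of users who like item A, so the denominator $|N(A)|$ is the number of such users. The weight $w_{A,B}$ is therefore the fraction of users who like item A who also like item B. The drawback is that this formula makes every item look very similar to popular items: if B is liked by almost everyone, then almost every user who likes A also likes B.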
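Below is a direct Python sketch of this formula. It assumes `item_users` maps each item to the set of users who like it, which is a hypothetical layout. The toy data shows the popularity problem: every user who likes A also likes the popular item P, so $w_{A,P} = 1$. The measure is also asymmetric, since $w_{P,A} = 2/5$.

```python
def cooccurrence_sim(item_users, a, b):
    """w_{A,B} = |N(A) ∩ N(B)| / |N(A)|.

    item_users maps each item to the set of users who like it.
    Returns 0.0 when nobody likes item a.
    """
    fans_a = item_users.get(a, set())
    if not fans_a:
        return 0.0
    fans_b = item_users.get(b, set())
    return len(fans_a & fans_b) / len(fans_a)


# Hypothetical data: P is a popular item liked by every user
item_users = {
    "A": {"u1", "u2"},
    "P": {"u1", "u2", "u3", "u4", "u5"},
}
print(cooccurrence_sim(item_users, "A", "P"))  # 1.0
print(cooccurrence_sim(item_users, "P", "A"))  # 0.4
```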

An improved formula also divides by the popularity of item B:

$$w_{A,B} = \frac{|N(A) \cap N(B)|}{\sqrt{|N(A)|\,|N(B)|}}$$
Because $|N(B)|$ now appears in the denominator, this formula penalizes the weight of item B when B is popular. This reduces the likelihood that popular items are similar to many items.
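Apply the normalized formula to the same kind of hypothetical data: item A is liked by 2 users, and the popular item P is liked by 5 users, including both of A's. The score drops from 1.0 to $2/\sqrt{10} \approx 0.632$, and the measure becomes symmetric. The function name is illustrative; this is a sketch, not a production implementation.

```python
import math


def normalized_cooccurrence_sim(item_users, a, b):
    """w_{A,B} = |N(A) ∩ N(B)| / sqrt(|N(A)| * |N(B)|).

    Dividing by |N(B)| as well damps the score of popular items B.
    """
    fans_a = item_users.get(a, set())
    fans_b = item_users.get(b, set())
    if not fans_a or not fans_b:
        return 0.0
    return len(fans_a & fans_b) / math.sqrt(len(fans_a) * len(fans_b))


item_users = {  # hypothetical data: P is liked by every user
    "A": {"u1", "u2"},
    "P": {"u1", "u2", "u3", "u4", "u5"},
}
print(round(normalized_cooccurrence_sim(item_users, "A", "P"), 3))  # 0.632
print(round(normalized_cooccurrence_sim(item_users, "P", "A"), 3))  # 0.632
```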

  • 2. Euclidean distance
    $$d(x,y) = \sqrt{\sum_i (x_i - y_i)^2}$$
    To express the Euclidean distance as a similarity, convert it with the following formula, so that the smaller the distance, the greater the similarity value:

$$sim(x,y) = \frac{1}{1 + d(x,y)}$$
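A minimal sketch of the distance and its conversion to a similarity, written with only the standard `math` module. The function names are illustrative. Identical vectors give $d = 0$ and therefore a similarity of exactly 1.

```python
import math


def euclidean_distance(x, y):
    """d(x, y) = sqrt(sum_i (x_i - y_i)^2)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def euclidean_similarity(x, y):
    """sim(x, y) = 1 / (1 + d(x, y)): 1.0 for identical vectors, toward 0 as d grows."""
    return 1.0 / (1.0 + euclidean_distance(x, y))


print(euclidean_distance([1, 2], [4, 6]))              # 5.0
print(round(euclidean_similarity([1, 2], [4, 6]), 4))  # 0.1667
```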

  • 3. Pearson correlation coefficient

The Pearson correlation coefficient is generally used to measure how closely two interval-scale variables are linearly related. Its value lies in [-1, +1].

$$p(x,y) = \frac{\sum_i x_i y_i - n\,\bar{x}\,\bar{y}}{(n-1)\,s_x s_y}$$
Here $n$ is the number of components, $\bar{x}$ and $\bar{y}$ are the means, and $s_x$ and $s_y$ are the sample standard deviations of $x$ and $y$.
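This sketch follows the formula above term by term. It uses Python's standard `statistics` module for the means and sample standard deviations. Returning 0.0 when either vector is constant is an assumed convention, not part of the formula; the coefficient is undefined in that case.

```python
import statistics


def pearson(x, y):
    """p(x, y) = (sum_i x_i*y_i - n*mean(x)*mean(y)) / ((n - 1) * s_x * s_y).

    s_x and s_y are sample standard deviations (n - 1 in their denominator).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 entries")
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    s_x, s_y = statistics.stdev(x), statistics.stdev(y)
    if s_x == 0 or s_y == 0:
        return 0.0  # assumed convention: undefined for a constant vector
    cross = sum(xi * yi for xi, yi in zip(x, y))
    return (cross - n * mean_x * mean_y) / ((n - 1) * s_x * s_y)


print(round(pearson([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0
print(round(pearson([1, 2, 3, 4], [8, 6, 4, 2]), 6))  # -1.0
```

Because the means are subtracted and the result is divided by the standard deviations, Pearson ignores each user's overall rating level and spread. A harsh rater and a generous rater who rank items in the same order still correlate strongly.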

  • 4. Cosine similarity

Cosine similarity is widely used to compute the similarity of document data:
$$C(x,y) = \frac{x \cdot y}{\|x\|_2\,\|y\|_2} = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}}$$
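A plain-Python sketch of the formula. Returning 0.0 for a zero vector is an assumed convention, because the ratio is undefined there.

```python
import math


def cosine_similarity(x, y):
    """C(x, y) = (x . y) / (||x||_2 * ||y||_2)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumed convention: undefined for zero vectors
    return dot / (norm_x * norm_y)


print(cosine_similarity([3, 4], [4, 3]))  # 0.96
print(cosine_similarity([3, 4], [6, 8]))  # 1.0
```

The second call shows why cosine suits documents: it depends only on the direction of the vectors, so a long and a short document with proportional term counts score 1.0.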

  • 5. Tanimoto coefficient

The Tanimoto coefficient, also known as the extended Jaccard coefficient, is an extension of cosine similarity; on binary vectors it reduces to the Jaccard coefficient. It is mostly used to compute the similarity of document data:

$$T(x,y) = \frac{x \cdot y}{\|x\|_2^2 + \|y\|_2^2 - x \cdot y} = \frac{\sum_i x_i y_i}{\sum_i x_i^2 + \sum_i y_i^2 - \sum_i x_i y_i}$$
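A sketch of the Tanimoto formula in plain Python; the function name is illustrative. The example uses binary (0/1) vectors, where the result equals the Jaccard index: the size of the intersection divided by the size of the union.

```python
def tanimoto(x, y):
    """T(x, y) = x.y / (||x||^2 + ||y||^2 - x.y)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(xi * yi for xi, yi in zip(x, y))
    denom = sum(xi * xi for xi in x) + sum(yi * yi for yi in y) - dot
    return dot / denom if denom else 0.0  # denom is 0 only if both vectors are all zeros


# Binary vectors over 4 items: x likes items {0, 1, 3}, y likes items {0, 2, 3}
print(tanimoto([1, 1, 0, 1], [1, 0, 1, 1]))  # 0.5  (= |{0,3}| / |{0,1,2,3}|)
```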

1.1.2 Recommendation calculation

  • UserCF
User/item   Item A   Item B   Item C   Item D
User 1      ∘                 ∘        recommend
User 2               ∘
User 3      ∘                 ∘        ∘

User 1 and User 3 both like items A and C, so they are similar users. Item D, which User 3 also likes, is therefore recommended to User 1.
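A minimal UserCF sketch. It assumes implicit binary feedback and reads the UserCF table as: User 1 likes A and C, User 2 likes B, User 3 likes A, C and D. Two further choices are assumptions, not the article's stated method. User similarity is the set-based cosine $|N(u) \cap N(v)| / \sqrt{|N(u)|\,|N(v)|}$. An unseen item's score is the sum of the similarities of the top-`k` neighbors who like it.

```python
import math
from collections import defaultdict


def user_similarity(items_u, items_v):
    """Set-based cosine similarity: |N(u) ∩ N(v)| / sqrt(|N(u)| * |N(v)|)."""
    if not items_u or not items_v:
        return 0.0
    return len(items_u & items_v) / math.sqrt(len(items_u) * len(items_v))


def user_cf(user_items, target, k=2):
    """Rank items the target has not seen, using its k most similar users."""
    seen = user_items[target]
    # Step 1: find the k users most similar to the target
    neighbors = sorted(
        ((other, user_similarity(seen, items))
         for other, items in user_items.items() if other != target),
        key=lambda pair: pair[1],
        reverse=True,
    )[:k]
    # Step 2: score each unseen item by summing the similarity of neighbors who like it
    scores = defaultdict(float)
    for other, sim in neighbors:
        if sim <= 0.0:
            continue
        for item in user_items[other] - seen:
            scores[item] += sim
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical reading of the UserCF table (∘ = likes)
user_items = {
    "User 1": {"A", "C"},
    "User 2": {"B"},
    "User 3": {"A", "C", "D"},
}
print([item for item, _ in user_cf(user_items, "User 1")])  # ['D']
```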
  • ItemCF
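User/item   Item A   Item B   Item C
User 1      ∘                 ∘
User 2      ∘        ∘        ∘
User 3      ∘                 recommend

Users 1 and 2 both like items A and C, so A and C are judged similar items. Because User 3 likes item A, the similar item C is recommended to User 3.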
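A minimal ItemCF sketch. It assumes the ItemCF table reads as: User 1 likes A and C, User 2 likes A, B and C, User 3 likes A. It reuses the normalized co-occurrence similarity from section 1.1.1. Scoring an unseen item as the sum of its similarities to the items the user already likes is an assumption of this sketch.

```python
import math
from collections import defaultdict


def item_cf(user_items, target):
    """Rank items the target has not seen by their similarity to items it likes."""
    # Invert user -> items into item -> users, i.e. N(i) for every item i
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)

    def item_similarity(a, b):
        # Normalized co-occurrence: |N(a) ∩ N(b)| / sqrt(|N(a)| * |N(b)|)
        common = len(item_users[a] & item_users[b])
        return common / math.sqrt(len(item_users[a]) * len(item_users[b]))

    seen = user_items[target]
    scores = defaultdict(float)
    for liked in seen:
        for candidate in item_users:
            if candidate in seen:
                continue
            sim = item_similarity(liked, candidate)
            if sim > 0.0:
                scores[candidate] += sim
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical reading of the ItemCF table (∘ = likes)
user_items = {
    "User 1": {"A", "C"},
    "User 2": {"A", "B", "C"},
    "User 3": {"A"},
}
print([item for item, _ in item_cf(user_items, "User 3")])  # ['C', 'B']
```

Item C ranks above item B: $w_{A,C} = 2/\sqrt{6} \approx 0.816$, while $w_{A,B} = 1/\sqrt{3} \approx 0.577$. For brevity this sketch recomputes item similarities on every call. A larger system would more typically precompute the item-item similarity table offline and keep only the most similar items per item.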

1.2 Implementation of the collaborative filtering recommendation algorithm


Source: blog.csdn.net/weixin_44127327/article/details/108504183