1. Collaborative filtering: recall based on similar behavior

To achieve collaborative filtering, the following steps are required:

Collect user preferences
Find similar users or items
Calculate and recommend

1.1 Collaborative filtering algorithm

1.1.1 Similarity calculation

Once the user's behavior has been analyzed and the user's preferences obtained, similar users or similar items can be computed from those preferences, and recommendations are then made based on them. These two approaches are user-based collaborative filtering (UserCF) and item-based collaborative filtering (ItemCF).

The basic methods for calculating similarity all operate on vectors: they measure how close two vectors are, and the closer the vectors, the greater the similarity.

1. Co-occurrence similarity

The co-occurrence similarity of item A and item B is defined as:

$w_{A,B} = {|N(A)\bigcap N(B)| \over |N(A)|}$
Here $N(A)$ is the set of users who like item A, so the denominator $|N(A)|$ is the number of users who like item A. The value can be understood as the percentage of users who like item A that also like item B. With this formula, every item ends up very similar to popular items, because most users like a popular item B.

$w_{A,B} = {|N(A)\bigcap N(B)| \over \sqrt{|N(A)||N(B)|}}$
This formula penalizes the weight of item B by also dividing by its popularity $|N(B)|$, which reduces the likelihood that popular items appear similar to many other items.
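
The two co-occurrence formulas can be compared with a minimal sketch. The `likes` dictionary and the function names are hypothetical, chosen only for illustration; item B is deliberately made popular to show the effect of the penalty.

```python
from math import sqrt

# Hypothetical toy data: item -> set of users who like it.
likes = {
    "A": {"u1", "u2", "u3"},
    "B": {"u2", "u3", "u4", "u5", "u6", "u7"},  # a popular item
    "C": {"u1"},
}

def cooccurrence(likes, a, b):
    """|N(a) & N(b)| / |N(a)|: share of a's users who also like b."""
    return len(likes[a] & likes[b]) / len(likes[a])

def cooccurrence_penalized(likes, a, b):
    """|N(a) & N(b)| / sqrt(|N(a)| * |N(b)|): also penalizes a popular b."""
    return len(likes[a] & likes[b]) / sqrt(len(likes[a]) * len(likes[b]))

print(cooccurrence(likes, "A", "B"))
print(cooccurrence_penalized(likes, "A", "B"))
```

In this toy data the penalized score for the pair (A, B) is lower than the unpenalized one, since B's six users enlarge the denominator.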

2. Euclidean distance

$d(x,y) = \sqrt {\sum (x_i-y_i)^2}$
When the Euclidean distance is used to express similarity, the following formula is used for conversion, so that the smaller the distance, the greater the similarity value:

$sim(x,y) = {1 \over 1+d(x,y)}$
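
As a minimal sketch, the distance and its conversion to a similarity of the form 1 / (1 + d) can be written as follows. The function names are illustrative, and both vectors are assumed to have the same length.

```python
from math import sqrt

def euclidean_distance(x, y):
    """Square root of the summed squared differences between x and y."""
    return sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def euclidean_similarity(x, y):
    """Map a distance in [0, inf) to a similarity in (0, 1]."""
    return 1.0 / (1.0 + euclidean_distance(x, y))

print(euclidean_distance([0, 0], [3, 4]))
print(euclidean_similarity([0, 0], [3, 4]))
```

Identical vectors have distance 0 and therefore similarity 1; the similarity approaches 0 as the vectors move apart.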

3. Pearson correlation coefficient

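The Pearson correlation coefficient is generally used to measure how closely two interval-scale variables are related, and its value range is [-1,+1].

$p(x,y) = {\sum x_iy_i - n \bar{x}\bar{y} \over (n-1)s_xs_y}$

Here $\bar{x}$ and $\bar{y}$ are the sample means and $s_x$ and $s_y$ are the sample standard deviations of x and y.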
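
The Pearson coefficient can be sketched directly from its definition: the sum of products minus n times the product of the means, divided by (n - 1) times the two sample standard deviations. The function name is illustrative; the sketch assumes at least two observations, and returning 0.0 for a constant vector is an assumption made here, not something the formula defines.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (n >= 2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample standard deviations (divide by n - 1).
    s_x = sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    s_y = sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    if s_x == 0 or s_y == 0:
        return 0.0  # assumption: treat a constant vector as uncorrelated
    numerator = sum(xi * yi for xi, yi in zip(x, y)) - n * mean_x * mean_y
    return numerator / ((n - 1) * s_x * s_y)

print(pearson([1, 2, 3], [2, 4, 6]))
print(pearson([1, 2, 3], [6, 4, 2]))
```

Perfectly increasing linear data gives a value of 1 and perfectly decreasing linear data gives -1, up to floating-point rounding.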

4. Cosine similarity

Cosine similarity is widely used to calculate the similarity of document data:
$\cos(x,y) = {x \cdot y \over ||x||\,||y||}={\sum x_iy_i \over \sqrt{\sum x_i^2} \sqrt{\sum y_i^2}}$
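
A minimal sketch of cosine similarity, computed as the dot product divided by the product of the two vector lengths. The function name is illustrative, and returning 0.0 for a zero vector is an assumption made here to avoid division by zero.

```python
from math import sqrt

def cosine_similarity(x, y):
    """Dot product of x and y divided by the product of their lengths."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = sqrt(sum(xi * xi for xi in x))
    norm_y = sqrt(sum(yi * yi for yi in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: a zero vector is similar to nothing
    return dot / (norm_x * norm_y)

print(cosine_similarity([1, 2], [2, 4]))  # same direction
print(cosine_similarity([1, 0], [0, 1]))  # orthogonal
```

Because only the direction matters, two documents with the same word proportions but different lengths score close to 1.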

5. Tanimoto coefficient

The Tanimoto coefficient, also known as the Jaccard coefficient, is an extension of cosine similarity and is likewise mostly used to calculate the similarity of document data:

$T(x,y) = {x \cdot y \over ||x||^2 + ||y||^2 - x \cdot y} = {\sum x_iy_i \over \sum x_i^2 + \sum y_i^2 - \sum x_iy_i}$
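
A minimal sketch of the Tanimoto coefficient, using the denominator ||x||^2 + ||y||^2 - x·y, that is, the sums of squares without square roots. The function name is illustrative, and returning 0.0 when both vectors are zero is an assumption made here.

```python
def tanimoto(x, y):
    """x.y / (|x|^2 + |y|^2 - x.y) for two equal-length vectors."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    denom = sum(xi * xi for xi in x) + sum(yi * yi for yi in y) - dot
    return dot / denom if denom else 0.0  # assumption: 0.0 for two zero vectors

# Binary vectors: one shared item out of three distinct items overall.
print(tanimoto([1, 1, 0], [1, 0, 1]))
print(tanimoto([1, 2], [1, 2]))
```

For binary vectors the result equals the size of the intersection divided by the size of the union, which is the set-based Jaccard coefficient.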

1.1.2 Recommendation calculation

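UserCF

User/item	Item A	Item B	Item C	Item D
User 1	$\circ$		$\circ$	recommend
User 2		$\circ$		
User 3	$\circ$		$\circ$	$\circ$

User 1 and User 3 both like items A and C, so they are similar users; item D, which User 3 likes, is therefore recommended to User 1.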
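
The UserCF table can be reproduced with a short sketch. The data structure, the function names, the choice of the penalized co-occurrence measure between users, and the neighbor count `k` are all assumptions for illustration, not a prescribed implementation.

```python
from math import sqrt

# Toy data mirroring the UserCF table: user -> set of liked items.
user_items = {
    "User 1": {"A", "C"},
    "User 2": {"B"},
    "User 3": {"A", "C", "D"},
}

def user_similarity(user_items, u, v):
    """Shared items divided by sqrt of the product of both users' item counts."""
    a, b = user_items[u], user_items[v]
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def recommend_user_cf(user_items, target, k=1):
    """Score unseen items by the similarity of the k nearest users who like them."""
    others = [v for v in user_items if v != target]
    neighbors = sorted(
        others, key=lambda v: user_similarity(user_items, target, v), reverse=True
    )[:k]
    scores = {}
    for v in neighbors:
        sim = user_similarity(user_items, target, v)
        for item in user_items[v] - user_items[target]:
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)

print(recommend_user_cf(user_items, "User 1"))
```

With `k=1`, the nearest neighbor of User 1 is User 3, and the only item User 3 likes that User 1 has not seen is D.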

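ItemCF

User/item	Item A	Item B	Item C
User 1	$\circ$		$\circ$
User 2	$\circ$	$\circ$	$\circ$
User 3	$\circ$		recommend

Users 1 and 2 both like items A and C, so A and C are similar items; since User 3 likes item A, item C is recommended to User 3.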
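
The ItemCF table can be sketched in the same spirit: invert the user data into per-item user sets, compute item-to-item similarity, and score each unseen item by its similarity to the items the target user already likes. All names and the choice of the penalized co-occurrence measure are assumptions for illustration.

```python
from math import sqrt

# Toy data mirroring the ItemCF table: user -> set of liked items.
user_items = {
    "User 1": {"A", "C"},
    "User 2": {"A", "B", "C"},
    "User 3": {"A"},
}

def item_fans(user_items):
    """Invert user -> items into item -> set of users who like it."""
    fans = {}
    for user, items in user_items.items():
        for item in items:
            fans.setdefault(item, set()).add(user)
    return fans

def item_similarity(fans, a, b):
    """Shared users divided by sqrt of the product of both items' user counts."""
    return len(fans[a] & fans[b]) / sqrt(len(fans[a]) * len(fans[b]))

def recommend_item_cf(user_items, target):
    """Rank unseen items by summed similarity to the target user's liked items."""
    fans = item_fans(user_items)
    liked = user_items[target]
    scores = {
        candidate: sum(item_similarity(fans, item, candidate) for item in liked)
        for candidate in fans
        if candidate not in liked
    }
    return sorted(scores, key=scores.get, reverse=True)

print(recommend_item_cf(user_items, "User 3"))
```

Item C ranks first for User 3 because two of the three users who like A also like C, while only one of them likes B.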
