My CSDN blog is "Do bionic programmers dream of electronic sheep". This article was written in markdown using CSDN and Typora, and its pictures are hosted on CSDN, so some of them may carry a CSDN watermark with my blog name. The article is my own original work, written for the regular assignments and the major assignment of the course "Data Mining and Business Intelligence Decision-Making".
This article covers Chapter 15, Intelligent Recommendation Systems: Collaborative Filtering Algorithms. For ease of reading, I have divided it into the following sections:
Basic knowledge
Experimental content
Extended research
Experience
Each section is introduced below:
Basic knowledge
Contains my personal study notes and understanding of this chapter's topic, a summary of the key knowledge points, and the code and results worth recording.
Experimental content
This is the main experiment part of the article, consisting of the experiment content provided by the teacher. After running successfully on my computer (Jupyter Notebook), it was exported to markdown.
The main headings correspond to the subsections of each chapter.
As shown in the figure above, for example, the main heading is "PCA principal component analysis and code implementation", and the subheadings are the submodules in the file. The content under each main heading is self-contained, so the same Python library may be imported under two main headings. These repeated imports are kept so that each piece of code stays complete.
To show that the coursework was actually completed, the code is largely the same as the code provided by the teacher, but I have added my own understanding to the markdown text. Because the data sources are not necessarily the same, the results and plots may differ from the tutorial, but the experiments themselves are correct and complete.
In addition, related cases sent by the teacher (those shared in the course group rather than posted in the course center, such as the airline customer value analysis case) are also attached to this part.
Extended research
This part contains extended content that I explored beyond the course experiments, including code, knowledge points, and experiments of my own.
Experience
Basic knowledge
Experimental content
15.2 Three Common Methods of Similarity Calculation
15.2.1 Euclidean distance
import pandas as pd
# Rows are items (物品A-C = Item A-C); columns are users (用户1-3 = User 1-3)
df = pd.DataFrame([[5, 1, 5], [4, 2, 2], [4, 2, 1]], columns=['用户1', '用户2', '用户3'], index=['物品A', '物品B', '物品C'])
df
        User 1  User 2  User 3
Item A       5       1       5
Item B       4       2       2
Item C       4       2       1
import numpy as np
# Euclidean distance between the rating vectors of Item A (row 0) and Item B (row 1)
dist = np.linalg.norm(df.iloc[0] - df.iloc[1])
dist
3.3166247903554
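As a cross-check on the value above, the distance can also be written out from its definition: the square root of the summed squared differences between two rating vectors. The following is only a minimal sketch. The English labels, the helper name `euclidean`, and the pairwise table are assumptions added for illustration and are not part of the original experiment.

```python
import numpy as np
import pandas as pd

# Same ratings as above; English labels are used here only for readability (assumption)
ratings = pd.DataFrame(
    [[5, 1, 5], [4, 2, 2], [4, 2, 1]],
    columns=["User 1", "User 2", "User 3"],
    index=["Item A", "Item B", "Item C"],
)

def euclidean(u, v):
    """Hypothetical helper: square root of the summed squared differences."""
    diff = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))

# Item A vs Item B: sqrt(1^2 + (-1)^2 + 3^2) = sqrt(11)
d_ab = euclidean(ratings.loc["Item A"], ratings.loc["Item B"])

# All pairwise item distances at once, using NumPy broadcasting
values = ratings.to_numpy(dtype=float)
pairwise = np.sqrt(((values[:, None, :] - values[None, :, :]) ** 2).sum(axis=2))
distance_table = pd.DataFrame(pairwise, index=ratings.index, columns=ratings.index)

print(d_ab)
print(distance_table.round(4))
```

A smaller distance means the two items received more similar ratings, so in this table Item B and Item C are the closest pair.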
15.2.2 Cosine similarity with a built-in function
import pandas as pd
# Rows are items (物品A-C = Item A-C); columns are users (用户1-3 = User 1-3)
df = pd.DataFrame([[5, 1, 5], [4, 2, 2], [4, 2, 1]], columns=['用户1', '用户2', '用户3'], index=['物品A', '物品B', '物品C'])
df
        User 1  User 2  User 3
Item A       5       1       5
Item B       4       2       2
Item C       4       2       1
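from sklearn.metrics.pairwise import cosine_similarity
# cosine_similarity compares the rows of df, so the result is item-to-item similarity despite the variable name
user_similarity = cosine_similarity(df)
pd.DataFrame(user_similarity, columns=['物品A', '物品B', '物品C'], index=['物品A', '物品B', '物品C'])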
          Item A    Item B    Item C
Item A  1.000000  0.914659  0.825029
Item B  0.914659  1.000000  0.979958
Item C  0.825029  0.979958  1.000000
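The matrix above can be reproduced from the definition of cosine similarity, the dot product of two vectors divided by the product of their lengths. The sketch below uses only NumPy. The helper name `cosine_matrix` is an assumption made for illustration, and the user-to-user variant (comparing columns instead of rows) is an extra step that the original experiment does not perform.

```python
import numpy as np

# Rows are Item A, Item B, Item C; columns are User 1, User 2, User 3
ratings = np.array([[5, 1, 5], [4, 2, 2], [4, 2, 1]], dtype=float)

def cosine_matrix(m):
    """Hypothetical helper: cosine similarity between every pair of rows of m."""
    unit = m / np.linalg.norm(m, axis=1, keepdims=True)  # scale each row to length 1
    return unit @ unit.T  # dot products of unit vectors are cosines

item_sim = cosine_matrix(ratings)    # item-to-item, matches the table above
user_sim = cosine_matrix(ratings.T)  # user-to-user, obtained by transposing first

print(item_sim.round(6))
print(user_sim.round(6))
```

Whether rows or columns are compared decides whether the result describes items or users, which is why the transpose matters when switching between item-based and user-based collaborative filtering.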
15.2.3 Simple version of Pearson correlation coefficient
from scipy.stats import pearsonr
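X = [1, 3, 5, 7, 9]
Y = [9, 8, 6, 4, 2]
# pearsonr returns the correlation coefficient r and the p-value
corr = pearsonr(X, Y)
print('The correlation coefficient r is ' + str(corr[0]) + ', and the significance level (p-value) is ' + str(corr[1]))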
D:\coder\randomnumbers\venv\lib\site-packages\numpy\lib\function_base.py:2845: RuntimeWarning: Degrees of freedom <= 0 for slice
c = cov(x, y, rowvar, dtype=dtype)
D:\coder\randomnumbers\venv\lib\site-packages\numpy\lib\function_base.py:518: RuntimeWarning: Mean of empty slice.
avg = a.mean(axis, **keepdims_kw)
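For the X and Y lists in this subsection, the Pearson coefficient can be cross-checked directly from its definition: the sum of the products of the deviations from the means, divided by the square root of the product of the summed squared deviations. This is a minimal sketch with NumPy only. The helper name `pearson_r` is an assumption for illustration, and the sketch does not compute the p-value.

```python
import numpy as np

X = np.array([1, 3, 5, 7, 9], dtype=float)
Y = np.array([9, 8, 6, 4, 2], dtype=float)

def pearson_r(x, y):
    """Hypothetical helper: Pearson correlation coefficient from its definition."""
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

r = pearson_r(X, Y)
print(r)  # about -0.994, a strong negative linear relationship

# np.corrcoef returns the full correlation matrix; entry [0, 1] is r for (X, Y)
print(np.corrcoef(X, Y)[0, 1])
```

Here the numerator is -36 and the denominator is the square root of 40 × 32.8, so Y falls almost perfectly linearly as X rises.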
Supplementary knowledge point: use of groupby() function
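import pandas as pd
# Columns: 电影名称 = movie title, 影评师 = film critic, 观前评分 = pre-view rating, 观后评分 = post-view rating
data = pd.DataFrame(
    [['战狼2', '丁一', 6, 8],
     ['攀登者', '王二', 8, 6],
     ['攀登者', '张三', 10, 8],
     ['卧虎藏龙', '李四', 8, 8],
     ['卧虎藏龙', '赵五', 8, 10]],
    columns=['电影名称', '影评师', '观前评分', '观后评分'])
data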
   Movie title                     Film critic  Pre-view rating  Post-view rating
0  Wolf Warrior 2                  Ding Yi                    6                 8
1  The Climbers                    Wang Er                    8                 6
2  The Climbers                    Zhang San                 10                 8
3  Crouching Tiger, Hidden Dragon  Li Si                      8                 8
4  Crouching Tiger, Hidden Dragon  Zhao Wu                    8                10
# Mean post-view rating for each movie; the double brackets keep the result as a DataFrame
means = data.groupby('电影名称')[['观后评分']].mean()
means
                                Post-view rating
Movie title
Crouching Tiger, Hidden Dragon               9.0
Wolf Warrior 2                               8.0
The Climbers                                 7.0
# Mean pre-view and post-view ratings for each movie
means = data.groupby('电影名称')[['观前评分', '观后评分']].mean()
means
                                Pre-view rating  Post-view rating
Movie title
Crouching Tiger, Hidden Dragon              8.0               9.0
Wolf Warrior 2                              6.0               8.0
The Climbers                                9.0               7.0
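The grouped means above can be varied in a few common ways. The sketch below rebuilds the same data with English stand-in labels (`movie`, `critic`, `pre_rating`, `post_rating` are assumptions chosen for readability, not the original column names) and shows the plain grouped mean, a flat result via `as_index=False`, and several statistics at once via `agg`.

```python
import pandas as pd

# Same ratings as above, with hypothetical English column names
data = pd.DataFrame(
    [["Wolf Warrior 2", "Ding Yi", 6, 8],
     ["The Climbers", "Wang Er", 8, 6],
     ["The Climbers", "Zhang San", 10, 8],
     ["Crouching Tiger, Hidden Dragon", "Li Si", 8, 8],
     ["Crouching Tiger, Hidden Dragon", "Zhao Wu", 8, 10]],
    columns=["movie", "critic", "pre_rating", "post_rating"],
)

# Mean of both rating columns per movie; the movie title becomes the index
by_title = data.groupby("movie")[["pre_rating", "post_rating"]].mean()

# as_index=False keeps the movie title as an ordinary column instead
flat = data.groupby("movie", as_index=False)["post_rating"].mean()

# agg computes several statistics per group in one pass
summary = data.groupby("movie")["post_rating"].agg(["mean", "count"])

print(by_title)
print(flat)
print(summary)
```

The flat form is often easier to merge with other tables, while the indexed form is convenient for label-based lookups such as `by_title.loc["The Climbers"]`.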
# Group by movie title and film critic together, which produces a two-level index
means = data.groupby(['电影名称', '影评师'])[['观后评分']].mean()
means