[Introduction to Top 10 Open Source Recommendation Systems]

Recommendation systems have been particularly popular in the past two years. This article collects and organizes some good open-source recommendation systems for your reference. They include lightweight tools suited to research, such as SVDFeature, LibMF and LibFM, and heavyweight systems suited to industrial use, such as Mahout, Oryx and EasyRec. PS: this top 10 reflects only personal opinion.

#1.SVDFeature

Homepage: http://svdfeature.apexlab.org/wiki/Main_Page  Language: C++
A feature-based collaborative filtering and ranking toolkit developed by Apex Lab of Shanghai Jiao Tong University, with high code quality. It won first place in KDD Cup 2012 and third place in KDD Cup 2011, and the related paper was published in JMLR in 2012, which speaks to its quality.
SVDFeature provides a very flexible matrix factorization recommendation framework that makes it easy to implement SVD, SVD++ and similar methods; matrix factorization is among the most accurate single-model recommendation approaches. The code is compact, and it can run large-scale matrix factorization on a single machine with relatively little memory. In addition, its models, which include logistic regression, are easy to use in ensembles.
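To make the matrix factorization idea concrete, here is a minimal pure-Python sketch of a biased factorization model (the "SVD"-style model, predicting rating = mu + b_u + b_i + p_u · q_i) trained with SGD. It is our own illustration, not SVDFeature's code or API: the function name, hyperparameters and toy setup are assumptions, and SVDFeature's support for user, item and global features is omitted.

```python
import random


def train_biased_mf(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=200, seed=0):
    """Fit r_ui ~ mu + b_u + b_i + p_u . q_i with SGD (illustrative sketch).

    ratings: list of (user, item, rating) triples with 0-based integer ids.
    Returns a predict(user, item) function.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu = [0.0] * n_users  # user biases
    bi = [0.0] * n_items  # item biases
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]  # user factors
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]  # item factors

    def predict(u, i):
        return mu + bu[u] + bi[i] + sum(pf * qf for pf, qf in zip(P[u], Q[i]))

    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)  # visit ratings in a random order each epoch
        for u, i, r in data:
            err = r - predict(u, i)
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]  # use the old values for both updates
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return predict
```

Calling the returned `predict(u, i)` gives the estimated rating; on real data it should be evaluated on held-out ratings rather than on the training set.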

#2.LibMF

Homepage: http://www.csie.ntu.edu.tw/~cjlin/libmf/  Language: C++
The author, Chih-Jen Lin, is from the renowned National Taiwan University. His group is well known in machine learning: it has competed in KDD Cup for several consecutive years with excellent results, winning first place several years in a row. The group's style is very pragmatic. LibSVM, Liblinear and other tools widely used in industry were developed there, and its open-source code is highly efficient and of high quality.
LibMF makes a notable contribution to parallel matrix factorization. To address the locking problem and discontinuous memory access that SGD (Stochastic Gradient Descent) suffers from in parallel computation, it proposes FPSGD (Fast Parallel SGD), an efficient matrix factorization algorithm that divides the rating matrix into blocks according to the number of computing nodes and assigns the blocks to those nodes. The system is described in this paper (Best Paper Award, ACM RecSys 2013).
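The block idea can be sketched as follows. This is a simplified illustration under our own assumptions: it splits the rating matrix into an nb x nb grid and processes it in rounds of blocks that share no row block and no column block, so threads can update the factor matrices without locks. This static rotation is closer to DSGD-style stratification; FPSGD itself, as described in its paper, hands out free blocks dynamically, and LibMF's real implementation differs. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(ratings, n_users, n_items, nb):
    """Split (user, item, rating) triples into an nb x nb grid keyed by (row_block, col_block)."""
    blocks = {}
    for u, i, r in ratings:
        key = (u * nb // n_users, i * nb // n_items)
        blocks.setdefault(key, []).append((u, i, r))
    return blocks


def independent_rounds(nb):
    """Yield nb rounds; the nb blocks in one round share no row block and no column block."""
    for shift in range(nb):
        yield [(b, (b + shift) % nb) for b in range(nb)]


def sgd_on_block(block, P, Q, lr=0.01, reg=0.05):
    """Plain SGD (no biases) over one block, updating user factors P and item factors Q in place."""
    for u, i, r in block:
        err = r - sum(a * b for a, b in zip(P[u], Q[i]))
        for f in range(len(P[u])):
            pu, qi = P[u][f], Q[i][f]
            P[u][f] += lr * (err * qi - reg * pu)
            Q[i][f] += lr * (err * pu - reg * qi)


def parallel_epoch(blocks, nb, P, Q):
    """One pass over every block; blocks in the same round touch disjoint rows of P and Q, so no locks."""
    with ThreadPoolExecutor(max_workers=nb) as pool:
        for rnd in independent_rounds(nb):
            list(pool.map(lambda key: sgd_on_block(blocks.get(key, []), P, Q), rnd))
```

Because of CPython's global interpreter lock these threads will not actually speed up the arithmetic; the sketch only shows why this kind of schedule needs no locks.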

#3.LibFM

Homepage: http://www.libfm.org/  Language: C++
The author is Steffen Rendle of the University of Konstanz in Germany. He used LibFM to compete in both Track 1 and Track 2 of KDD Cup 2012 and achieved good results. LibFM is a very useful tool.
LibFM is a powerful tool for factorization models: it implements Factorization Machines, which generalize matrix factorization. Notably, it implements MCMC (Markov Chain Monte Carlo) inference, which is more accurate than the common SGD optimization method but slower. LibFM also implements SGD, SGDA (Adaptive SGD) and ALS (Alternating Least Squares).
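For reference, a degree-2 factorization machine predicts y(x) = w0 + sum_j w_j x_j + sum_{j<l} <v_j, v_l> x_j x_l, and the pairwise term can be computed in O(k * n) time as 0.5 * sum_f [(sum_j v_jf x_j)^2 - sum_j v_jf^2 x_j^2]. The sketch below computes a prediction both ways on sparse input; it covers prediction only, not LibFM's SGD/ALS/MCMC learning, and the function names are our own.

```python
def fm_predict(x, w0, w, V):
    """Degree-2 FM prediction in O(k * nnz(x)).

    x: sparse input as {feature_index: value}; w: linear weights; V: one k-dim factor vector per feature.
    """
    k = len(V[0])
    result = w0 + sum(w[j] * xj for j, xj in x.items())
    for f in range(k):
        s = sum(V[j][f] * xj for j, xj in x.items())          # sum_j v_jf x_j
        s2 = sum((V[j][f] * xj) ** 2 for j, xj in x.items())  # sum_j v_jf^2 x_j^2
        result += 0.5 * (s * s - s2)
    return result


def fm_predict_naive(x, w0, w, V):
    """Same model, summing <v_j, v_l> x_j x_l over all pairs j < l (O(k * nnz^2))."""
    feats = sorted(x.items())
    result = w0 + sum(w[j] * xj for j, xj in feats)
    for a in range(len(feats)):
        for b in range(a + 1, len(feats)):
            (j, xj), (l, xl) = feats[a], feats[b]
            result += sum(p * q for p, q in zip(V[j], V[l])) * xj * xl
    return result
```

When x one-hot encodes a user u and an item i, the prediction reduces to w0 + w_u + w_i + <v_u, v_i>, which is exactly biased matrix factorization; this is the sense in which FMs generalize MF.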

#4.Lenskit

Homepage: http://lenskit.grouplens.org/  Language: Java

This Java open-source recommender system comes from the GroupLens team at the University of Minnesota in the United States, which also created MovieLens, a well-known benchmark dataset in the recommendation field.
The source code is hosted on GitHub at https://github.com/grouplens/lenskit. It consists mainly of modules such as lenskit-api, lenskit-core, lenskit-knn, lenskit-svd, lenskit-slopone, lenskit-parent, lenskit-data-structures, lenskit-eval and lenskit-test, and implements typical recommender algorithms such as k-NN, SVD and Slope-One.
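Slope One, one of the algorithms listed above, is simple enough to sketch in a few lines. Below is a minimal pure-Python version of weighted Slope One: it averages the rating difference between each pair of items over the users who rated both, then predicts by weighting those differences by how many users they are based on. It illustrates the algorithm only; it is not LensKit's implementation, and the names are hypothetical.

```python
from collections import defaultdict


def slope_one_fit(ratings):
    """ratings: {user: {item: rating}}. Returns (dev, count) keyed by item pairs (j, i)."""
    diff_sum = defaultdict(float)
    count = defaultdict(int)
    for user_ratings in ratings.values():
        for j, rj in user_ratings.items():
            for i, ri in user_ratings.items():
                if i != j:
                    diff_sum[(j, i)] += rj - ri  # how much higher j is rated than i
                    count[(j, i)] += 1
    dev = {pair: diff_sum[pair] / count[pair] for pair in diff_sum}
    return dev, dict(count)


def slope_one_predict(user_ratings, j, dev, count):
    """Weighted Slope One: average of (dev[j, i] + r_ui), weighted by co-rating counts."""
    num = den = 0.0
    for i, ri in user_ratings.items():
        c = count.get((j, i), 0)
        if i != j and c:
            num += (dev[(j, i)] + ri) * c
            den += c
    return num / den if den else None  # None: the user shares no co-rated item with j


if __name__ == "__main__":
    ratings = {"alice": {"a": 5, "b": 3, "c": 2},
               "bob": {"a": 3, "b": 4},
               "carol": {"b": 2, "c": 5}}
    dev, count = slope_one_fit(ratings)
    print(round(slope_one_predict(ratings["carol"], "a", dev, count), 2))  # 4.33
```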

#5.GraphLab

Homepage: GraphLab - Collaborative Filtering  Language: C++
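GraphLab is a high-performance distributed graph processing and mining system written in C++. Its strength is iterative parallel computation (a weak spot of Hadoop), and this distinctive capability has given GraphLab a strong reputation in industry. It is very effective for running random-walk or graph-based recommendation algorithms on large data sets. Although GraphLab is well known (it was developed at CMU), applications with ordinary data volumes may not need it.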
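Random walk with restart (personalized PageRank) over the user-item bipartite graph is one common graph-based recommendation algorithm of the kind mentioned here. The single-machine, pure-Python sketch below illustrates the computation only; it does not use GraphLab's API, and the function names, restart probability and iteration count are our own assumptions. In a system like GraphLab, the same per-vertex update would be run in parallel across a distributed graph.

```python
def random_walk_with_restart(graph, source, alpha=0.15, iters=50):
    """Power iteration for personalized PageRank.

    graph: {node: [neighbours]} for an undirected user-item graph; every node needs >= 1 neighbour.
    alpha: probability of jumping back to `source` at each step.
    """
    rank = {n: 0.0 for n in graph}
    rank[source] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in graph}
        nxt[source] += alpha  # restart mass
        for n, r in rank.items():
            share = (1 - alpha) * r / len(graph[n])  # spread the rest evenly over neighbours
            for m in graph[n]:
                nxt[m] += share
        rank = nxt
    return rank


def recommend_items(graph, user, items, n=2):
    """Rank items the user is not yet linked to by their random-walk score."""
    rank = random_walk_with_restart(graph, user)
    candidates = [i for i in items if i not in graph[user]]
    return sorted(candidates, key=lambda i: rank[i], reverse=True)[:n]
```

Items reachable only through users with similar histories pick up score, while items in a disconnected part of the graph score zero.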
GraphLab mainly implements algorithms such as ALS, CCD++, SGD, Bias-SGD, SVD++, Weighted-ALS, Sparse-ALS, Non-negative Matrix Factorization and the Restarted Lanczos Algorithm.
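ALS, the first algorithm in this list, alternates between fixing the item factors and solving a small ridge regression for each user, p_u = (sum_i q_i q_i^T + lambda I)^-1 sum_i r_ui q_i, and then doing the same for items. Below is a minimal single-machine sketch with one regularization constant lambda and a tiny Gaussian-elimination solver, so it needs no third-party libraries; it is not GraphLab's implementation, and the names and defaults are assumptions.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A small and non-singular)."""
    n = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for col in range(c, n + 1):
                M[r][col] -= f * M[c][col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][col] * x[col] for col in range(r + 1, n))) / M[r][r]
    return x


def als_half_step(fixed, observed, k, lam):
    """For every row, solve (sum q q^T + lam I) p = sum r q with the other side's factors fixed.

    fixed: {id: factor vector}; observed: {row_id: [(other_id, rating), ...]}.
    """
    out = {}
    for row, pairs in observed.items():
        A = [[lam if a == b else 0.0 for b in range(k)] for a in range(k)]
        rhs = [0.0] * k
        for other, r in pairs:
            q = fixed[other]
            for a in range(k):
                rhs[a] += r * q[a]
                for b in range(k):
                    A[a][b] += q[a] * q[b]
        out[row] = solve(A, rhs)
    return out


def als(ratings, k=2, lam=0.1, iters=20, seed=0):
    """ratings: list of (user, item, rating). Returns (user_factors, item_factors) as dicts."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    Q = {i: [rng.random() for _ in range(k)] for i in by_item}  # random initial item factors
    P = {}
    for _ in range(iters):
        P = als_half_step(Q, by_user, k, lam)  # update users, items fixed
        Q = als_half_step(P, by_item, k, lam)  # update items, users fixed
    return P, Q
```

Each half step minimizes the regularized squared error exactly over one side, so the objective never increases from one iteration to the next; each per-row solve is also independent, which is what makes ALS easy to parallelize.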

#6.Mahout

Homepage: http://mahout.apache.org/  Language: Java
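Mahout is a new open-source project of the Apache Software Foundation (ASF). Its main goal is to create scalable machine learning algorithms that developers can use free of charge under the Apache license. The project was started by members of the Apache Lucene community interested in machine learning, who wanted to build a reliable, well-documented, scalable project implementing common machine learning algorithms for clustering and classification. The community was initially based on the paper "Map-Reduce for Machine Learning on Multicore" by Ng et al., but has since incorporated a wider range of machine learning methods, including Collaborative Filtering (CF), Dimensionality Reduction and Topic Models. In addition, by using the Apache Hadoop library, Mahout can scale effectively to the cloud.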
Mahout's recommendation algorithms mainly include User-Based CF, Item-Based CF, ALS, ALS on Implicit Feedback, Weighted MF, SVD++ and Parallel SGD.
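Item-based CF, the second algorithm listed, scores an unseen item for a user by a similarity-weighted average of that user's own ratings on other items. The sketch below uses cosine similarity between item rating vectors as one possible choice; it is a plain-Python illustration, not Mahout's API (Mahout provides its own recommender and similarity classes), and all names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors given as {user: rating}."""
    dot = sum(a[u] * b[u] for u in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(ratings, user, top_n=3):
    """Item-based CF: score each unseen item by a similarity-weighted average of the user's ratings.

    ratings: {user: {item: rating}}. Returns [(item, score), ...] sorted by score, best first.
    """
    item_vecs = {}  # transpose to {item: {user: rating}}
    for u, user_ratings in ratings.items():
        for item, r in user_ratings.items():
            item_vecs.setdefault(item, {})[u] = r
    seen = ratings[user]
    scores = {}
    for cand, vec in item_vecs.items():
        if cand in seen:
            continue
        sims = [(cosine(vec, item_vecs[i]), r) for i, r in seen.items()]
        weight = sum(abs(s) for s, _ in sims)
        if weight:
            scores[cand] = sum(s * r for s, r in sims) / weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

Item-item similarities change slowly compared with individual user histories, which is why production systems usually precompute them offline; this sketch recomputes them on every call for brevity.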

#7.Myrrix

Homepage: http://myrrix.com/  Language: Java
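Myrrix started as an experimental recommender system that Sean Owen, one of Mahout's authors, built on top of Mahout. It has since become a complete, real-time, scalable clustering and recommendation system. Its architecture has two parts: the Serving Layer, an online service that answers requests, ingests data and provides real-time recommendations; and the Computation Layer, which performs distributed offline computation, running distributed machine learning algorithms in the background to update the Serving Layer's models. Myrrix combines these two layers into a complete recommender system. The Serving Layer is an HTTP server that can receive updates and compute updated results within milliseconds. It can be used on its own, without the Computation Layer, in which case it runs the machine learning algorithms locally. The Computation Layer can also be used on its own; it is essentially a series of Hadoop jobs. Myrrix has now been merged by Cloudera into the Oryx project.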
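The hand-off between the two layers can be illustrated with a toy serving layer that answers queries from the latest published factor model, queues incoming events for the batch side, and swaps in a new model whenever the computation layer publishes one. This is a hypothetical sketch: the class and method names are ours, it is not Myrrix's or Oryx's API, and it does not reproduce how Myrrix updates its model in real time.

```python
import threading


class ServingLayer:
    """Hypothetical serving layer: answers queries from the latest published model and
    queues new interactions for a separate batch computation layer."""

    def __init__(self, user_factors, item_factors):
        self._lock = threading.Lock()
        self._model = (user_factors, item_factors)
        self.pending = []  # new (user, item, value) events awaiting the next batch run

    def ingest(self, user, item, value):
        with self._lock:
            self.pending.append((user, item, value))

    def drain_pending(self):
        """Called by the computation layer to collect new input for its next run."""
        with self._lock:
            batch, self.pending = self.pending, []
        return batch

    def publish(self, user_factors, item_factors):
        """Called by the computation layer when a rebuilt model is ready; swaps it in atomically."""
        with self._lock:
            self._model = (user_factors, item_factors)

    def recommend(self, user, n=2):
        with self._lock:
            users, items = self._model  # take a consistent snapshot of the current model
        x = users[user]
        scores = {i: sum(a * b for a, b in zip(x, y)) for i, y in items.items()}
        return sorted(scores, key=scores.get, reverse=True)[:n]
```

Queries keep being served from the old model while a batch job runs, and readers never see a half-updated model because the swap replaces one tuple under a lock.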

#8.EasyRec

Homepage: http://easyrec.org/  Language: Java
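EasyRec is an easy-to-integrate, easy-to-extend, powerful recommender system with a visual administration interface. It is more like a complete recommendation product, with modules for data entry, administration, recommendation mining and offline analysis. EasyRec can serve recommendations to several different websites at the same time, distinguishing them by tenant. You set up an EasyRec server and request a tenant for your website, and the tenant makes it easy to integrate EasyRec into the site. User behavior on the site is collected through various data-collection APIs (view, buy, rating); EasyRec then generates recommendations through offline analysis, and your site can implement its recommendation features through Recommendations and Community Rankings.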

#9.Waffles

Homepage: http://waffles.sourceforge.net/  Language: C++
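The English word "waffles" literally means a kind of sweet honey cake, but here it names a very powerful open-source machine learning toolkit. Waffles contains a great many algorithms covering almost every area of machine learning. Its recommendation functionality is the Waffles_recommend tool, which makes up only about 1/10 of Waffles; the rest includes classification, clustering, sampling, dimensionality reduction, data visualization, audio processing and many other tools. Probably only Weka can rival it.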

#10.RapidMiner

Homepage: http://rapidminer.com/  Language: Java
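RapidMiner (formerly Yale) is a fairly mature data-mining solution covering common machine learning, NLP, recommendation and forecasting methods (recommendation is only a small part of it). It comes with a GUI-based data-analysis environment and a complete suite for data ETL, preprocessing, visualization, evaluation and deployment. RapidMiner also offers a commercial license and an R interface, and it seems to be moving toward becoming a commercial data-mining company.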
==============================================================================

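There are many more open-source recommender systems, large and small. The above covers only a top 10 that are relatively popular in academia and industry, almost all implemented in C++/Java. References [1] and [2] also mention Crab (Python), CofiRank (C++), MyMediaLite (.NET/C#), PREA (Java), Python-recsys (Python), Recommendable (Ruby), Recommenderlab (R), Oryx (Java), recommendify (Ruby), RecDB (SQL) and others, and GitHub has many more. Some are suited to a single machine, others to clusters. Although they use different programming languages, the algorithms they implement are largely the same: mainly SVD, SGD, ALS, MF, CF and their improved variants.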
