Installing Mahout on Mac OS X

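I. Introduction to Mahout
Mahout is an open-source project of the Apache Software Foundation (ASF). It provides scalable implementations of classic machine learning algorithms, which helps developers build scalable, high-performance machine learning applications quickly. Mahout includes many implementations, covering clustering, classification, recommendation (collaborative filtering), and frequent itemset mining. In addition, by using the Apache Hadoop library, Mahout can scale effectively into the cloud.
Apache Mahout has three defining features:
1. A simple, extensible programming environment and framework for building scalable algorithms.
2. A wide variety of premade algorithms for Scala + Apache Spark, H2O, and Apache Flink.
3. Samsara, a vector math experimentation environment with R-like syntax.
Mahout was founded by Grant Ingersoll.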
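II. Installation Environment
First, make sure Hadoop is installed correctly. Mahout is not a distributed environment in itself; it only uses Hadoop's MapReduce for its computation.
[Figure: the currently running Hadoop nodes]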
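Before installing, it can help to confirm that the tools this post relies on are actually on the PATH. The following is a minimal sketch, not part of Mahout or Hadoop; require_cmd is a hypothetical helper name introduced here for illustration.

```shell
# require_cmd is a hypothetical helper (an assumption of this sketch):
# it succeeds only if every named command can be found on PATH.
require_cmd() {
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      return 1
    fi
  done
  echo "found: $*"
}

# Usage on the machine described in this post:
#   require_cmd hadoop brew && hadoop version
```

If the check reports a missing command, fix the Hadoop or Homebrew installation before continuing with the steps below.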
1. Download with brew
brew install mahout
Homebrew installs it automatically under /usr/local/opt/mahout.
2. Set environment variables
The setup is fairly simple. Add the following variables to ~/.bash_profile:
### setup Mahout
export MAHOUT_HOME=/usr/local/opt/mahout/libexec
MAHOUT_CONF_DIR=$MAHOUT_HOME/
export PATH=$MAHOUT_HOME/bin:$PATH
Then run source ~/.bash_profile, after which you can run mahout directly.
The output looks like this:
No MAHOUT_CONF_DIR found
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /Users/wangxinnian/hadoop/bin/hadoop and HADOOP_CONF_DIR=/Users/wangxinnian/hadoop/etc/hadoop
MAHOUT-JOB: /usr/local/opt/mahout/libexec/mahout-examples-0.13.0-job.jar
An example program must be given as the first argument.
Valid program names are:
  arff.vector: : Generate Vectors from an ARFF file or directory
  baumwelch: : Baum-Welch algorithm for unsupervised HMM training
  canopy: : Canopy clustering
  cat: : Print a file or resource as the logistic regression models would see it
  cleansvd: : Cleanup and verification of SVD output
  clusterdump: : Dump cluster output to text
  clusterpp: : Groups Clustering Output In Clusters
  cmdump: : Dump confusion matrix in HTML or text formats
  cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
  cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
  describe: : Describe the fields and target variable in a data set
  evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
  fkmeans: : Fuzzy K-means clustering
  hmmpredict: : Generate random sequence of observations by given HMM
  itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
  kmeans: : K-means clustering
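If mahout is not found after sourcing the profile, a quick check of the variables can narrow down the cause. This is a minimal sketch under the assumption that MAHOUT_HOME and PATH are set as described in step 2; check_mahout_env is a hypothetical helper name, not something shipped with Mahout.

```shell
# check_mahout_env is a hypothetical helper (an assumption of this sketch).
# It reports whether MAHOUT_HOME is set, exists, and has its bin/ on PATH.
check_mahout_env() {
  if [ -z "${MAHOUT_HOME:-}" ]; then
    echo "MAHOUT_HOME is not set"
    return 1
  fi
  if [ ! -d "$MAHOUT_HOME" ]; then
    echo "MAHOUT_HOME does not exist: $MAHOUT_HOME"
    return 1
  fi
  # Wrap PATH in colons so the match works for first, middle, and last entries.
  case ":$PATH:" in
    *":$MAHOUT_HOME/bin:"*)
      echo "ok: $MAHOUT_HOME/bin is on PATH"
      ;;
    *)
      echo "$MAHOUT_HOME/bin is not on PATH"
      return 1
      ;;
  esac
}
```

Paste the function into a terminal and call check_mahout_env after running source ~/.bash_profile; any message other than the "ok" line points at the variable that still needs fixing.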
III. Running an Example
HDFS contains the directory /user/wangxinnian/, where wangxinnian is the home directory of the user currently running Hadoop. Create a testdata directory under it:
hadoop fs -mkdir /user/wangxinnian/testdata
Next, download a dataset:
http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data
Then upload synthetic_control.data to the /user/wangxinnian/testdata directory:
hadoop fs -put synthetic_control.data /user/wangxinnian/testdata
Run this command from the directory that contains synthetic_control.data. On Mac OS this is /Users/wangxinnian/Downloads, that is, the Downloads folder under your home directory.
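The download and upload steps above can be collected into one small script. This is only a sketch: the run wrapper, the DRY_RUN variable, and the stage_data function are hypothetical names introduced here, and the use of curl for the download and of -mkdir -p and -ls are additions not found in the original steps. The HDFS path and URL are the ones used in this post.

```shell
# Values taken from this post; adjust the user name to your own.
HDFS_DIR="/user/wangxinnian/testdata"
DATA_URL="http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data"

# run (hypothetical helper): prints the command when DRY_RUN=1, otherwise executes it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# stage_data (hypothetical helper): download the dataset and copy it into HDFS.
stage_data() {
  run curl -O "$DATA_URL"                                # save into the current directory
  run hadoop fs -mkdir -p "$HDFS_DIR"                    # -p: no error if the directory exists
  run hadoop fs -put synthetic_control.data "$HDFS_DIR"  # upload the local file
  run hadoop fs -ls "$HDFS_DIR"                          # list the directory to confirm
}
```

Setting DRY_RUN=1 before calling stage_data prints the four commands without touching the network or HDFS, which is a cheap way to review them first.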
Then start the run:
mahout -core org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
The run time depends on how fast your computer is.
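The final output is written to the location shown here:
[Figure: output location of the job]
The result of the run as seen in Hadoop:
[Figure: result of the run in Hadoop]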

caridle
