[Introduction to Apache Mahout]

The Apache Mahout™ project's goal is to build an environment for quickly creating scalable, performant machine learning applications.

Mahout is a powerful data mining tool: a collection of distributed machine learning algorithms that includes Taste (an implementation of distributed collaborative filtering), classification, clustering, and more. Its biggest advantage is that it is built on Hadoop. Many algorithms that used to run on a single machine are converted into MapReduce jobs, which greatly increases both the volume of data they can handle and their processing performance.
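To make the idea of converting an algorithm to MapReduce concrete, here is a minimal sketch in plain Python. It does not use Hadoop or Mahout; it only simulates the map, shuffle and reduce phases inside one process. Item co-occurrence counting is chosen purely as an illustrative workload, and the function names (map_phase, shuffle, reduce_phase, cooccurrence) and the toy data are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(user_items):
    """Mapper: for one user's items, emit ((item_a, item_b), 1) per item pair."""
    for a, b in combinations(sorted(set(user_items)), 2):
        yield (a, b), 1


def shuffle(pairs):
    """Group mapper output by key, as the framework does between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one item pair."""
    return key, sum(values)


def cooccurrence(baskets):
    """Run map, shuffle and reduce in-process over every user's item list."""
    mapped = (pair for items in baskets for pair in map_phase(items))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical toy data: each inner list holds the items one user interacted with.
baskets = [["A", "B", "C"], ["A", "B"], ["B", "C"]]
counts = cooccurrence(baskets)
```

Each mapper call looks at only one user's items and each reducer call looks at only one key, so a framework such as Hadoop can spread both phases across many machines. That independence is what lets the same computation grow with the data.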

 

Apache Mahout software provides four major features:

1) A simple and extensible programming environment and framework for building scalable algorithms

2) A wide variety of premade algorithms for Scala + Apache Spark, H2O, Apache Flink

3) Samsara, a vector math experimentation environment with R-like syntax that works at scale

4) On-GPU compute for performance improvements in large matrix multiplications

 



I looked up what Mahout means: a person who rides an elephant. One look at the Mahout logo confirms it. So if I want to play happily with the little yellow elephant (Hadoop), I have to get to know the elephant driver too...

 

Mahout currently provides tools for building a recommendation engine through the Taste library, a fast and flexible engine for collaborative filtering (CF). Taste supports both user-based and item-based recommendation and offers many recommendation options, as well as interfaces for customization. Taste consists of five main components for working with users, items and preferences:

DataModel: stores users, items and preferences

UserSimilarity: an interface that defines the similarity between two users

ItemSimilarity: an interface that defines the similarity between two items

Recommender: an interface that provides recommendations

UserNeighborhood: an interface that computes a neighborhood of similar users, whose results the Recommender can use at any time
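How these components fit together in a user-based recommender can be sketched in plain Python. This is not Taste's Java API: the names data_model, user_similarity, user_neighborhood and recommend only mirror the roles listed above, and the cosine similarity, the weighted-average scoring and the toy ratings are all assumptions chosen for illustration. ItemSimilarity is left out because the sketch is user-based.

```python
from math import sqrt

# Hypothetical toy preferences: user -> {item: rating}. Plays the DataModel role.
data_model = {
    "alice": {"i1": 5.0, "i2": 3.0, "i3": 4.0},
    "bob": {"i1": 4.0, "i2": 3.0, "i3": 5.0, "i4": 4.0},
    "carol": {"i1": 1.0, "i2": 5.0, "i4": 2.0},
}


def user_similarity(model, u, v):
    """UserSimilarity role: cosine similarity over the items both users rated."""
    common = set(model[u]) & set(model[v])
    if not common:
        return 0.0
    dot = sum(model[u][i] * model[v][i] for i in common)
    norm_u = sqrt(sum(model[u][i] ** 2 for i in common))
    norm_v = sqrt(sum(model[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def user_neighborhood(model, user, n):
    """UserNeighborhood role: the n users most similar to `user`."""
    others = [v for v in model if v != user]
    others.sort(key=lambda v: user_similarity(model, user, v), reverse=True)
    return others[:n]


def recommend(model, user, n_neighbors=2, how_many=1):
    """Recommender role: rank unseen items by similarity-weighted average rating."""
    totals, weights = {}, {}
    for v in user_neighborhood(model, user, n_neighbors):
        sim = user_similarity(model, user, v)
        for item, rating in model[v].items():
            if item in model[user]:
                continue  # skip items the user has already rated
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    scored = [(i, totals[i] / weights[i]) for i in totals if weights[i] > 0]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:how_many]


top = recommend(data_model, "alice")
```

Here alice has not rated i4, so it is the only candidate. Its score is an average of bob's and carol's ratings, weighted by how similar each of them is to alice.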

 

 

Machine learning algorithms implemented in Mahout:

| Algorithm class | Algorithm name | Notes |
|---|---|---|
| Classification | Logistic Regression | |
| | Bayesian | |
| | SVM | Support Vector Machines |
| | Perceptron | Perceptron algorithm |
| | Neural Network | |
| | Random Forests | |
| | Restricted Boltzmann Machines | |
| Clustering | Canopy Clustering | |
| | K-means Clustering | |
| | Fuzzy K-means | |
| | Expectation Maximization | EM clustering |
| | Mean Shift Clustering | |
| | Hierarchical Clustering | |
| | Dirichlet Process Clustering | |
| | Latent Dirichlet Allocation | LDA clustering |
| | Spectral Clustering | |
| Association rule mining | Parallel FP Growth Algorithm | |
| Regression | Locally Weighted Linear Regression | |
| Dimensionality reduction | Singular Value Decomposition | |
| | Principal Components Analysis | |
| | Independent Component Analysis | |
| | Gaussian Discriminative Analysis | Gaussian Discriminant Analysis |
| Evolutionary algorithms | Parallelized Watchmaker framework | |
| Recommendation / collaborative filtering | Non-distributed recommenders | Taste (UserCF, ItemCF, SlopeOne) |
| | Distributed recommenders | ItemCF |
| Vector similarity calculation | RowSimilarityJob | Calculates similarity between columns |
| | VectorDistanceJob | Calculates distance between vectors |
| Non-MapReduce algorithms | Hidden Markov Models | |
| Collections extension | Collections | Extends Java's Collections classes |
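The two vector calculation entries above (RowSimilarityJob and VectorDistanceJob) both come down to applying a measure to pairs of vectors. The plain Python sketch below shows that kind of pairwise computation on a single machine. It is not Mahout code: the function names, the choice of cosine similarity and Euclidean distance, and the treatment of each vector as one row of a small toy matrix are assumptions made for this example.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if one is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise(rows, measure):
    """Apply `measure` to every unordered pair of rows, keyed by row indices."""
    return {(i, j): measure(rows[i], rows[j])
            for i, j in combinations(range(len(rows)), 2)}


# Hypothetical toy matrix made of three row vectors.
rows = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
similarities = pairwise(rows, cosine_similarity)
distances = pairwise(rows, euclidean_distance)
```

Rows 0 and 1 point in the same direction, so their cosine similarity is 1 even though their distance is not 0. The number of pairs grows quadratically with the number of vectors, which is why this kind of computation is a natural candidate for a distributed job.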
