Machine learning library MLlib
MLlib is Spark's machine learning library. Its goal is to make practical machine learning scalable and easy to use. At a high level, it provides the following tools:
Machine learning algorithms: common learning algorithms such as classification, regression, clustering, and collaborative filtering
Feature processing: feature extraction, transformation, dimensionality reduction, and selection
Pipelines: tools for constructing, evaluating, and tuning ML Pipelines
Persistence: saving and loading algorithms, models, and Pipelines
Utilities: linear algebra, statistics, data handling, etc.
Since Spark 2.0, the RDD-based API in the spark.mllib package has been in maintenance mode. The primary API is now the DataFrame-based API in the spark.ml package.
The Spark 2.x documentation stated that MLlib's RDD-based API was expected to be removed in Spark 3.0.