Spark MLlib official documentation: study and translation notes (1)

Machine learning library (MLlib)

MLlib is Spark's machine learning library. Its goal is to make practical machine learning easy to use and scalable. At a high level, it provides the following tools:

Machine learning algorithms: common algorithms such as classification, regression, clustering, and collaborative filtering

Feature processing: feature extraction, transformation, dimensionality reduction, and selection

Pipelines: tools for constructing, evaluating, and tuning ML Pipelines

Persistence: saving and loading algorithms, models, and Pipelines

Utilities: linear algebra, statistics, data handling, etc.
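To make the list above more concrete, here is a minimal Scala sketch that touches several of these tools at once: feature processing (Tokenizer, HashingTF), an algorithm (LogisticRegression), a Pipeline, and persistence. The toy data, the object name MLlibOverviewSketch, the local master setting, and the save path under /tmp are all assumptions made for illustration; they are not taken from the original notes.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this sketch.
object MLlibOverviewSketch {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation (assumption: run on a single machine).
    val spark = SparkSession.builder()
      .appName("MLlibOverviewSketch")
      .master("local[*]")
      .getOrCreate()

    // Made-up training data: (id, text, label).
    val training = spark.createDataFrame(Seq(
      (0L, "a b c d e spark", 1.0),
      (1L, "b d", 0.0),
      (2L, "spark f g h", 1.0),
      (3L, "hadoop mapreduce", 0.0)
    )).toDF("id", "text", "label")

    // Feature processing: split text into words, then hash words into a feature vector.
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF()
      .setNumFeatures(1000)
      .setInputCol("words")
      .setOutputCol("features")

    // Algorithm: a classifier that reads the default "features" and "label" columns.
    val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.001)

    // Pipeline: chain the stages and fit them as a single unit.
    val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))
    val model = pipeline.fit(training)

    // Persistence: save the fitted pipeline and load it back (hypothetical path).
    val path = "/tmp/mllib-overview-sketch-model"
    model.write.overwrite().save(path)
    val reloaded = PipelineModel.load(path)

    // Apply the reloaded model to unlabeled, made-up test data.
    val test = spark.createDataFrame(Seq(
      (4L, "spark i j k"),
      (5L, "mapreduce spark")
    )).toDF("id", "text")
    reloaded.transform(test).select("id", "text", "prediction").show()

    spark.stop()
  }
}
```

Fitting the whole Pipeline rather than each stage separately means the same feature processing is applied at training time and at prediction time, which is the main reason the stages are grouped this way.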

As of Spark 2.0, the RDD-based APIs in the spark.mllib package have entered maintenance mode. The primary machine learning API for Spark is now the DataFrame-based API in the spark.ml package.

According to the Spark 2.x documentation, MLlib's RDD-based APIs are expected to be removed in Spark 3.0.
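The split between the two packages is visible even in basic types such as vectors. The following is a small sketch, assuming Spark 2.0 or later is on the classpath, that creates a vector with the RDD-based spark.mllib package and converts it to and from the spark.ml type. The object name VectorPackagesSketch and the sample values are made up for illustration.

```scala
import org.apache.spark.ml.linalg.{Vector => MLVector}
import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => OldVectors}

// Hypothetical object name, used only for this sketch.
object VectorPackagesSketch {
  def main(args: Array[String]): Unit = {
    // Vector type from the RDD-based package (spark.mllib, maintenance mode).
    val oldVec: OldVector = OldVectors.dense(1.0, 0.0, 3.0)

    // Convert to the vector type used by the DataFrame-based package (spark.ml).
    val newVec: MLVector = oldVec.asML

    // Convert back, for code that still depends on spark.mllib.
    val roundTrip: OldVector = OldVectors.fromML(newVec)

    println(s"spark.mllib vector: $oldVec")
    println(s"spark.ml vector:    $newVec")
    println(s"round trip equal:   ${roundTrip == oldVec}")
  }
}
```

Conversions like these are mainly useful when migrating existing spark.mllib code to spark.ml one piece at a time.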



Origin blog.csdn.net/ruiyiin/article/details/77113289