Introduction to Spark MLlib

MLlib is Spark's machine learning library. It is designed to simplify the engineering practice of machine learning and to make it easier to scale to larger data sets.

MLlib consists of a number of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction. It also includes lower-level optimization primitives and higher-level pipeline APIs.

This section gives a brief introduction to Spark MLlib. When individual data mining algorithms are introduced, they are explained with examples that use the algorithms provided by Spark MLlib.

Components of Spark MLlib

Spark computes in memory, which makes it naturally suited to the iterative calculations of data mining. For ordinary developers, however, implementing distributed data mining algorithms is still a great challenge. Spark therefore provides MLlib, a machine learning library for massive data sets that offers distributed implementations of common data mining algorithms.

Developers only need a foundation in Spark, an understanding of the principles of the data mining algorithms, and a grasp of what the algorithm parameters mean. They can then run data mining on massive data sets by calling the API of the appropriate algorithm.

MLlib consists of four parts: data types, a math and statistics library, algorithm evaluation, and machine learning algorithms.

Name                           Explanation
Data types                     Vectors, labeled vectors (vectors with a class label), matrices, etc.
Math and statistics library    Basic statistics, correlation analysis, random number generation, hypothesis testing, etc.
Algorithm evaluation           AUC, accuracy, recall, F-measure, etc.
Machine learning algorithms    Classification, regression, clustering, collaborative filtering, etc.

Specifically, the classification and regression algorithms include logistic regression, SVM, naive Bayes, decision trees, and random forests. The clustering algorithms include k-means and LDA. The collaborative filtering algorithms include alternating least squares (ALS).
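
To make the evaluation measures listed above more concrete, here is a minimal plain-Python sketch of how accuracy, precision, recall, and F-measure can be computed for a binary classifier. It does not call MLlib. The function name binary_metrics and the sample labels are made up for illustration, and precision is included only because the F-measure is defined from it.

```python
def binary_metrics(labels, predictions):
    """Compute accuracy, precision, recall and F1 for 0/1 labels.

    Illustrative helper only; it is not part of Spark MLlib.
    """
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # true positives
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false positives
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # false negatives
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up ground truth and model output for eight samples.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
predictions = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = binary_metrics(labels, predictions)
print(metrics)
```

On a cluster the same counts would be accumulated across partitions, but the formulas stay the same.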

Advantages of Spark MLlib

Compared with machine learning implementations based on Hadoop MapReduce (such as Hadoop Mahout), Spark MLlib has some unique advantages.

First, machine learning algorithms generally consist of many iterative steps. The computation stops only when the error becomes sufficiently small or after enough iterations for the result to converge. With the Hadoop MapReduce framework, every iteration has to read from and write to disk and launch a new job to complete the task, which leads to very large I/O and CPU costs.
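
The iterative pattern described above can be sketched in plain Python. The example below trains a tiny logistic regression model with batch gradient descent and stops either when the loss changes by less than a tolerance or when a maximum number of iterations is reached. It is only an illustration of the stopping rule, not MLlib's implementation. The function names, the learning rate, the tolerance, and the sample data are all assumptions chosen for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(points, labels, learning_rate=0.5,
                   tolerance=1e-4, max_iterations=1000):
    """Batch gradient descent for logistic regression (illustrative only)."""
    weights = [0.0] * (len(points[0]) + 1)  # last entry is the bias term
    previous_loss = float("inf")
    n = len(points)
    for iteration in range(1, max_iterations + 1):
        gradient = [0.0] * len(weights)
        loss = 0.0
        # One full pass over the data set = one iteration.
        for x, y in zip(points, labels):
            features = list(x) + [1.0]
            p = sigmoid(sum(w * f for w, f in zip(weights, features)))
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
            loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            for j, f in enumerate(features):
                gradient[j] += (p - y) * f
        loss /= n
        weights = [w - learning_rate * g / n
                   for w, g in zip(weights, gradient)]
        # Stop once the loss has effectively stopped improving.
        if abs(previous_loss - loss) < tolerance:
            break
        previous_loss = loss
    return weights, iteration, loss


# Made-up one-dimensional data: negative x is class 0, positive x is class 1.
points = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
labels = [0, 0, 0, 1, 1, 1]
weights, iterations, final_loss = train_logistic(points, labels)
print(iterations, final_loss, weights)
```

Every pass of the outer loop reads the whole data set again. That repeated pass is the step that becomes a separate disk-backed job under MapReduce.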

Spark's computational model, by contrast, is designed for in-memory iterative computation. Multiple iterations are carried out directly in memory, and disk and network are used only when necessary, so Spark is an ideal platform for machine learning with MLlib. Second, Spark has an efficient communication system based on Akka and Netty, and its communication is more efficient than the communication mechanism of the Hadoop MapReduce framework.

The official Spark home page shows a performance comparison of the logistic regression algorithm running on Hadoop and on Spark. In that comparison, Spark is more than 100 times faster than Hadoop.


Origin blog.csdn.net/yuidsd/article/details/92418144