BR-MLP: a Spark + Hadoop-based solution for distributed data mining (function overview)

BR-MLP is a distributed data mining platform built on the BR-ODP big data platform. It uses Hadoop and Spark to support mining of massive data sets, and provides components for data sources, data preprocessing, feature engineering, statistical analysis, machine learning, and more.

Data Mining Platform

1. Data sources
Provides programs for storing data sets and for loading data into the platform.

2. Data preprocessing
Cleans the data, converts types, and fills in missing values so that the content and structure are more regular and ready for the components that follow. Components include duplicate removal, random sampling, stratified sampling, and more.
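
As a rough illustration of two of the preprocessing steps named above (duplicate removal and stratified sampling), here is a minimal single-machine sketch in plain Python. It is not BR-MLP's API: the function names `remove_duplicates` and `stratified_sample`, the `label_of` parameter, and the sample rows are all assumptions made for this example.

```python
import random
from collections import defaultdict

def remove_duplicates(rows):
    """Drop repeated rows, keeping the first occurrence and the original order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique

def stratified_sample(rows, label_of, fraction, seed=0):
    """Sample the same fraction from every label group (at least one row each)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)
    sample = []
    for label in sorted(groups):
        members = groups[label]
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical rows: (record id, class label)
rows = [("a", 1), ("a", 1), ("b", 1), ("c", 0), ("d", 0), ("e", 0), ("f", 0)]
unique_rows = remove_duplicates(rows)
sampled = stratified_sample(unique_rows, label_of=lambda r: r[1], fraction=0.5)
print(unique_rows)
print(sampled)
```

Sampling per group keeps the class proportions of the sample close to those of the full data set, which plain random sampling does not guarantee on small or skewed data.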

3. Feature engineering
Processes the preprocessed data further at the structural level, mainly through scaling, smoothing of outliers, feature extraction, and dimensionality reduction.

Components include feature discretization, feature extraction, feature importance, and more.
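
To make scaling and discretization concrete, the sketch below shows min-max scaling and equal-width binning in plain Python. These are common textbook choices, picked here as assumptions; the source does not say which scaling or discretization methods BR-MLP implements, and the function names and the `ages` data are hypothetical.

```python
def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def equal_width_bins(values, n_bins):
    """Discretize values into n_bins intervals of equal width (bin ids start at 0)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

ages = [18, 25, 40, 62, 80]  # hypothetical feature column
scaled = min_max_scale(ages)
bins = equal_width_bins(ages, 3)
print(scaled)
print(bins)
```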

4. Statistical analysis
Analyzes the data statistically to understand its overall shape or details: distributions, correlations, and goodness-of-fit tests. This gives a rough idea, during data preprocessing and feature engineering, of which factors have a relatively large effect on the final result.
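
For the correlation part of this step, a minimal sketch of the Pearson correlation coefficient is shown below. It assumes Pearson is the measure of interest (the source only says "correlation"), and the function name and sample columns are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

feature = [1, 2, 3, 4, 5]         # hypothetical feature values
target_up = [2, 4, 6, 8, 10]      # rises with the feature
target_down = [10, 8, 6, 4, 2]    # falls as the feature rises
r_up = pearson(feature, target_up)
r_down = pearson(feature, target_down)
print(r_up, r_down)
```

A coefficient near 1 or -1 suggests the feature moves closely with the target and is worth keeping in mind during feature engineering; a value near 0 suggests little linear relationship.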

5. Classification and regression
Builds classification or regression models that are then applied to subsequent business data (application data) for prediction: classification or regression. BR-MLP includes 12 algorithms, among them decision tree classification, decision tree regression, naive Bayes, and random forest classification.
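
As an illustration of the build-then-apply flow, here is a small naive Bayes classifier for categorical features with add-one (Laplace) smoothing. It is a single-machine sketch under stated assumptions, not the BR-MLP component: the function names, the model layout, and the weather-style training data are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(samples, labels):
    """Count classes and per-class feature values from categorical training data."""
    class_counts = Counter(labels)
    # feature_counts[label][feature_index][value] -> number of occurrences
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    vocab = defaultdict(set)  # distinct values seen for each feature index
    for features, label in zip(samples, labels):
        for i, value in enumerate(features):
            feature_counts[label][i][value] += 1
            vocab[i].add(value)
    return class_counts, feature_counts, vocab

def predict(model, features):
    """Return the class with the highest log posterior, using add-one smoothing."""
    class_counts, feature_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(features):
            seen = feature_counts[label][i][value]
            score += math.log((seen + 1) / (count + len(vocab[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data: (outlook, temperature) -> play outside?
samples = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
           ("rainy", "cool"), ("overcast", "hot"), ("overcast", "cool")]
labels = ["no", "no", "yes", "yes", "yes", "yes"]
model = train_naive_bayes(samples, labels)
print(predict(model, ("sunny", "hot")))
print(predict(model, ("rainy", "cool")))
```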

6. Clustering
Provides unsupervised machine learning methods for clustering, including text topic clustering. Clustering can be used on its own for automatic grouping, or combined with classification algorithms: first cluster the data to obtain categories, then use those categories as labels to build a classification model.
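
The source does not name the clustering algorithms BR-MLP ships, so the sketch below uses k-means as a stand-in to show the idea. The function names, the explicit `initial_centroids` parameter, and the sample points are assumptions for this example. The returned assignments are the kind of cluster ids that could later serve as class labels for a classifier.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, initial_centroids, iterations=10):
    """Plain k-means: alternate between assigning points and updating centroids."""
    centroids = list(initial_centroids)
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for idx, point in enumerate(points):
            assignments[idx] = min(
                range(len(centroids)),
                key=lambda c: squared_distance(point, centroids[c]),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

# Hypothetical 2-D data with two visible groups
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, assignments = kmeans(points, initial_centroids=[points[0], points[3]])
print(assignments)
print(centroids)
```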

7. Collaborative filtering
BR-MLP supports collaborative filtering, which identifies what a particular customer may be interested in by analyzing what other, similar customers are interested in. Thanks to its speed and robustness, collaborative filtering is widely used across the Internet.
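
The "similar customers" idea can be sketched as user-based collaborative filtering with cosine similarity. This is one common variant, chosen here as an assumption; the source does not describe which collaborative filtering method BR-MLP uses. The function names and the ratings are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two users' rating dictionaries (item -> rating)."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[item] * v[item] for item in shared)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)

def recommend(ratings, user, top_n=1):
    """Score items the user has not rated by similarity-weighted ratings of others."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]

# Hypothetical customer ratings
ratings = {
    "alice": {"book": 5, "pen": 3},
    "bob": {"book": 5, "pen": 3, "lamp": 4},
    "carol": {"pen": 1, "mug": 5},
}
suggestions = recommend(ratings, "alice")
print(suggestions)
```

Here "bob" rates the shared items exactly as "alice" does, so his extra item outweighs the item from the less similar "carol".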

8. Association analysis
Analyzes associations between things, including associations between people and between objects. The classic example is diapers and beer; the technique is very common in shopping basket analysis.
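
The diapers-and-beer example is usually expressed through three standard measures: support, confidence, and lift. The sketch below computes them for one rule over a handful of made-up baskets; the function names and the transaction data are assumptions for illustration only.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the share that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """How much more often the two sides occur together than if they were independent."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)

# Hypothetical shopping baskets
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"milk", "bread"},
]
rule_support = support(baskets, {"diapers", "beer"})
rule_confidence = confidence(baskets, {"diapers"}, {"beer"})
rule_lift = lift(baskets, {"diapers"}, {"beer"})
print(rule_support, rule_confidence, rule_lift)
```

A lift above 1 indicates that the two items appear together more often than chance would predict, which is what makes a rule interesting in basket analysis.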

9. Deep learning
Combines low-level features into more abstract high-level representations (attribute categories or features) in order to discover distributed feature representations of the data.

10. Model application
Selects a model that has already been built and applies the selected model at the business level.

11. Visualization
Displays the results of classification / regression and clustering model applications graphically.

Reproduced from: https://blog.51cto.com/14191705/2410850

Origin blog.csdn.net/weixin_33978016/article/details/93033240