Machine Learning general process

2019-08-25

1. Categories of machine learning

Machine learning can be divided into three categories: supervised learning, reinforcement learning, and unsupervised learning. Their main differences are as follows.

1.1 Supervised learning

Supervised learning is a training method in which the labels of the training data (i.e., the expected outputs of the samples) are known and are used as feedback to train the model. Its two common subclasses are classification and regression.

1.2 Reinforcement learning

In reinforcement learning the training data is not explicitly labeled, but there is a feedback signal, similar to closed-loop feedback in automatic control theory. The feedback signal is usually produced by a developer-defined function that evaluates the current state of the system, and the system is trained further through this feedback signal.

1.3 Unsupervised learning

In unsupervised learning the training data is not explicitly labeled and there is no feedback function. The goal of such a system is to extract useful information and explore the overall structure of the data without labels or feedback. Its common subclasses are clustering and dimensionality reduction.
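In code, the practical difference shows up in whether labels are passed to the training routine. The following is a minimal sketch, assuming scikit-learn and NumPy are available and using made-up data, that contrasts a supervised classifier (fitted on features and labels) with an unsupervised clustering model (fitted on features alone).

```python
# Minimal sketch (illustrative only): supervised vs. unsupervised training.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                # feature matrix
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # known labels (supervised case)

# Supervised learning: the labels y act as feedback during training.
clf = LogisticRegression().fit(X, y)

# Unsupervised learning: no labels; the model explores the data's structure.
km = KMeans(n_clusters=2, n_init=10).fit(X)

print(clf.predict(X[:3]), km.labels_[:3])
```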

 

2. The machine learning system blueprint

 

The entire machine learning process can be divided into five steps: 1) data acquisition, 2) data preprocessing, 3) model training, 4) model validation, 5) model use.

2.1 Data acquisition

Data acquisition is important: it is a prerequisite for machine learning, and it directly affects the performance of the trained algorithm.

2.2 Data preprocessing

The main purpose is to improve the performance of machine learning algorithms, specifically: 1) to reduce the amount of training data as far as possible without hurting the accuracy of the algorithm, which speeds up training, and 2) to improve the accuracy of the algorithm by processing the data.

There are many data preprocessing techniques; the main ones are the following (a small sketch of a few of them appears after this list):

1) Data cleaning;

2) Missing-value filling;

3) Data format conversion;

4) Feature extraction and scaling: different features have different units, so their numerical values may differ by orders of magnitude; features with small values can be swamped by features with large values, which hurts the performance of the algorithm. Feature scaling maps each feature into [0, 1], or standardizes it to a distribution with zero mean and unit variance, thereby improving the performance of the algorithm;

5) Feature selection;

6) Dimensionality reduction: there may be strong coupling between some of the selected features; dimensionality reduction can reduce this coupling, shrink the data that must be stored, and speed up training;

7) Sampling: to ensure that the algorithm generalizes, we require it to work well not only on the training data but also on new data, so a test set is essential; by reasonably splitting the data into training and test sets (and sometimes cross-validation data), we ensure the algorithm is also effective as a practical tool;

8) Data normalization, and so on.
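As a minimal illustration of items 2), 4), and 7), the sketch below assumes scikit-learn is available and uses made-up data and parameter values: it fills a missing value, scales features to [0, 1] or to zero mean and unit variance, and holds out a test set.

```python
# Minimal preprocessing sketch (illustrative only).
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, np.nan],    # a missing value to be filled
              [3.0, 600.0],
              [4.0, 800.0]])
y = np.array([0, 0, 1, 1])

# 2) fill the missing value with the column mean
X = SimpleImputer(strategy="mean").fit_transform(X)

# 4) scale each feature into [0, 1] so large-valued features do not dominate
X_minmax = MinMaxScaler().fit_transform(X)

# ...or standardize each feature to zero mean and unit variance
X_std = StandardScaler().fit_transform(X)

# 7) hold out part of the data as a test set
X_train, X_test, y_train, y_test = train_test_split(
    X_std, y, test_size=0.25, random_state=0)
print(X_train.shape, X_test.shape)
```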

2.3 Model training

Model training is the key step in machine learning and determines the effectiveness of the whole algorithm. Many techniques are involved in model training; the common ones are the following (a small sketch appears after this list):

1) Model selection (very important): different models suit different business scenarios; choosing the right model makes the work far more effective;

2) Objective function: the function used to evaluate the performance of the algorithm during training;

3) Optimization method: the method used to optimize the objective function required by the algorithm; a common one is gradient descent;

4) Stopping conditions: to keep the training process from running indefinitely, termination conditions are set; a combination of a maximum number of iterations and a threshold on the objective function can serve as the condition to stop training;

5) Cross-validation: splitting the training data into a "training set" plus a "validation set" is a simple and effective way to reduce model overfitting; model hyperparameters are tuned by evaluating performance on the validation set, while the test set is kept isolated to prevent overfitting;

6) Hyperparameter optimization: hyperparameters are model parameters that are not learned during training, such as the regularization coefficient, the learning rate, and the number of iterations; suitable hyperparameters can reduce overfitting and speed up training;

7)······
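The sketch below ties several of these items together, assuming scikit-learn and using synthetic data: a linear classifier trained by stochastic gradient descent on a logistic loss, with iteration-count and tolerance stopping conditions, evaluated by cross-validation, and with its regularization coefficient tuned by a grid search. All parameter values are illustrative.

```python
# Minimal training sketch (illustrative only).
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# 1) model selection + 2) objective function: a linear classifier on a logistic loss
# 3) optimization: stochastic gradient descent
# 4) stopping conditions: at most max_iter iterations, or improvement below tol
model = SGDClassifier(loss="log_loss", max_iter=1000, tol=1e-3, random_state=0)

# 5) cross-validation: mean accuracy over 5 train/validation splits
print(cross_val_score(model, X, y, cv=5).mean())

# 6) hyperparameter optimization: search the regularization coefficient alpha
search = GridSearchCV(model, {"alpha": [1e-4, 1e-3, 1e-2]}, cv=5)
search.fit(X, y)
print(search.best_params_)
```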

2.4 Model validation

A test set is used to evaluate the performance of the model.

There are many model performance metrics; for classification models, commonly used ones include the error rate, precision, recall, the F1 score, ROC curves, and so on.
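As a minimal sketch (assuming scikit-learn; the labels and scores below are made up), these metrics can be computed directly from the true and predicted labels of the test set:

```python
# Minimal sketch of common classification metrics (illustrative only).
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
y_score = [0.2, 0.6, 0.9, 0.8, 0.4, 0.1]   # predicted probabilities, for ROC

print("error rate:", 1 - accuracy_score(y_true, y_pred))
print("precision :", precision_score(y_true, y_pred))
print("recall    :", recall_score(y_true, y_pred))
print("F1        :", f1_score(y_true, y_pred))
print("ROC AUC   :", roc_auc_score(y_true, y_score))
```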

2.5 Model use

That is, the trained model is used to predict outputs for new data.
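A minimal sketch of this step, again assuming scikit-learn; the model and the new samples are synthetic and illustrative only:

```python
# Illustrative only: applying a trained model to new, unlabeled data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=4, random_state=0)
model = LogisticRegression().fit(X, y)                 # the earlier training steps

X_new = np.random.default_rng(1).normal(size=(2, 4))   # new, unseen samples
print(model.predict(X_new))                            # predicted labels
```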

 

The above is the general process shared by most machine learning algorithms. Each step involves many techniques, which will be filled in bit by bit in future articles.

 
