On 20 --------- Machine Learning Project Flow

  A complete machine learning project generally follows this process:
1, abstracting the task into a mathematical problem

  First make clear what kind of problem it is, classification or regression, and avoid trying things indiscriminately.

2, data acquisition and analysis

  The acquired data should be representative; otherwise the model will almost certainly overfit.
  For classification, the class distribution should not be too skewed: the sample counts of different classes should not differ by several orders of magnitude. The scale of the data should also be assessed: from the number of samples and the number of features, estimate the memory consumption and judge whether the data will fit in memory during training. If it will not fit, consider a modified algorithm or dimensionality reduction techniques. If the amount of data is too large, consider distributed processing.
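The two checks above (class skew and memory footprint) can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the function names, the toy labels, and the assumption of a dense float64 matrix (8 bytes per value) are all made up for the example.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Ratio of the largest class count to the smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def estimate_dense_memory_bytes(n_samples, n_features, bytes_per_value=8):
    """Rough size of a dense feature matrix (assumes float64 by default)."""
    return n_samples * n_features * bytes_per_value


# Hypothetical data set: 50 positive samples against 5000 negative ones.
labels = ["spam"] * 50 + ["ham"] * 5000
ratio = imbalance_ratio(labels)

# Hypothetical scale: 10 million samples with 200 features each.
mem_gib = estimate_dense_memory_bytes(10_000_000, 200) / 1024**3

print(f"imbalance ratio: {ratio:.0f}:1")
print(f"estimated dense matrix size: {mem_gib:.1f} GiB")
```

A ratio of two orders of magnitude, as in this toy case, is already a sign that resampling or class weighting may be worth considering, and a size estimate that exceeds available RAM points to dimensionality reduction or distributed processing.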

3, data preprocessing

  Data cleaning, normalization, augmentation and so on. Normalization, discretization, factorization, missing-value handling, removal of collinearity and similar steps take up a large share of the time in a data mining project. This work is simple and reproducible, and its benefit is stable and predictable; it is an essential basic step of machine learning.
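As an illustration of two of these steps, the sketch below fills missing values with the column mean and then applies min-max normalization. The helper names and the `ages` column are hypothetical; mean imputation is only one of several possible strategies.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Linearly rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical feature column with one missing value.
ages = [20, None, 40, 60]
filled = impute_mean(ages)
scaled = min_max_scale(filled)
print(filled, scaled)
```

In practice the fill value and the min/max should be computed on the training split only and then reused on validation and test data, so that no information leaks across splits.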

4, feature engineering

  Select the significant features and discard the insignificant ones. This requires the machine learning engineer to understand the business thoroughly, and in many cases it has a decisive influence on the results. With well-chosen features, even a very simple algorithm can give good and stable results. Feature validity is analysed with techniques such as the correlation coefficient, the chi-square test, average mutual information, conditional entropy, posterior probability, and logistic regression weights.
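Of the techniques listed, the correlation coefficient is the simplest to show. The following is a rough sketch that ranks features by the absolute Pearson correlation with the target; the feature names and values are invented for the example, and correlation only captures linear relationships.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank_features(features, target):
    """Sort feature names by |correlation with target|, strongest first."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: exam score as the target.
features = {
    "hours_studied": [1, 2, 3, 4, 5],
    "shoe_size": [42, 38, 41, 38, 41],
    "constant": [1, 1, 1, 1, 1],
}
target = [52, 58, 61, 70, 74]

ranking = rank_features(features, target)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Features near the bottom of such a ranking are candidates for removal, though a low linear correlation alone does not prove a feature is useless; the other listed techniques (chi-square test, mutual information) can catch relationships this one misses.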

5, model selection, training and tuning

  Choose the model according to the actual data and the specific problem to be solved, taking into account the number of samples, the feature dimensionality and the characteristics of the data. Consider whether the problem is classification or regression and which aspects the network needs to pay attention to, then select a network that suits the actual situation.

  For tuning, cross-validation can be used. Observe the loss curve and the test result curve, analyse the cause of any problem, and adjust hyperparameters such as the optimizer, learning rate and batch size.
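A minimal sketch of this tuning loop, under stated assumptions: the model is a one-variable linear regression trained by mini-batch gradient descent, the data is synthetic and noise-free, and the candidate learning rates and batch sizes are arbitrary. All function names are illustrative.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs whose validation parts partition the data."""
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def fit_gd(xs, ys, learning_rate, batch_size, epochs=200):
    """Fit y = w*x + b by mini-batch gradient descent on squared error."""
    w = b = 0.0
    for _ in range(epochs):
        for i in range(0, len(xs), batch_size):
            bx, by = xs[i:i + batch_size], ys[i:i + batch_size]
            grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(bx, by)) / len(bx)
            grad_b = sum(2 * (w * x + b - y) for x, y in zip(bx, by)) / len(bx)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of the line y = w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cross_validate(xs, ys, learning_rate, batch_size, n_folds=5):
    """Mean validation loss over all folds for one hyperparameter setting."""
    losses = []
    for train, val in k_fold_indices(len(xs), n_folds):
        w, b = fit_gd([xs[i] for i in train], [ys[i] for i in train],
                      learning_rate, batch_size)
        losses.append(mse([xs[i] for i in val], [ys[i] for i in val], w, b))
    return sum(losses) / len(losses)


# Synthetic data: y = 3x + 1 without noise.
xs = [i / 10 for i in range(20)]
ys = [3 * x + 1 for x in xs]

candidates = [(lr, bs) for lr in (0.001, 0.01, 0.1) for bs in (4, 16)]
scores = {c: cross_validate(xs, ys, *c) for c in candidates}
best = min(scores, key=scores.get)
print("best (learning_rate, batch_size):", best)
```

The same loop structure applies to any model: only `fit_gd` and `mse` would be swapped for the real training and scoring routines. Recording the per-epoch loss inside `fit_gd` would give the loss curve mentioned above.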

  Fusing multiple models (ensembling) can also be tried to improve the results.
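One simple form of model fusion is majority voting over class predictions. The sketch below assumes three hypothetical classifiers that have already produced labels for the same four samples; weighted voting or averaging of probabilities are common alternatives.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Fuse class predictions from several models by per-sample majority vote.

    On a tie, the label that appears first among the models wins.
    """
    fused = []
    for sample_preds in zip(*predictions_per_model):
        fused.append(Counter(sample_preds).most_common(1)[0][0])
    return fused


# Hypothetical predictions of three models on the same four samples.
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]

fused = majority_vote([model_a, model_b, model_c])
print(fused)
```

Voting tends to help most when the individual models make different kinds of mistakes; fusing near-identical models gains little.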

6, post-processing

  The output of the network is generally not used directly. Some post-processing scheme is applied first, such as adding prior constraints, which removes some of the obvious errors.
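Two examples of what a prior constraint might look like, as a sketch only: clipping a regression output to a range known to be valid, and removing a label that disagrees with both of its neighbours in a sequence. Both rules and the sample values are assumptions chosen for illustration; the appropriate constraint depends entirely on the task.

```python
def clip_to_range(preds, lo, hi):
    """Force predictions into a range known in advance to be valid."""
    return [min(max(p, lo), hi) for p in preds]


def remove_isolated_flips(labels):
    """Replace a label that differs from two equal neighbours by their label."""
    cleaned = list(labels)
    for i in range(1, len(labels) - 1):
        if labels[i - 1] == labels[i + 1] != labels[i]:
            cleaned[i] = labels[i - 1]
    return cleaned


# Hypothetical raw outputs: predicted ages and per-frame binary labels.
raw_ages = [-3.0, 45.5, 130.2]
raw_frames = [0, 0, 1, 0, 0, 1, 1, 1]

ages = clip_to_range(raw_ages, 0, 120)
frames = remove_isolated_flips(raw_frames)
print(ages, frames)
```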

7, model evaluation

  Evaluate the model from several aspects: accuracy, error, time complexity, space complexity, stability, transferability, etc.
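A small sketch covering two of these aspects, prediction quality and running time. The metric helper, the timing helper, and the toy labels are illustrative assumptions; stability and transferability need repeated runs and additional data sets and are not shown.

```python
import time


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall for a binary classification result."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def mean_seconds_per_call(fn, *args, repeats=100):
    """Average wall-clock time of fn(*args) over several repeats."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) / repeats


# Hypothetical ground truth and predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

report = binary_metrics(y_true, y_pred)
elapsed = mean_seconds_per_call(binary_metrics, y_true, y_pred)
print(report, f"{elapsed:.2e} s per call")
```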

 

Origin blog.csdn.net/bylfsj/article/details/104831559