Machine Learning Series 4: Model Types

Model Type

Parametric models

In statistics, the population is usually assumed to follow a particular distribution whose shape is fixed by a small number of parameters (for example, a normal distribution is determined by its mean and variance). A model built on such a fixed set of parameters is called a parametric model.
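As an illustration of a distribution that is determined by its mean and variance, the following minimal sketch fits a normal distribution to synthetic data. The function names `fit_normal` and `normal_pdf`, the seed, and the true values (mean 5, standard deviation 2) are assumptions made for this example only.

```python
import math
import random
import statistics

def fit_normal(samples):
    """Summarise the samples by the two parameters of a normal distribution."""
    mean = statistics.fmean(samples)
    # Population variance, which is the maximum-likelihood estimate.
    variance = statistics.pvariance(samples, mu=mean)
    return mean, variance

def normal_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance at x."""
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

random.seed(0)  # fixed seed so the run is repeatable
samples = [random.gauss(5.0, 2.0) for _ in range(10_000)]  # synthetic data

mean, variance = fit_normal(samples)
print(f"estimated mean={mean:.2f}, variance={variance:.2f}")
print(f"density at x=5: {normal_pdf(5.0, mean, variance):.3f}")
```

However many samples are supplied, the fitted model is still described by just these two numbers, which is what makes it parametric.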

Examples include:

  • Logistic regression
  • Linear discriminant analysis
  • Perceptron

Advantages

  • Simple: the theory is easy to understand and the results are easy to interpret
  • Fast: parametric models are quick to learn and train
  • Less data: they usually do not need large amounts of training data, and can still perform well when the fit to the data is imperfect

Limitations

  • Constrained: choosing a functional form in advance restricts what the model can learn
  • Limited complexity: usually suited only to simpler problems
  • Poor fit: in practice the assumed form often does not match the underlying target function
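To illustrate how a fixed functional form can fail to match the underlying function, the sketch below fits a straight line by ordinary least squares to data that is actually quadratic. The helper `fit_line` and the five data points are made up for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 for x in xs]  # the true relationship is quadratic, not linear

slope, intercept = fit_line(xs, ys)
# Sum of squared errors of the fitted line on the training points.
sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
print(slope, intercept, sse)  # prints 0.0 2.0 14.0
```

The best possible line here is flat: it predicts 2.0 everywhere and misses the curvature entirely. More data of the same kind would not help, because the limitation comes from the chosen form.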

Non-parametric models

No assumption is made about the form of the population distribution. We know only that the population is a random variable and that its distribution exists (it may well have parameters), but neither the form of the distribution nor its parameters are known. Under certain conditions, estimates can still be made from a given sample using the methods of non-parametric statistics.
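One standard non-parametric estimate is the empirical cumulative distribution function, which is built directly from the sample without assuming any distributional form. The sketch below is a minimal version; the name `make_ecdf` and the observations are made up for illustration.

```python
from bisect import bisect_right

def make_ecdf(samples):
    """Return F, where F(x) is the fraction of samples less than or equal to x."""
    ordered = sorted(samples)
    n = len(ordered)

    def ecdf(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return ecdf

samples = [2.1, 0.4, 3.3, 1.7, 2.1, 5.0]  # made-up observations
F = make_ecdf(samples)
print(F(2.1))  # fraction of observations at or below 2.1
print(F(10.0))  # every observation is below 10, so this is 1.0
```

Note that the estimate stores the whole sample, so its size grows with the data rather than being fixed in advance.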

Machine learning algorithms that do not make strong assumptions about the form of the target function are called non-parametric algorithms. Because they make no such assumptions, they are free to learn a function of any form from the training data.

When constructing the target function, non-parametric methods seek the best fit to the training data while keeping some ability to generalize to unseen data. As a result, they can fit a wide range of functional forms.

The k-nearest neighbors algorithm is an example: it predicts new data from the k most similar training instances. Beyond assuming that similar instances have similar outputs, it makes no assumptions about the form of the target function.
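A minimal sketch of k-nearest neighbors classification, assuming Euclidean distance and a simple majority vote. The function `knn_predict`, the default `k=3`, and the toy training set are illustrative choices, not part of the original text.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs.
    """
    # Sort the training points by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((4.0, 4.2), "b"), ((4.1, 3.9), "b"), ((3.8, 4.0), "b"),
]
print(knn_predict(train, (1.1, 1.0)))  # prints "a"
print(knn_predict(train, (3.9, 4.1)))  # prints "b"
```

There is no training step and no fitted functional form: the "model" is the stored training set plus the choice of k and distance measure.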

Examples include:

  • Decision trees (CART, C4.5)
  • Naive Bayes
  • Support vector machines (SVM)
  • Neural networks

Advantages

  • Flexibility: they can fit many different functional forms
  • Power: they make no assumptions, or only weak ones, about the target function
  • Performance: predictive performance can be very good

Limitations

  • More data: fitting the target function requires more training data
  • Slower: there are usually more parameters to train, so training is often slow
  • Overfitting: the risk of overfitting is higher, and predictions are harder to explain

Distance-based models

Examples include:

  • Linear regression
  • SVM
  • Logistic regression
  • KNN
  • k-means

Preprocessing

  • When there are many attributes, reduce the dimensionality first so that meaningful data is not drowned out by meaningless data
  • Before use, plot histograms to see where the samples are dense
  • Standardize each attribute before use so that attributes with large values do not carry more weight
  • Before use, assign a weight to each attribute based on experience
  • For data that is not directly separable, consider applying a kernel function before computing distances
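Two of the steps above, standardization and kernel-based distance, can be sketched as follows. This is a minimal illustration under stated assumptions: z-score scaling is used for standardization, a Gaussian (RBF) kernel with `gamma=1.0` is used as the kernel, and the helper names and the age/income rows are made up.

```python
import math
import statistics

def standardize(rows):
    """Rescale each column to zero mean and unit standard deviation (z-score)."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stdevs))
        for row in rows
    ]

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel; the gamma default is an arbitrary choice."""
    return math.exp(-gamma * math.dist(x, y) ** 2)

def kernel_distance(x, y, kernel=rbf_kernel):
    """Distance in the kernel's feature space: sqrt(k(x,x) - 2*k(x,y) + k(y,y))."""
    squared = kernel(x, x) - 2 * kernel(x, y) + kernel(y, y)
    return math.sqrt(max(squared, 0.0))  # clamp tiny negative rounding errors

# Made-up rows: (age in years, income in currency units)
rows = [(25, 30_000), (35, 32_000), (45, 90_000), (55, 95_000)]
scaled = standardize(rows)

print(math.dist(rows[0], rows[1]))      # raw distance, dominated by income
print(math.dist(scaled[0], scaled[1]))  # after scaling, age contributes too
print(kernel_distance(scaled[0], scaled[2]))
```

On the raw rows the ten-year age gap is negligible next to the income gap, purely because of units. After standardization both attributes are on a comparable scale, and a weight per attribute could then be applied on top if experience suggests one matters more.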


Origin www.cnblogs.com/monkeyT/p/12160707.html