Machine Learning (4) - Logistic Regression LR

1. If the features are scaled unevenly across dimensions, is the optimal solution equivalent to the original one?

  Answer: The equivalence depends on the final error/optimization function: if the optimization problem after the transformation is equivalent to the original one, then the solution is equivalent. From this it is easy to see that multiplying or dividing a feature by a constant gives an equivalent solution (the corresponding weight simply rescales), while adding or subtracting a constant, or taking a logarithm, is in general not equivalent.
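  A minimal sketch of the constant-scaling case, assuming scikit-learn and a synthetic dataset: with regularization effectively turned off, rescaling one feature by a constant only rescales its weight, and the predicted probabilities stay the same.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data; rescale the first feature by a constant factor.
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X_scaled = X.copy()
X_scaled[:, 0] *= 100.0

# A very large C approximates an unregularized fit.
lr_a = LogisticRegression(C=1e6, max_iter=5000).fit(X, y)
lr_b = LogisticRegression(C=1e6, max_iter=5000).fit(X_scaled, y)

# The weight on the rescaled feature shrinks by the same factor,
# and the predicted probabilities are numerically unchanged.
print(lr_a.coef_[0][0], lr_b.coef_[0][0] * 100.0)
print(np.allclose(lr_a.predict_proba(X), lr_b.predict_proba(X_scaled), atol=1e-4))
```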

2. How do overfitting and underfitting arise, and how can they be solved?

  Underfitting: the root cause is that the feature dimension (model capacity) is too small, so the fitted function cannot capture the training set and the error is large.
      Solution: increase the feature dimension.
  Overfitting: the root cause is that the feature dimension is too large, so the fitted function fits the training set perfectly but predicts poorly on new data.
      Solution: (1) reduce the feature dimension; (2) regularize, i.e. shrink the parameter values.

  Summary of ways to reduce overfitting. Overfitting is mainly caused by two things: too little data + too complex a model.
  (1) Get more data: collect more data from the source; use data augmentation.
  (2) Use a suitable model: reducing the number of layers and neurons in a network limits its fitting capacity.
  (3) Dropout.
  (4) Regularization, which keeps the weights from growing too large during training (see the sketch after this list).
  (5) Limit the training time; stop based on the evaluation/validation set.
  (6) Add noise: to the inputs or to the weights (e.g. Gaussian initialization).
  (7) Combine multiple models: Bagging fits different models to different parts of the training set; Boosting combines simple (weak) learners.
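  A minimal sketch of item (4), assuming scikit-learn and a synthetic dataset: stronger L2 regularization (a smaller C in scikit-learn's parameterization) keeps the weight norm small.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data with more features than informative signal,
# a setting where unconstrained weights tend to grow large.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

for C in [100.0, 1.0, 0.01]:  # smaller C = stronger L2 regularization
    lr = LogisticRegression(penalty="l2", C=C, max_iter=5000).fit(X, y)
    print(f"C={C:>6}: ||w|| = {np.linalg.norm(lr.coef_):.3f}")
```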

3. For logistic regression, what are the benefits of discretizing continuous features?

  In industry, continuous values are rarely fed directly to a logistic regression model as features. Instead, continuous features are discretized into a series of 0/1 features and then fed to the model. The advantages of doing so are as follows:

  1. The inner product of sparse vectors is fast to compute, and the results are easy to store and scale.

  2. Discretized features are very robust to abnormal data: for example, a feature "age > 30" is 1, otherwise 0. Without discretization, an abnormal value such as "age = 300" can disturb the model considerably.

  3. Logistic regression is a generalized linear model with limited expressive power; after a single variable is discretized into N variables, each has its own weight, which is equivalent to introducing nonlinearity into the model, improving its expressive power and fit.

  4. After discretization, feature crossing can be performed, going from M+N variables to M*N variables, which further introduces nonlinearity and improves expressive power.

  5. After features are discretized, the model is more stable. For example, if the user's age is discretized with 20-30 as one interval, a user does not become a completely different sample just because they are one year older. Of course, samples near the interval boundaries behave the opposite way, so how to choose the intervals takes real care and domain knowledge.

  General understanding: 1) computation is simpler; 2) the model is simplified; 3) the model's generalization ability is enhanced, and it is less easily affected by noise.
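  A minimal sketch of the binning idea above, with hand-chosen bin edges and hypothetical age values: each bucket becomes a separate 0/1 feature with its own weight, and an outlier such as age 300 simply falls into the last bucket rather than pulling the model around.

```python
import numpy as np

# Hypothetical ages, including an abnormal value of 300.
ages = np.array([18, 25, 33, 47, 65, 300])

# Hand-chosen bin edges; the last bucket is open-ended, so outliers are absorbed.
edges = [0, 20, 30, 40, 60, np.inf]
bucket = np.digitize(ages, edges) - 1          # index of each age's bucket
one_hot = np.eye(len(edges) - 1)[bucket]       # one 0/1 feature per bucket for LR
print(one_hot)
```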

4. Quick questions:
   Methods to reduce overfitting:
     data augmentation, regularization terms, early stopping.
   General workflow: get the candidate set -> train the model -> get the result.
   Why does LR need normalization or a log transform?
     It conforms to the model's assumptions and makes analysis easier; normalization also helps gradient descent converge.
   Why does LR work better with discretized features?
     It introduces nonlinearity.
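   A small sketch of the normalization point, assuming scikit-learn: standardize features before fitting LR so the gradient-based solver converges easily even when features sit on very different scales.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data where one feature is on a much larger scale.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X[:, 0] *= 1000.0

# Standardization puts every feature on a comparable scale before LR.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)
print(model.score(X, y))
```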
