Machine learning algorithms: introduction and application of logistic regression

Introduction to Logistic Regression

Logistic regression (LR) has "regression" in its name, but it is actually a classification model, and it is widely used in many fields. Although deep learning is now more popular, traditional methods such as logistic regression are still widely used because of their distinctive advantages.

The two most prominent advantages of logistic regression are that the model is simple and that it is highly interpretable.

The advantages and disadvantages of the logistic regression model:

  • Advantages: simple to implement and easy to understand; low computational cost, fast, and light on storage
  • Disadvantages: prone to under-fitting; classification accuracy may not be high


Definition of logistic regression

Simply put, logistic regression is a machine learning method for binary classification (0 or 1) problems that estimates how possible something is: for example, how possible it is that a user buys a certain product, that a patient has a certain disease, or that a user clicks on a certain advertisement. Note that the word "possibility" is used here deliberately, rather than "probability" in the mathematical sense. The output of logistic regression is not a probability value under the strict mathematical definition and should not be used directly as one. In practice, this output is often combined with other feature values in a weighted sum rather than multiplied by them directly.

So what is the relationship between logistic regression and linear regression?

Logistic regression and linear regression are both generalized linear models. Logistic regression assumes that the dependent variable y follows a Bernoulli distribution, while linear regression assumes that the dependent variable y follows a Gaussian distribution. The two therefore have many similarities. If the sigmoid mapping function is removed, what remains of logistic regression is a linear regression. It can be said that logistic regression is theoretically underpinned by linear regression, but it introduces a nonlinearity through the sigmoid function, which lets it handle 0/1 classification problems easily.
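The relationship described above can be sketched in a few lines of Python. This is only a minimal illustration, not a full implementation: the weights, bias, and feature values are made-up numbers, and the function names are chosen here for clarity.

```python
import math


def linear_score(weights, bias, features):
    """The linear regression part: a weighted sum of the features plus a bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def sigmoid(z):
    """Squash any real number into the range 0 to 1 (numerically stable form)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logistic_output(weights, bias, features):
    """Logistic regression output: the sigmoid applied to the linear score."""
    return sigmoid(linear_score(weights, bias, features))


# Hypothetical parameters and one hypothetical sample, for illustration only.
weights = [0.8, -0.5]
bias = 0.1
sample = [1.0, 2.0]

score = linear_score(weights, bias, sample)      # approximately -0.1, unbounded in general
output = logistic_output(weights, bias, sample)  # always between 0 and 1
print("linear score:", score)
print("logistic output:", output)
```

Dropping the call to `sigmoid` leaves exactly the linear score, which is the sense in which removing the sigmoid mapping turns the model back into a linear regression.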


Applications of logistic regression

Logistic regression models are widely used in various fields, including machine learning, most medical fields, and the social sciences. For example, the Trauma and Injury Severity Score (TRISS), originally developed by Boyd et al., is widely used to predict the mortality of injured patients. Logistic regression is also used to analyze and predict the risk of specific diseases (such as diabetes or coronary heart disease) based on observed patient characteristics (age, gender, body mass index, various blood test results). Logistic regression models are also used to predict the probability of system or product failure in a given process. In marketing, they are used to predict, for example, a customer's tendency to purchase a product or cancel a subscription. In economics, logistic regression can be used to predict the probability that a person chooses to enter the labor market, while in business it can be used to predict the probability that a homeowner will default on a mortgage. Conditional random fields, an extension of logistic regression to sequential data, are used in natural language processing.

Logistic regression models are now also basic building blocks of many classification systems, such as credit card anti-fraud systems that combine the GBDT algorithm with LR, and CTR (click-through rate) estimation. The advantages are that the output naturally falls between 0 and 1 and carries a probabilistic meaning, that the model is clear and rests on a corresponding probabilistic foundation, and that the fitted parameters represent the influence of each feature on the result, which also makes it a good tool for understanding data. At the same time, because it is essentially a linear classifier, it cannot cope with more complicated data. Logistic regression is also often used as a baseline model.
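To make the point about fitted parameters concrete, here is a small sketch that fits a logistic regression with plain batch gradient descent and then reads off the weights. The toy data, learning rate, and number of epochs are assumptions made up for this illustration; real projects would normally rely on an established library rather than this hand-written loop.

```python
import math


def sigmoid(z):
    """Numerically stable sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_regression(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n_samples = len(rows)
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y  # gradient of the log loss with respect to the linear score
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


# Made-up toy data: feature 0 tends to be high for label 1,
# feature 1 tends to be high for label 0.
rows = [[0.0, 1.0], [0.2, 0.8], [0.4, 0.9], [0.6, 0.1], [0.8, 0.3], [1.0, 0.0]]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = fit_logistic_regression(rows, labels)
print("weights:", weights)
print("bias:", bias)
```

On this toy data the weight for feature 0 comes out positive and the weight for feature 1 comes out negative, which matches how the data was constructed: the sign and size of each weight indicate the direction and strength of that feature's influence on the output.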


The main uses of logistic regression

Finding risk factors

As mentioned above, logistic regression can be used to look for the risk factors of a certain disease.
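One common way to read risk factors off a fitted model is to convert each coefficient into an odds ratio by exponentiating it. The sketch below assumes a model has already been fitted; the factor names and coefficient values are invented for illustration and do not come from any real study.

```python
import math

# Hypothetical fitted coefficients (change in log-odds per one-unit increase).
coefficients = {
    "age_per_10_years": 0.40,
    "smoker": 0.90,
    "exercise_hours_per_week": -0.15,
}


def odds_ratio(coefficient):
    """exp(coefficient): factor by which the odds change per one-unit increase."""
    return math.exp(coefficient)


for name, beta in coefficients.items():
    print(f"{name}: coefficient {beta:+.2f}, odds ratio {odds_ratio(beta):.2f}")
```

An odds ratio above 1 suggests the factor is associated with higher odds of the outcome, and one below 1 with lower odds, with the other factors held fixed.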

Prediction

Once a logistic regression model has been established, it can be used to predict the probability of a certain disease or a certain situation for different values of the independent variables.

Discrimination

Discrimination is in fact somewhat similar to prediction. It also uses the logistic model, here to judge how likely it is that a person belongs to a certain category, such as having a certain disease or being in a certain situation.
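The difference between the last two uses can be sketched as follows: prediction reports the model output itself, while discrimination turns that output into a yes/no decision. The parameters, the two example patients, and the 0.5 threshold are all assumptions chosen for this illustration.

```python
import math


def sigmoid(z):
    """Numerically stable sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(weights, bias, features):
    """Prediction: the model output (between 0 and 1) for the given features."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def discriminate(output, threshold=0.5):
    """Discrimination: assign class 1 if the output reaches the threshold, else 0."""
    return 1 if output >= threshold else 0


# Hypothetical fitted parameters and two hypothetical patients.
weights = [1.2, -0.7]
bias = -0.3
patients = {"patient_a": [2.0, 1.0], "patient_b": [0.5, 2.0]}

for name, features in patients.items():
    output = predict(weights, bias, features)
    print(name, "predicted:", round(output, 3), "class:", discriminate(output))
```

The threshold is a design choice: 0.5 is a common default, but in a medical setting it might be lowered so that fewer true cases are missed, at the cost of more false alarms.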

These are the three most common uses of logistic regression. In practice, logistic regression is extremely versatile. It has become almost the most commonly used analysis method in epidemiology and medicine because it has many advantages over multiple linear regression, which will be explained in detail later. There are in fact many other classification methods, but logistic regression is one of the most successful and widely used.
