logistic regression

Logistic regression, also known as logistic regression analysis , is a generalized linear regression analysis model, which is often used in data mining, automatic disease diagnosis, economic forecasting and other fields. For example, discuss the risk factors that cause diseases, and predict the probability of disease occurrence based on risk factors. Taking the analysis of gastric cancer as an example, two groups of people were selected, one was gastric cancer group and the other was non-gastric cancer group. The two groups must have different signs and lifestyles. Therefore , the dependent variable is whether it is gastric cancer, the value is "yes" or "no", and the independent variables can include many, such as age, gender, eating habits, Helicobacter pylori infection and so on. Independent variables can be either continuous or categorical. Then, through logistic regression analysis, the weights of independent variables can be obtained, so that we can roughly understand which factors are risk factors for gastric cancer. At the same time, according to the weight, the possibility of a person suffering from cancer can be predicted according to the risk factors.

Logistic regression is a generalized linear model, so it has a lot in common with multiple linear regression analysis. Their model forms are basically the same, both have w'x+b, where w and b are parameters to be determined, the difference is that their dependent variables are different, and multiple linear regression directly uses w'x+b as the dependent variable, that is, y =w'x+b, and logistic regression uses the function L to correspond w'x+b to a hidden state p, p =L(w'x+b), and then determine the dependent variable according to the size of p and 1-p value. If L is a logistic function, it is logistic regression, and if L is a polynomial function, it is polynomial regression.  [1] 
The dependent variable of logistic regression can be binary or multi-category, but binary classification is more commonly used and easier to explain, and multi-category can be processed using the softmax method. The most commonly used in practice is the binary logistic regression.  [1]  
The essence of logistic regression: divide the probability of occurrence by the probability of no occurrence and then take the logarithm. It is this less cumbersome transformation that changes the contradiction between the value intervals and the curve relationship between the dependent and independent variables. The reason is that the probability of occurrence and non-occurrence becomes a ratio, and this ratio is a buffer, expanding the range of values, and then performing logarithmic transformation, the entire dependent variable changes. Not only that, but this transformation often results in a linear relationship between the dependent and independent variables, which is summed up according to a lot of practice. Therefore, logistic regression fundamentally solves the problem of what if the dependent variable is not a continuous variable. Also, Logistic is widely used because many real-world problems fit its model. For example, whether an event occurs in relation to other numerical independent variables.  [2]  
Note: If the argument is of character type, it needs to be re-encoded. Generally, if the independent variable has three levels, it is very difficult to deal with

Guess you like