Homework 6: Logistic Regression

  1. Describe in your own words: what is logistic regression, and how does it differ from linear regression?

Logistic regression models the probability of an outcome, so its prediction is bounded between 0 and 1. It accepts both continuous and categorical independent variables and is easy to use and interpret.

Logistic regression, also called logistic regression analysis, is a generalized linear model commonly used in data mining, automatic disease diagnosis, and economic forecasting. A typical use is to explore the risk factors for a disease and to predict the probability of the disease occurring given those factors. Taking gastric cancer as an example, two groups of people are selected, one with gastric cancer and one without. The two groups will generally differ in physical signs and lifestyle.

Linear regression is a statistical method that uses regression analysis to determine the quantitative relationship between two or more interdependent variables, and it is very widely used. Its expression is y = w'x + e, where e is an error term that follows a normal distribution with mean 0. When the analysis includes only one independent variable and one dependent variable, and the relationship between them can be approximated by a straight line, it is called univariate (simple) linear regression. The key difference is the output: linear regression predicts an unbounded continuous value, whereas logistic regression passes the same kind of linear combination through the logistic (sigmoid) function to produce a probability between 0 and 1, which can then be used for classification.
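To make the contrast concrete, here is a minimal sketch that feeds the same linear score w*x + b to both kinds of model. The function names and the coefficients w = 0.8 and b = -2.0 are hypothetical, chosen only for illustration and not fitted to any data.

```python
import math


def linear_predict(w, b, x):
    """Linear regression output: an unbounded real number."""
    return w * x + b


def logistic_predict(w, b, x):
    """Logistic regression output: sigmoid of the same linear score, in (0, 1)."""
    z = w * x + b
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, chosen only for illustration.
w, b = 0.8, -2.0

for x in (-5.0, 0.0, 2.5, 10.0):
    print(x, linear_predict(w, b, x), round(logistic_predict(w, b, x), 4))
```

The linear output grows without limit as x grows, while the logistic output stays strictly between 0 and 1 and equals 0.5 where the linear score is 0.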

  2. Describe what overfitting and underfitting are.

Overfitting: The model fits the characteristics of the training data too closely. It performs very well on the training set, predicting nearly every training sample correctly, but performs poorly on a new test set. In other words, the model's output on training samples is almost identical to the expected output, while its output on test samples differs greatly from the expected output.

Underfitting: Because of insufficient samples or an unsuitable algorithm, the model fails to learn the characteristics of the training data and does not generalize, so it cannot make accurate judgments on new samples. Underfitting is the easier of the two to understand: the model is too simple, or the corpus is too small for the number of features, so accuracy is low on both the training set and the test set. Training yields no meaningful parameters and the model cannot produce good results. The usual remedy is to choose a more suitable model, select reasonable features, and improve the training set.
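The two failure modes can be sketched by comparing training error and test error for three models on the same data. Everything below is an illustrative assumption: the data is synthetic (y = 2x + 1 plus Gaussian noise), and the three hypothetical models are a constant predictor (too simple), a least-squares line (about right), and a 1-nearest-neighbour memorizer (too flexible).

```python
import random


def make_data(rng, n):
    """Synthetic data: y = 2x + 1 plus Gaussian noise (illustrative only)."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def mse(model, xs, ys):
    """Mean squared error of a model (a callable) on a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_constant(xs, ys):
    """Too simple: ignores x and always predicts the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Ordinary least squares for a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return lambda x: slope * (x - mx) + my


def fit_memorizer(xs, ys):
    """Too flexible: 1-nearest-neighbour lookup that memorizes every training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_y = make_data(rng, 30)
test_x, test_y = make_data(rng, 30)

results = {}
for name, fit in [("constant", fit_constant),
                  ("line", fit_line),
                  ("memorizer", fit_memorizer)]:
    model = fit(train_x, train_y)
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(name, "train MSE:", results[name][0], "test MSE:", results[name][1])
```

The constant model has high error on both sets, which is the underfitting pattern. The memorizer reaches zero training error because it reproduces the noise in each training point, yet its test error is well above zero and usually worse than the line's, which is the overfitting pattern.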

  3. What are the application scenarios of logistic regression?

The main uses of logistic regression:

Finding risk factors: identifying the risk factors for a disease, and so on;

Prediction: using the fitted model to predict the probability that a disease or event occurs under different values of the independent variables;

Discrimination: similar to prediction. The model is used to estimate the probability that a person has a certain disease or belongs to a certain category, that is, how likely it is that this person falls into that group.

Logistic regression is widely used in epidemiology. The most common case is exploring the risk factors for a disease and predicting the probability of the disease from those factors. For example, to study the risk factors for gastric cancer, you can select two groups of people, one with gastric cancer and one without; the two groups will generally differ in physical signs and lifestyle. The dependent variable is whether the person has gastric cancer, that is, "yes" or "no". There can be many independent variables, such as age, gender, eating habits, and Helicobacter pylori infection, and each can be continuous or categorical.
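A scenario like this can be sketched in plain Python. The data below is synthetic and entirely made up: a standardized "age" value, a 0/1 "infected" flag, and labels drawn from assumed coefficients that have no medical meaning. The function names and the batch gradient-descent training loop are illustrative choices, not a reference implementation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Predicted probability of the positive class ("yes") for one person."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return sigmoid(z)


def fit_logistic(rows, labels, lr=0.1, epochs=1000):
    """Fit weights by batch gradient descent on the mean log-loss."""
    n, d = len(rows), len(rows[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for features, label in zip(rows, labels):
            error = predict_proba(weights, bias, features) - label
            for j in range(d):
                grad_w[j] += error * features[j]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Synthetic, made-up data: the "true" coefficients below are assumptions
# for illustration only and carry no medical meaning.
rng = random.Random(42)
rows, labels = [], []
for _ in range(200):
    age = rng.uniform(-1.0, 1.0)        # standardized age (continuous variable)
    infected = rng.choice([0.0, 1.0])   # infection flag (categorical variable)
    true_p = sigmoid(-1.0 + 1.5 * age + 2.0 * infected)
    rows.append([age, infected])
    labels.append(1.0 if rng.random() < true_p else 0.0)

weights, bias = fit_logistic(rows, labels)

# Prediction: estimated probability for two hypothetical people of the same age.
p_not_infected = predict_proba(weights, bias, [0.5, 0.0])
p_infected = predict_proba(weights, bias, [0.5, 1.0])
print("weights:", weights, "bias:", bias)
print("P(yes | not infected):", p_not_infected)
print("P(yes | infected):", p_infected)
```

The sign and size of each fitted weight correspond to the "finding risk factors" use: a positive weight means the factor raises the predicted probability. The two `predict_proba` calls correspond to the prediction and discrimination uses.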

Logistic regression originated in the medical field, where it is mainly used for diagnosis (that is, deciding whether a patient has a disease) and for estimating the probability that a disease occurs. With the development of machine learning in recent years, such models have also come to be used for classification and prediction in other fields.

Origin www.cnblogs.com/xuechendong/p/12760059.html