[GreedyAI Assignment 2] Lessons from the second week

This week's course mainly learns linear/logistic regression and naive Bayes.

The course explanation is very clear, and the harvest is great. It is not enough to digest and consolidate. You should also learn and output the teacher's code again, cut it all!


Linear regression

Linear regression is a statistical analysis method that uses regression analysis in mathematical statistics to determine the quantitative relationship between two or more variables. It is widely used. The expression is y = w'x+e, where e is a normal distribution whose error follows a mean value of 0.

In regression analysis, only one independent variable and one dependent variable are included, and the relationship between the two can be approximated by a straight line. This kind of regression analysis is called unary linear regression analysis. If the regression analysis includes two or more independent variables, and the relationship between the dependent variable and the independent variable is linear, it is called multiple linear regression analysis.

Linear regression is a method of modeling the relationship between one or more independent variables and dependent variables using the least squares function of the linear regression equation.

Linear regression is like the classic representative of less is more. Although it is only a line, it is the most widely used model.

Source: Greedy Academy-Unary linear regression model

Logistic regression

Although logistic regression has the word "regression", logistic regression is a classification algorithm. Logistic regression can perform multiple classification operations, but the nature of the logistic regression algorithm determines that it is more commonly used for two classifications. Linear regression completes the regression fitting task, and for the classification task, we also need a line, but instead of fitting each data point, it distinguishes samples of different categories.

Applicable conditions of logistic regression model

1 The dependent variable is a binary categorical variable or the incidence of an event, and it is a numeric variable. However, it should be noted that the indicator of repeated counting phenomenon is not applicable to Logistic regression.

2 Both the residual and the dependent variable must obey the binomial distribution. The binomial distribution corresponds to categorical variables, so it is not a normal distribution, and instead of using the least square method, but the maximum likelihood method to solve the equation estimation and testing problems.

3 Independent variables and Logistic probability are linear relationships

4 The observation objects are independent of each other

 


Naive Bayes

Naive Bayes method is a classification method based on Bayes' theorem and the assumption of independence of characteristic conditions, and is a classic statistical method.

  • Conditional probability : The core part of Naive Bayes is Bayes' rule, and the cornerstone of Bayes' rule is conditional probability. Bayes' rule is as follows:

By vectorizing text data and calculating various probabilities, the classification function can be realized through this model.

Naive Bayesian algorithm plays an important role in text recognition and image recognition direction. An unknown text or image can be classified according to its existing classification rules, and finally achieve the purpose of classification. In real life, the Naive Bayes algorithm is widely used, such as text classification, spam classification, credit evaluation, phishing website detection and so on.

Guess you like

Origin blog.csdn.net/u010472858/article/details/94866291