1. Classification
2. Hypothesis representation
The sigmoid function and the logistic function are the same function:
g(z) = 1 / (1 + e^(-z))
The hypothesis is represented as h_θ(x) = g(θᵀx), which is interpreted as the estimated probability that y = 1 given the input x.
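The hypothesis above can be sketched in a few lines of numpy (function names here are my own, not from the notes):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x): the estimated probability that y = 1."""
    return sigmoid(np.dot(theta, x))
```

For example, sigmoid(0) is exactly 0.5, and the function approaches 1 for large positive inputs and 0 for large negative inputs, which is what makes it suitable for outputting probabilities.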
3. Decision boundary
The decision boundary is a property of the hypothesis function and its parameters θ, not a property of the data set. The training set is only used to fit the parameters θ; once θ is fixed, the decision boundary is determined.
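Since h_θ(x) ≥ 0.5 exactly when θᵀx ≥ 0, the boundary is the set where θᵀx = 0. A minimal sketch with a hypothetical parameter vector (the values below are illustrative, not from the notes):

```python
import numpy as np

def predict(theta, x):
    # h_theta(x) >= 0.5 exactly when theta^T x >= 0,
    # so the surface theta^T x = 0 separates the two predicted classes.
    return 1 if np.dot(theta, x) >= 0 else 0

# Hypothetical parameters theta = [-3, 1, 1] with features x = [1, x1, x2]
# give the linear decision boundary x1 + x2 = 3.
theta = np.array([-3.0, 1.0, 1.0])
```

A point such as (x1, x2) = (2, 2) lies on the y = 1 side of this boundary, while (1, 1) lies on the y = 0 side, regardless of which training points produced θ.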
4. Cost function
In logistic regression, if the squared-error cost function is used, the non-linearity of the sigmoid inside the hypothesis function makes the cost function non-convex, so gradient descent may get stuck in a local optimum rather than the global one. We therefore replace it with a different cost function that is convex, so that the gradient descent algorithm is guaranteed to find the global optimum.
5. Simplify the cost function and gradient descent
The cost function for logistic regression is
J(θ) = -(1/m) Σ [ y⁽ⁱ⁾ log h_θ(x⁽ⁱ⁾) + (1 - y⁽ⁱ⁾) log(1 - h_θ(x⁽ⁱ⁾)) ]
It is derived in statistics by the maximum likelihood method, which gives an efficient way to find parameters for many different models, and it is convex, so it is the cost function most people use to fit a logistic regression model.
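This cost function can be written directly in numpy; a minimal sketch (vectorized over the m training examples, with the example data being my own illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """J(theta) = -(1/m) * sum(y*log(h) + (1-y)*log(1-h))."""
    m = y.size
    h = sigmoid(X @ theta)
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m

# Tiny hypothetical dataset (first column is the intercept feature).
# With theta = 0, h = 0.5 for every example, so J(0) = log 2 ~= 0.693.
X = np.array([[1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0])
```

Note that with θ = 0 the hypothesis outputs 0.5 everywhere, and both terms reduce to log(0.5), giving J(0) = log 2 regardless of the data.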
The gradient descent update rules of linear regression and logistic regression look identical, but their hypothesis functions are different, so special attention is needed here: in linear regression the hypothesis is h_θ(x) = θᵀx, while in logistic regression the hypothesis wraps θᵀx in the sigmoid function.
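A sketch of one gradient-descent step makes the point concrete: the update θ := θ - α · (1/m) · Xᵀ(h - y) has the same form as in linear regression, but h here comes from the sigmoid (the toy dataset and function names are my own illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(theta, X, y, alpha):
    """One update: theta := theta - alpha * (1/m) * X^T (h - y),
    where h = sigmoid(X theta) -- the only difference from linear regression."""
    m = y.size
    h = sigmoid(X @ theta)
    return theta - alpha * (X.T @ (h - y)) / m

# Tiny hypothetical dataset: x = 0 -> y = 0, x = 1 -> y = 1
# (first column is the intercept feature).
X = np.array([[1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0])
theta = np.zeros(2)
for _ in range(100):
    theta = gradient_step(theta, X, y, alpha=1.0)
```

After a number of steps the fitted hypothesis assigns probability below 0.5 to the x = 0 example and above 0.5 to the x = 1 example, as expected.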
6. Advanced optimization
More advanced optimization algorithms, such as conjugate gradient, BFGS, and L-BFGS, have advantages over gradient descent: there is no need to manually choose the learning rate, and they often converge faster. The trade-off is that these algorithms are more complicated to understand.
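In practice one typically calls a library implementation rather than writing these algorithms by hand. As a sketch, scipy's `scipy.optimize.minimize` can run BFGS on the logistic cost, given the cost and its gradient (the small dataset here is my own illustration, chosen to be non-separable so the optimum is finite):

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    m = y.size
    h = sigmoid(X @ theta)
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m

def grad(theta, X, y):
    # Gradient of the logistic cost: (1/m) * X^T (h - y).
    m = y.size
    return X.T @ (sigmoid(X @ theta) - y) / m

# Hypothetical overlapping 1-D dataset (intercept feature in column 0).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

# BFGS picks its own step sizes; no learning rate needs to be chosen.
res = minimize(cost, np.zeros(2), args=(X, y), jac=grad, method="BFGS")
```

`res.x` holds the fitted θ and `res.fun` the minimized cost, which ends up below the θ = 0 baseline of log 2.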
7. Multi-class classification: one-vs-all
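The one-vs-all idea is to train one binary logistic classifier per class, relabeling that class as 1 and everything else as 0, and then to predict the class whose classifier is most confident. A minimal sketch using plain gradient descent as the inner trainer (all names, the data, and the hyperparameters below are my own illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_one_vs_all(X, y, n_classes, alpha=0.3, iters=2000):
    """Train one binary logistic regression classifier per class;
    returns an (n_classes, n_features) matrix of parameter vectors."""
    m, n = X.shape
    thetas = np.zeros((n_classes, n))
    for c in range(n_classes):
        yc = (y == c).astype(float)  # relabel: class c vs. everything else
        theta = np.zeros(n)
        for _ in range(iters):
            theta -= alpha * (X.T @ (sigmoid(X @ theta) - yc)) / m
        thetas[c] = theta
    return thetas

def predict_one_vs_all(thetas, x):
    # Pick the class whose classifier reports the highest probability.
    return int(np.argmax(sigmoid(thetas @ x)))

# Hypothetical 1-D, 3-class dataset (intercept feature in column 0).
X = np.array([[1.0, 0.0], [1.0, 0.5],
              [1.0, 2.5], [1.0, 3.0],
              [1.0, 5.0], [1.0, 5.5]])
y = np.array([0, 0, 1, 1, 2, 2])
thetas = train_one_vs_all(X, y, n_classes=3)
```

A query point near x = 0 is assigned class 0 and one near x = 5.5 is assigned class 2, because those binary classifiers output the highest probabilities there.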