Logistic regression
Logistic regression is similar to linear regression, except that the outcome is binary. It works by transforming the binary classification problem into one that can be handled by a linear model.
Concepts / terms
- Logit function: the function that maps the probability of class membership to the range ±∞ (instead of 0 to 1). (Note that its output is not the final probability.) Logit function = log-odds function (see the sketch after this list).
- Odds: the ratio of the probability of "success" (1) to the probability of "failure" (0).
- Outcome variable: the probability p that the label is 1 (instead of a simple binary label); that is, we model the outcome variable as y = P(label = 1).
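To make these definitions concrete, here is a minimal sketch in plain NumPy (the helper names logit and sigmoid are ours, not from the text) showing that the logit function and the logistic (sigmoid) function are inverses of each other:

import numpy as np

def logit(p):
    # Map a probability in (0, 1) to log odds in (-inf, +inf).
    return np.log(p / (1 - p))

def sigmoid(t):
    # Inverse of the logit: map any real number back into (0, 1).
    return 1 / (1 + np.exp(-t))

p = np.array([0.1, 0.5, 0.9])
print(logit(p))           # [-2.197  0.     2.197]
print(sigmoid(logit(p)))  # recovers [0.1 0.5 0.9]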
Hypothesis function
Logistic regression uses the following hypothesis: the estimated probability is the logistic function applied to a linear combination of the inputs,

$h_\theta(x) = \sigma(\theta^T x) = \dfrac{1}{1 + e^{-\theta^T x}}$
Modeling process
First of all, we cannot simply treat the outcome variable as a binary label; we should instead treat it as the probability p that the label is 1.
If we model p directly as a linear function of the predictors, nothing guarantees that p stays within [0, 1]:

$p = \beta_0 + \beta_1 x_1 + \dots + \beta_q x_q$
Instead, we take a different approach: we model p by applying the logistic response function (the inverse of the logit function) to the predictors:

$p = \dfrac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_q x_q)}}$
This conversion ensures that the value of p is within [0, 1].
Note: in terms of odds, $\mathrm{odds}(Y = 1 \mid X) = \dfrac{p}{1 - p} = e^{\beta_0 + \beta_1 x_1 + \dots + \beta_q x_q}$.
Taking the logarithm of both sides of the equation gives:

$\log(\mathrm{odds}(Y = 1 \mid X)) = \beta_0 + \beta_1 x_1 + \dots + \beta_q x_q$
This log-odds function is also known as the logit function; it maps a probability p in (0, 1) to any value in (−∞, +∞).
Once this transformation is complete, we can use a linear model to predict the log odds, and then map the result back to a probability.
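As a quick sanity check of this claim, the following sketch (with arbitrary, made-up coefficients β₀ = −4 and β₁ = 2.5) shows that the linear score is unbounded while the transformed p always stays within (0, 1):

import numpy as np

beta0, beta1 = -4.0, 2.5         # arbitrary illustrative coefficients
x = np.linspace(-10, 10, 1000)
score = beta0 + beta1 * x        # linear score: ranges from -29 to 21
p = 1 / (1 + np.exp(-score))     # logistic response function
print(score.min(), score.max())  # -29.0 21.0
print(p.min(), p.max())          # both strictly inside (0, 1)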
Logistic regression models
- Logistic regression model (estimated probability): $\hat{p} = h_\theta(x) = \sigma(\theta^T x)$
- Logistic function: $\sigma(t) = \dfrac{1}{1 + e^{-t}}$
- Logistic regression model prediction (see the sketch after this list): $\hat{y} = 0$ if $\hat{p} < 0.5$; $\hat{y} = 1$ if $\hat{p} \ge 0.5$
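A minimal sketch of these three formulas in NumPy (the parameter vector theta and the inputs below are made up for illustration):

import numpy as np

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

theta = np.array([-4.0, 2.5])        # illustrative parameters [bias, weight]
X = np.array([[1.0, 1.0],
              [1.0, 2.0]])           # each row: [1, x], with a bias feature
p_hat = sigmoid(X @ theta)           # estimated probabilities, approx. [0.18, 0.73]
y_hat = (p_hat >= 0.5).astype(int)   # predictions with a 0.5 threshold -> [0, 1]
print(p_hat, y_hat)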
Loss function
- Cost function for a single training instance: $c(\theta) = -\log(\hat{p})$ if $y = 1$; $c(\theta) = -\log(1 - \hat{p})$ if $y = 0$
- Logistic regression cost function (log loss), averaged over all $m$ training instances (see the sketch after this list): $J(\theta) = -\dfrac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(\hat{p}^{(i)}) + (1 - y^{(i)})\log(1 - \hat{p}^{(i)})\right]$
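The log loss translates directly into NumPy; the sketch below (toy labels and probabilities, made up for illustration) also checks the result against sklearn.metrics.log_loss, which computes the same quantity:

import numpy as np
from sklearn.metrics import log_loss

y = np.array([1, 0, 1, 1])
p_hat = np.array([0.9, 0.2, 0.6, 0.7])

# J = -(1/m) * sum(y*log(p_hat) + (1-y)*log(1-p_hat))
J = -np.mean(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
print(J)                   # approx. 0.299
print(log_loss(y, p_hat))  # same value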
Optimization
There is no known closed-form equation for computing the value of θ that minimizes the logistic regression cost function (there is no equivalent of the Normal Equation). However, the cost function is convex, so gradient descent (or any other optimization algorithm) is guaranteed to find the global minimum, given a suitable learning rate and enough iterations. The partial derivatives of the cost function take the familiar form

$\dfrac{\partial J}{\partial \theta_j} = \dfrac{1}{m}\sum_{i=1}^{m}\left(\sigma(\theta^T x^{(i)}) - y^{(i)}\right)x_j^{(i)}$
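A batch gradient descent sketch on toy data (the learning rate eta = 0.1, the iteration count, and the synthetic data are all assumptions for illustration, not values from the text):

import numpy as np

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

rng = np.random.default_rng(42)
m = 100
X = rng.normal(size=(m, 1))
y = (X[:, 0] > 0).astype(int)    # toy labels: positive inputs are class 1
X_b = np.c_[np.ones((m, 1)), X]  # add a bias column of ones

eta, n_iterations = 0.1, 1000    # assumed hyperparameters
theta = np.zeros(2)
for _ in range(n_iterations):
    # Gradient of the log loss: (1/m) * X^T (sigma(X theta) - y)
    gradients = X_b.T @ (sigmoid(X_b @ theta) - y) / m
    theta -= eta * gradients
print(theta)                     # learned [bias, weight]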
Code examples
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
iris = datasets.load_iris()
list(iris.keys())
# Let's try building a classifier to detect Iris-Virginica based on just one feature: petal width.
X = iris["data"][:, 3:] # petal width
y = (iris["target"] == 2).astype(int)  # 1 if Iris-Virginica, else 0
log_reg = LogisticRegression()
log_reg.fit(X, y)
X_new = np.linspace(0, 3, 1000).reshape(-1, 1)
y_proba = log_reg.predict_proba(X_new)
y_proba
plt.scatter(X, y)
plt.plot(X_new, y_proba[:, 1], "g-", label="Iris-Virginica")
plt.plot(X_new, y_proba[:, 0], "b--", label="Not Iris-Virginica")
plt.legend()
plt.show()
Note that the two classes partially overlap. At a petal width of about 1.6 cm there is a decision boundary, where the estimated probabilities of "Iris-Virginica" and "not Iris-Virginica" are both 50%.
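To read the decision boundary off the fitted model rather than the plot, one option (reusing X_new and y_proba from the example above) is:

decision_boundary = X_new[y_proba[:, 1] >= 0.5][0]
print(decision_boundary)  # the first petal width where p_hat >= 0.5, around 1.6 cm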