Getting Started with Python Machine Learning - Logistic Regression Study Notes


1. Introduction to Logistic Regression


In machine learning, we typically need a large amount of sample data to train a model well. Neural networks and deep learning are typical of models that require large datasets to achieve good results; with fewer samples, their performance degrades sharply. This is why we sometimes need simpler models, such as logistic regression, which can perform relatively well even when the sample size is small.

Although it is called logistic *regression*, it is actually a classification model, most often used for binary classification. The essence of logistic regression is to assume that the data follows a particular distribution and then use maximum likelihood estimation to estimate the parameters.


2. The Mathematical Principles of Logistic Regression

The logistic regression model sounds like a model for solving regression problems, but it is in fact a classification model that uses the principles of regression to perform binary classification, so its mathematical underpinnings share much with linear regression.


1. Sigmoid function

The Sigmoid function is one of the most commonly used activation functions; it applies a nonlinear transformation to the sample data. It is a classic S-shaped curve widely used in engineering, statistics, and other fields, and it is also a common S-shaped function in biology, where it is known as the sigmoid growth curve. In information science, thanks to properties such as being monotonically increasing and having a monotonically increasing inverse, the Sigmoid function is often used as the activation function of a neural network. As an activation function, however, it has a drawback: the gradient vanishes at both tails.


In a logistic regression model, we pass the sample data through the Sigmoid function, which maps it to the interval $[0, 1]$. Probability values also lie in $[0, 1]$, so we can convert the result of the regression prediction into a probability value and thereby realize binary classification of the sample data. Mathematically, the Sigmoid function is centrally symmetric about the point $(0, \frac{1}{2})$, and its trend matches the objective data distribution: the closer to the mean, the denser the data; the farther from the mean, the sparser the data.
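
As a quick illustration, here is a minimal sketch that plots the Sigmoid curve with NumPy and Matplotlib (the helper name `sigmoid` is our own):

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    # map any real value into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-10, 10, 200)
plt.plot(z, sigmoid(z))
plt.axhline(0.5, linestyle=':', color='k')  # the curve is centrally symmetric about (0, 1/2)
plt.xlabel('z')
plt.ylabel('sigmoid(z)')
plt.show()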

2. Transforming Regression Prediction into Classification

The mathematical principles of logistic regression have much in common with those of linear regression: both perform matrix operations on the features and their labels, construct a loss function from the likelihood function, and use gradient descent to iteratively update the parameters in search of the minimum loss.


The difference is that logistic regression first passes the prediction function through the Sigmoid function for a nonlinear transformation, converting the product of features and weight parameters into a probability. The elements of the prediction function are therefore probabilities, and the labels correspondingly become the logical values 0 and 1 of the two classes:

$$h_{\theta}(x) = \frac{1}{1 + e^{-\theta^{T}x}}$$

We then turn the prediction function into a probability function describing whether the predicted value is more likely to be positive or negative, where the probabilities of the positive and negative classes sum to 1. In this way the regression problem becomes a classification problem: either the data point belongs to this category, or it belongs to the other one. Finally, we merge the two cases of the probability function into a single expression for easier calculation:

$$P(y \mid x; \theta) = \left(h_{\theta}(x)\right)^{y}\left(1 - h_{\theta}(x)\right)^{1-y}$$
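
As a minimal sketch, this combined expression can be evaluated directly with NumPy (the helper names `sigmoid` and `bernoulli_prob` are our own):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bernoulli_prob(x, y, theta):
    # P(y | x; theta) = h(x)^y * (1 - h(x))^(1 - y), for y in {0, 1}
    h = sigmoid(x @ theta)
    return h ** y * (1 - h) ** (1 - y)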

3. Likelihood function

We roughly described the role of the likelihood function earlier. In logistic regression, we use the likelihood function to estimate the most probable outcome: we look for the weight parameters $\theta$ that bring the prediction as close as possible to the ideal result, i.e. that minimize the loss of the classification results against the given labels.


To simplify the calculation, we take the logarithm of the likelihood function, converting the product of probabilities into a sum of log terms. To construct a minimal loss function for the classification result, note that maximizing the log-likelihood would call for gradient ascent; adding a negative sign converts it into a minimization problem solvable by gradient descent, whose stationary point is our matrix of weight parameters.
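
A minimal sketch of the resulting loss, the average negative log-likelihood (the helper name `neg_log_likelihood` is our own):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neg_log_likelihood(theta, X, y):
    # the log turns the product of probabilities into a sum;
    # the minus sign turns maximizing the likelihood into minimizing a loss
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))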

4. Partial derivative and parameter update


The next step is to take the partial derivative of the log-likelihood function with respect to $\theta$; this is in effect a derivative operation on matrices. After computing the partial derivative, we obtain the mathematical expression of the gradient:
$$\nabla_{\theta_{j}} J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left( y_{i} - \frac{1}{1 + e^{-\theta^{T} x_{i}}} \right) x_{i}^{j}$$


Finally, following the idea of the gradient descent algorithm, we update the weight parameters with $\theta_{j} := \theta_{j} - \alpha \nabla_{\theta_{j}} J(\theta)$, where $\alpha$ is the learning rate. In general we use mini-batch gradient descent to update the parameter values. When the loss no longer changes significantly, we have reached the stationary point, i.e. the weight parameter matrix, and we can draw the final classification decision boundary.
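
A minimal sketch of mini-batch gradient descent following these formulas (the function name `minibatch_gd` and its default hyperparameters are our own choices):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def minibatch_gd(X, y, alpha=0.1, n_epochs=100, batch_size=32, seed=0):
    # X: (m, n) feature matrix; y: (m,) labels in {0, 1}
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_epochs):
        idx = rng.permutation(m)  # shuffle, then walk through mini-batches
        for start in range(0, m, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            grad = Xb.T @ (sigmoid(Xb @ theta) - yb) / len(batch)  # gradient of the log loss
            theta -= alpha * grad  # update step, alpha is the learning rate
    return theta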

5. Softmax Multi-Classification


The basic logistic regression model solves binary classification problems. For a multi-classification problem, one approach is to take one of the N categories out and lump the remaining categories together into a single class, treating the task as binary classification; then repeat with another category, and so on. Traversing all categories in this way converts the multi-classification problem into many binary classification problems. Softmax multi-classification, by contrast, is a model based on logistic regression: input a vector, map it through the softmax formula to obtain a probability vector, and finally assign the sample to the category with the highest computed probability, as in the sketch below.
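
A minimal sketch of the softmax mapping itself (the helper name `softmax` is our own):

import numpy as np

def softmax(scores):
    # map a score vector to a probability vector that sums to 1
    exp_s = np.exp(scores - np.max(scores))  # subtract the max for numerical stability
    return exp_s / exp_s.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # roughly [0.659 0.242 0.099]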

After we input the sample data and apply $e^{x}$, the resulting proportions are mapped into a probability distribution, giving the softmax-computed probability for each feature vector. The labels are correspondingly transformed from the original logical values 0 and 1 into multiple classification labels.


Finally, cross-entropy is used to express the loss function of softmax multi-classification. Cross-entropy describes the computed probability that the sample data maps to: the closer the predicted probability of the true class is to 1, the smaller the entropy, the lower the degree of confusion, and the smaller the value of the loss function; conversely, the closer that probability is to 0, the greater the entropy, the higher the degree of confusion, and the larger the value of the loss function.
In fact, it is like shopping in a department store: the more accurately the goods in an area are categorized, i.e. the lower the degree of confusion, the smaller the loss to customers from picking up the wrong item; the messier the categorization in an area, i.e. the higher the degree of confusion, the greater that loss.
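
A minimal sketch of the cross-entropy loss on one-hot labels (the helper name `cross_entropy` is our own):

import numpy as np

def cross_entropy(y_true_onehot, y_proba):
    eps = 1e-12  # avoid log(0)
    return -np.mean(np.sum(y_true_onehot * np.log(y_proba + eps), axis=1))

y_true = np.array([[0, 0, 1]])
print(cross_entropy(y_true, np.array([[0.1, 0.1, 0.8]])))  # true class near 1: small loss (~0.22)
print(cross_entropy(y_true, np.array([[0.6, 0.3, 0.1]])))  # true class near 0: large loss (~2.30)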

3. Implementing Logistic Regression and Softmax Multi-Classification in Python

To implement logistic regression in Python, we first import several commonly used packages: numpy, matplotlib, and sklearn. We then select the petal-width feature from the iris dataset to distinguish flower types. Because it is a binary classification problem, we only need the probability that a sample belongs to the species and the probability that it does not. We instantiate a logistic regression model and train it, generate a test dataset to predict probabilities on, and finally visualize the results.

import numpy as np
from sklearn import datasets
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
# petal width (column 3) is the single feature; the label is 1 for Iris-Virginica (class 2), else 0
x = iris['data'][:, 3:]
y = (iris['target'] == 2).astype(np.int_)
log_res = LogisticRegression()
log_res.fit(x, y)

# evenly spaced test points covering the petal-width range
x_new = np.linspace(0, 3.3, 200).reshape(-1, 1)
y_proba = log_res.predict_proba(x_new)
print(y_proba)

plt.figure(figsize=(10, 8))
# the decision boundary is the first petal width where P(Iris-Virginica) >= 0.5
decision_boundary = x_new[y_proba[:, 1] >= 0.5][0, 0]

plt.plot([decision_boundary, decision_boundary], [-0.2, 1.2], 'k:', linewidth=2)
plt.plot(x_new, y_proba[:, 1], 'g-', label='Iris-Virginica')
plt.plot(x_new, y_proba[:, 0], 'b--', label='Not Iris-Virginica')
plt.text(decision_boundary, 0.15, 'Decision Boundary', fontsize=12, color='k', ha='center')
plt.arrow(decision_boundary, 0.03, -0.3, 0, head_width=0.05, head_length=0.10, color='b')
plt.arrow(decision_boundary, 0.97, 0.3, 0, head_width=0.05, head_length=0.10, color='g')
plt.legend()
plt.show()
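
As a quick sanity check on the trained model (the exact boundary value may differ slightly between scikit-learn versions), petal widths on either side of the boundary should be classified differently:

print(log_res.predict([[1.7], [1.5]]))  # typically [1 0]: Virginica vs. not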


To implement softmax multi-classification in Python, we again import numpy, matplotlib, and sklearn, and this time select the petal length and width features from the iris dataset to distinguish the three flower species. Because it is a three-class problem, each data point yields three probability values. We instantiate logistic regression with its multi-class option, train the model, generate a grid of test data to predict probabilities and decision boundaries on, and finally visualize the results. The visualization uses a meshgrid and draws the scatter distribution, the decision boundaries, and the probability contours.

import numpy as np
from sklearn import datasets
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from matplotlib.colors import ListedColormap

iris = datasets.load_iris()
# petal length and petal width (columns 2 and 3) are the two features
x = iris['data'][:, (2, 3)]
y = iris['target']
# the multinomial option makes LogisticRegression use softmax for multi-classification
softmax_reg = LogisticRegression(multi_class='multinomial', solver='lbfgs')
softmax_reg.fit(x, y)

# grid of test points covering the feature plane
x0, x1 = np.meshgrid(np.linspace(0, 10, 600).reshape(-1, 1), np.linspace(0, 6, 350).reshape(-1, 1))
x_new = np.c_[x0.ravel(), x1.ravel()]
y_proba = softmax_reg.predict_proba(x_new)
y_predict = softmax_reg.predict(x_new)
zz1 = y_proba[:, 1].reshape(x0.shape)  # probability of class 1 (Iris-Versicolor), for the contour lines
zz = y_predict.reshape(x0.shape)       # predicted class, for the filled decision regions

plt.figure(figsize=(10, 8))
# scatter plot of the three species
plt.plot(x[y==2, 0], x[y==2, 1], 'g^', label='Iris-Virginica')
plt.plot(x[y==1, 0], x[y==1, 1], 'bs', label='Iris-Versicolor')
plt.plot(x[y==0, 0], x[y==0, 1], 'yo', label='Iris-Setosa')
# filled regions show the predicted class; contour lines show P(Iris-Versicolor)
custom_cmap = ListedColormap(['#fafab0', '#9898ff', '#a0faa0'])
plt.contourf(x0, x1, zz, cmap=custom_cmap)
contour = plt.contour(x0, x1, zz1, cmap=plt.cm.brg)
plt.clabel(contour, inline=1, fontsize=10)
plt.xlabel('Petal Length', fontsize=8)
plt.ylabel('Petal Width', fontsize=8)
plt.legend(loc='center left')
plt.axis([0, 8, 0, 5])
plt.show()
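
As a quick sanity check (the exact probabilities may differ slightly between scikit-learn versions), a flower with a 5 cm petal length and a 2 cm petal width should be assigned to Iris-Virginica (class 2) with high probability:

print(softmax_reg.predict([[5, 2]]))        # typically [2]
print(softmax_reg.predict_proba([[5, 2]]))  # probability vector over the three classes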



Summary

The above are my study notes on the logistic regression model. They briefly introduce the mathematical principles of logistic regression and the ideas behind its Python implementation. Logistic regression can be regarded as a relatively simple and easy-to-use model for supervised classification problems, while softmax multi-classification is more like a single-layer neural network, and it holds an even greater advantage when the sample data is relatively small. After all, the principle remains the same: simplify the model as much as possible, and only when the results are genuinely poor should you move to a more complex model.

Source: blog.csdn.net/m0_55202222/article/details/129482906