Parameter estimation of loss function based on Python [100011189]

1. Experimental purpose

Understand the logistic regression model and master the parameter estimation algorithm of the logistic regression model.

2. Experimental requirements

Implement parameter estimation for two loss functions (1. without a penalty term; 2. with a penalty term added to the parameters). Gradient descent, conjugate gradient, Newton's method, etc. may be used.

Verification:

  • You can manually generate data for two separate categories (Gaussian distributions may be used) to verify your algorithm. Investigate what results are obtained when the class-conditional distribution does not satisfy the naive Bayes assumption.
  • Logistic regression has a wide range of uses, such as advertisement prediction. You can find real datasets on the UCI website to test with.

3. Experimental content

3.1 Algorithm principle

The essence of our classifier's task is to predict the class label of a given sample, i.e. to compute P(Y=1 | X_1=x_1, …, X_n=x_n). According to the naive Bayes approach, Bayes' rule can be used to convert this posterior into the product of the class-conditional probability (the likelihood) and the class prior. In this experiment the posterior probability is modeled directly instead.
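For reference, the Bayes-rule conversion is:

$$
P(Y=1 \mid x_1,\dots,x_n)=\frac{P(x_1,\dots,x_n \mid Y=1)\,P(Y=1)}{\sum_{k\in\{0,1\}} P(x_1,\dots,x_n \mid Y=k)\,P(Y=k)}
$$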

After derivation we can get:
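In the standard logistic regression form (with the bias term absorbed into w by augmenting x with a constant 1), the posterior is:

$$
P(Y=1\mid \mathbf{x})=\frac{\exp(\mathbf{w}^{\top}\mathbf{x})}{1+\exp(\mathbf{w}^{\top}\mathbf{x})}=\frac{1}{1+\exp(-\mathbf{w}^{\top}\mathbf{x})},\qquad
P(Y=0\mid \mathbf{x})=\frac{1}{1+\exp(\mathbf{w}^{\top}\mathbf{x})}
$$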

Define the sigmoid function as:
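In its usual form,

$$
\sigma(z)=\frac{1}{1+e^{-z}}
$$

so that P(Y=1 | x) = σ(wᵀx).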

Calculate the loss function as:
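A common form of the conditional log-likelihood to be maximized, assuming N samples with labels y_i ∈ {0, 1}:

$$
l(\mathbf{w})=\sum_{i=1}^{N}\Big[y_i\,\mathbf{w}^{\top}\mathbf{x}_i-\ln\big(1+\exp(\mathbf{w}^{\top}\mathbf{x}_i)\big)\Big]
$$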

Use gradient descent to obtain w* = argmax_w l(w). Note that to use gradient descent, l(w) generally needs to be negated: -l(w) is taken as the loss function and its minimum is sought.
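With this l(w), the gradient and the corresponding ascent update (step size η) take the standard form:

$$
\frac{\partial l(\mathbf{w})}{\partial \mathbf{w}}=\sum_{i=1}^{N}\mathbf{x}_i\big(y_i-\sigma(\mathbf{w}^{\top}\mathbf{x}_i)\big),\qquad
\mathbf{w}\leftarrow \mathbf{w}+\eta\sum_{i=1}^{N}\mathbf{x}_i\big(y_i-\sigma(\mathbf{w}^{\top}\mathbf{x}_i)\big)
$$

Descending on -l(w) with the negated gradient yields the same update.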

After adding the regularization term, the gradient descent update becomes:
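Assuming an L2 penalty (λ/2)·‖w‖² is added to -l(w), the update becomes:

$$
\mathbf{w}\leftarrow \mathbf{w}+\eta\Big(\sum_{i=1}^{N}\mathbf{x}_i\big(y_i-\sigma(\mathbf{w}^{\top}\mathbf{x}_i)\big)-\lambda\mathbf{w}\Big)
$$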

3.2 Algorithm implementation

The first step is to generate the data. To generate data whose class-conditional distribution satisfies the naive Bayes assumption, draw each dimension of each class from an independent Gaussian distribution. To generate data whose class-conditional distribution does not satisfy the naive Bayes assumption, draw the two dimensions of each class jointly from a two-dimensional Gaussian distribution. Note that, due to the properties of the Gaussian distribution, independence of the components of a multivariate Gaussian follows from their uncorrelatedness. Both cases can therefore be generated with a two-dimensional Gaussian: if the naive Bayes assumption is to be satisfied, the off-diagonal elements of the covariance matrix are all 0; if it is not to be satisfied, the off-diagonal elements are nonzero (the covariance matrix must be symmetric).
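A minimal sketch of this data-generation step with NumPy; the means, covariances, and function name below are illustrative assumptions rather than the original code:

```python
import numpy as np

def generate_data(n_per_class=100, naive_bayes=True, seed=0):
    """Generate two 2-D Gaussian classes; a diagonal covariance means the naive Bayes assumption holds."""
    rng = np.random.default_rng(seed)
    if naive_bayes:
        cov = np.array([[0.5, 0.0],
                        [0.0, 0.5]])  # off-diagonal elements are 0 -> independent dimensions
    else:
        cov = np.array([[0.5, 0.3],
                        [0.3, 0.5]])  # nonzero off-diagonal -> correlated dimensions
    x_pos = rng.multivariate_normal(mean=[1.0, 1.0], cov=cov, size=n_per_class)
    x_neg = rng.multivariate_normal(mean=[-1.0, -1.0], cov=cov, size=n_per_class)
    X = np.vstack([x_pos, x_neg])
    y = np.hstack([np.ones(n_per_class), np.zeros(n_per_class)])
    return X, y
```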

Computing the maximum conditional likelihood estimate (MCLE):
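A sketch of how the log-likelihood l(w) could be evaluated with NumPy; the function names are illustrative, and X is assumed to already include a constant-1 column for the bias:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w, X, y, lam=0.0):
    """Conditional log-likelihood of logistic regression, minus an optional L2 penalty."""
    z = X @ w
    # ln(1 + exp(z)) computed stably as logaddexp(0, z)
    ll = np.sum(y * z - np.logaddexp(0.0, z))
    return ll - 0.5 * lam * np.dot(w, w)
```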

Gradient descent algorithm:
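A minimal gradient-ascent training loop consistent with the update rules above, reusing NumPy and the sigmoid from the previous sketch; the step size, iteration limit, and stopping tolerance are illustrative assumptions:

```python
import numpy as np

def train_logistic(X, y, lam=0.0, lr=0.01, max_iter=10000, tol=1e-6):
    """Gradient ascent on l(w), equivalently descent on -l(w) + 0.5*lam*||w||^2."""
    w = np.zeros(X.shape[1])
    for _ in range(max_iter):
        grad = X.T @ (y - sigmoid(X @ w)) - lam * w  # sum_i x_i * (y_i - sigma(w.x_i)) - lam*w
        w_new = w + lr * grad
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w
```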

For the UCI experiment, the Skin_NonSkin.txt (skin segmentation) dataset was selected. Because the full dataset is large, only a portion of it is used here.

When reading the data, NumPy slicing is used to extract a subset of the data with a step size of 50 for the experiments. The sample points also need to be shifted in space; otherwise overflow may occur when computing the MCLE, because that computation multiplies the parameters with the samples and uses the result as the exponent of e, which can overflow.
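A sketch of this reading and preprocessing step, assuming the usual B, G, R, label column layout of Skin_NonSkin.txt; the mean-centering used for the spatial shift is an illustrative choice:

```python
import numpy as np

data = np.loadtxt("Skin_NonSkin.txt")         # columns: B, G, R, label (1 = skin, 2 = non-skin)
data = data[::50]                             # take every 50th sample to reduce the data volume
X = data[:, :3]
y = (data[:, 3] == 1).astype(float)           # map labels to {0, 1}
X = X - X.mean(axis=0)                        # shift/center the samples to avoid exp overflow
X = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a constant-1 column for the bias term
```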

4. Experimental results

Self-generated data

The class-conditional distribution satisfies the naive Bayes assumption, regularization coefficient λ = 0, size = 200

The class-conditional distribution does not satisfy the naive Bayes assumption, regularization coefficient λ = 0, size = 200

The class-conditional distribution satisfies the naive Bayes assumption, regularization coefficient λ = 0.001, size = 200

The class-conditional distribution does not satisfy the naive Bayes assumption, regularization coefficient λ = 0.001, size = 200

UCI Skin Color Dataset

Regularization coefficient λ = 0

Regularization coefficient λ = 0.01

UCI banknote dataset

Regularization coefficient λ = 0

Regularization coefficient λ = 0.01

The experiments show that accuracy on a 20% held-out test set of the UCI data is basically stable at 93%–94%. The regularization term has little effect on the results when the amount of data is large; it should be able to effectively mitigate overfitting when the amount of data is small. Classification performance is slightly better when the class-conditional distribution satisfies the naive Bayes assumption than when it does not. Logistic regression solves simple linear classification problems well, and it converges quickly.

♻️ Resources


Size: 1.49MB
➡️ Resource download: https://download.csdn.net/download/s1t16/87547872
