Fitting a Sine Function with Polynomials in Python [100011190]

1. Experimental purpose

Master solving the least squares problem (loss function without a penalty term); master optimizing the loss function with an L2 penalty term, including the gradient descent method and the conjugate gradient method; understand overfitting and methods for overcoming it (such as adding a penalty term or increasing the sample size).

2. Experimental requirements

  • Generate data and add noise;

  • Fitting curves with higher order polynomial functions;

  • Use the analytical solution to find the optimum of the two kinds of loss (without and with a regularization term);

  • Use optimization methods to find the optimal solution (gradient descent, conjugate gradient);

  • Using the experimental data you obtain, explain overfitting;

  • Use different data volumes, different hyperparameters, and different polynomial orders to compare the experimental results.

  • The language is not restricted; you may use MATLAB or Python. Ready-made matrix inversion routines may be used when computing the analytical solution. For gradient descent and conjugate gradient, you must derive the gradient yourself and write the iterative optimization yourself; off-the-shelf automatic differentiation tools from platforms such as PyTorch and TensorFlow are not allowed.

3. Experimental content

3.1 Algorithm principle

This experiment uses a polynomial to fit a sine function. A polynomial of degree m has m+1 undetermined coefficients; the column vector formed by these m+1 coefficients (ordered from the lowest to the highest power) is denoted w. To determine w, the method of least squares is used.

Let E(w) = 1/2 * (Xw − Y)^T (Xw − Y), where X is the design matrix obtained by substituting the observed data into each term of the polynomial: if Xi denotes the i-th row of X, then Xi[j] is the j-th power of the i-th observed input xi. With n groups of observations and a maximum polynomial degree of m, the dimension of X is n × (m+1). Y is the vector of observed labels, i.e. Y[j] is the label (the y value) of the j-th group of observations. The problem therefore becomes: find the vector w that minimizes E(w).
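For clarity, the design matrix and the two losses described above can be written out explicitly. The regularized loss below assumes the standard L2 penalty with coefficient λ (the hyperparameter referred to in section 3.2); the original text does not write this term out, so the exact form is an assumption.

```latex
X =
\begin{pmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^m \\
1 & x_2 & x_2^2 & \cdots & x_2^m \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^m
\end{pmatrix},
\qquad
E(\mathbf{w}) = \frac{1}{2}(X\mathbf{w} - Y)^{T}(X\mathbf{w} - Y),
\qquad
\tilde{E}(\mathbf{w}) = E(\mathbf{w}) + \frac{\lambda}{2}\,\mathbf{w}^{T}\mathbf{w}.
```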

  • If no regularization term is added, set the derivative of the loss function to zero and solve for w; the closed form is w = (X^T X)^(-1) X^T Y.
  • If the regularization term is added, set the derivative of the loss function to zero and solve for w; the closed form is w = (X^T X + λI)^(-1) X^T Y.
  • Add the regularization term and minimize the loss function by gradient descent; w is obtained when the loss function converges.
  • Add the regularization term and minimize the loss function by the conjugate gradient method; w is obtained after at most m+1 iterations.

3.2 Algorithm implementation

  • Generate data, add noise
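
The original listing for this step is not reproduced above; the following is a minimal sketch of what it could look like. The interval [0, 1], the target sin(2πx), and the noise level 0.1 are illustrative assumptions, not values taken from the original experiment.

```python
import numpy as np

def generate_data(n, noise_std=0.1, seed=0):
    """Sample n points of y = sin(2*pi*x) on [0, 1] and add Gaussian noise."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0, 1, n)                                 # evenly spaced inputs
    y = np.sin(2 * np.pi * x) + rng.normal(0, noise_std, n)  # noisy observations
    return x, y
```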

  • Fitting curves with higher order polynomial functions;
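
Fitting requires the design matrix X described in section 3.1. A possible sketch (the function names are mine, not from the original):

```python
def design_matrix(x, m):
    """Build the n x (m+1) matrix whose (i, j) entry is x_i ** j."""
    return np.vander(x, m + 1, increasing=True)

def predict(x, w):
    """Evaluate the fitted polynomial with coefficient vector w at points x."""
    return design_matrix(x, len(w) - 1) @ w
```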

  • Use the analytical solution to find the optimum of the two kinds of loss (without and with a regularization term)

Without the regularization term:
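
The original code is not shown here; a sketch of the closed-form solution w = (X^T X)^(-1) X^T Y, using np.linalg.solve instead of an explicit inverse (a numerical choice of mine, not necessarily the original's):

```python
def fit_analytic(x, y, m):
    """Closed-form least-squares fit without regularization."""
    X = design_matrix(x, m)
    return np.linalg.solve(X.T @ X, X.T @ y)   # solves (X^T X) w = X^T y
```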

With the regularization term:
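
Likewise, a sketch of the regularized closed form w = (X^T X + λI)^(-1) X^T Y:

```python
def fit_analytic_ridge(x, y, m, lam):
    """Closed-form fit with an L2 penalty: solves (X^T X + lam*I) w = X^T y."""
    X = design_matrix(x, m)
    A = X.T @ X + lam * np.eye(m + 1)
    return np.linalg.solve(A, X.T @ y)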

  • Use optimization methods to find the optimal solution (gradient descent, conjugate gradient)

Gradient descent method:
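
A minimal hand-written gradient descent sketch for the regularized loss; the learning rate, tolerance, and iteration cap are illustrative assumptions, not the original settings:

```python
def fit_gradient_descent(x, y, m, lam, lr=0.01, tol=1e-6, max_iter=100000):
    """Minimize 0.5*||Xw - y||^2 + 0.5*lam*||w||^2 by gradient descent."""
    X = design_matrix(x, m)
    w = np.zeros(m + 1)
    for _ in range(max_iter):
        grad = X.T @ (X @ w - y) + lam * w   # gradient derived by hand
        if np.linalg.norm(grad) < tol:       # stop when the gradient is small
            break
        w -= lr * grad
    return w
```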

Conjugate gradient method:
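
A sketch of the conjugate gradient method applied to the normal equations (X^T X + λI)w = X^T Y; since this matrix is symmetric positive definite, the method terminates in at most m+1 steps in exact arithmetic:

```python
def fit_conjugate_gradient(x, y, m, lam):
    """Solve (X^T X + lam*I) w = X^T y with the conjugate gradient method."""
    X = design_matrix(x, m)
    A = X.T @ X + lam * np.eye(m + 1)
    b = X.T @ y
    w = np.zeros(m + 1)
    r = b - A @ w            # residual
    p = r.copy()             # search direction
    for _ in range(m + 1):   # at most m+1 steps in exact arithmetic
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        w += alpha * p
        r_new = r - alpha * Ap
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return w
```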

  • Using the experimental data obtained, explain overfitting.

When the polynomial degree is 1:

When the polynomial degree is 3:

When the polynomial degree is 5:

When the polynomial degree is 7:

When the polynomial degree is 9:

It can be seen that a higher polynomial degree does not necessarily give a better fit.

If the degree is too low, the curve cannot be fitted correctly. At degree 5 the fit is already good, but when the degree becomes too high the fit gets worse again, i.e. overfitting occurs. Overfitting is caused by a sample size that is too small together with a model that is too strong; in that case the fit will be very poor unless measures such as adding a regularization term are taken.
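
One way this comparison could be reproduced with the sketches above (the train/test split, the seeds, and the degree list are illustrative, not the original setup):

```python
def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

x_train, y_train = generate_data(10, seed=1)
x_test, y_test = generate_data(100, noise_std=0.0, seed=2)  # clean reference curve

for m in (1, 3, 5, 7, 9):
    w = fit_analytic(x_train, y_train, m)            # unregularized fit
    print(m, rmse(y_train, predict(x_train, w)),     # training error keeps falling...
             rmse(y_test, predict(x_test, w)))       # ...while test error rises for large m
```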

  • Use different data volumes, different hyperparameters, and different polynomial orders to compare the experimental results.

When the amount of data is 20, the degree of polynomial is 9, and the hyperparameter is 0.0001:

When the amount of data is 30, the degree of polynomial is 9, and the hyperparameter is 0.0001:

When the amount of data is 40, the degree of polynomial is 9, and the hyperparameter is 0.0001:

It can be seen that, under otherwise identical conditions, the more sample data there is, the better the fit.

When the amount of data is 40, the degree of polynomial is 9, and the hyperparameter is 0.01:

When the amount of data is 40, the degree of polynomial is 9, and the hyperparameter is 0.001:

When the amount of data is 40, the degree of polynomial is 9, and the hyperparameter is 0.0001:

It can be seen that, in this range, the hyperparameter has only a very limited influence on the fitting quality.
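
The same kind of script can sweep the data size and the hyperparameter λ. The values below are copied from the headings above; the scaffolding (seeds, reuse of the helpers and rmse from the earlier sketches) is mine:

```python
for n in (20, 30, 40):
    x_tr, y_tr = generate_data(n, seed=n)
    w = fit_analytic_ridge(x_tr, y_tr, 9, lam=1e-4)
    print("n =", n, "test RMSE =", rmse(y_test, predict(x_test, w)))

for lam in (1e-2, 1e-3, 1e-4):
    x_tr, y_tr = generate_data(40, seed=40)
    w = fit_analytic_ridge(x_tr, y_tr, 9, lam=lam)
    print("lambda =", lam, "test RMSE =", rmse(y_test, predict(x_test, w)))
```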

♻️ Resources


Size: 2.35MB
➡️ Resource download: https://download.csdn.net/download/s1t16/87547874
