R language: glmnet package key details and multi-class regression implementation (lasso/ridge regression/elastic net)

1.1 Introduction to Glmnet

Glmnet is a package that fits generalized linear and similar models via penalized maximum likelihood. The regularization path is computed for the lasso or elastic net penalty over a grid of values of the regularization parameter lambda (on a log scale). The algorithm is very fast and can exploit sparsity in the input matrix x.

It fits linear, logistic, multinomial, Poisson, and Cox regression models. It can also fit multi-response linear regression, generalized linear models for custom families, and relaxed lasso regression models. The package also has methods for prediction, plotting, and cross-validation.
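The model type is selected through the family argument. A minimal sketch (x, y_num, y_bin, y_class, and y_count are hypothetical placeholder objects, not from the example data used below):

# Sketch: choosing the model family; the response objects here are placeholders
fit_gau <- glmnet(x, y_num)                            # default: family = "gaussian"
fit_bin <- glmnet(x, y_bin, family = "binomial")       # logistic regression
fit_mul <- glmnet(x, y_class, family = "multinomial")  # multinomial regression
fit_poi <- glmnet(x, y_count, family = "poisson")      # Poisson regression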

1.2 Glmnet mathematical representation

Glmnet solves the following problem:

$$\min_{\beta_0,\beta}\ \frac{1}{N}\sum_{i=1}^{N} w_i\, l(y_i,\ \beta_0+\beta^{T}x_i) \;+\; \lambda\left[(1-\alpha)\,\lVert\beta\rVert_2^2/2 \;+\; \alpha\,\lVert\beta\rVert_1\right]$$

The problem is solved over a grid of lambda values covering the entire range of possible solutions. Here $l(y_i, \eta_i)$ is the negative log-likelihood contribution for observation i; in the Gaussian case, for example, it is $\frac{1}{2}(y_i-\eta_i)^2$.
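As a small illustration of the objective (a sketch only; glmnet evaluates this internally in compiled code), the Gaussian case with unit weights can be written in plain R as:

# Sketch: elastic net objective for the Gaussian case with unit weights,
# where l(y_i, eta_i) = (1/2) * (y_i - eta_i)^2
enet_objective <- function(beta0, beta, x, y, lambda, alpha) {
  eta  <- beta0 + as.vector(x %*% beta)   # linear predictor eta_i
  loss <- mean(0.5 * (y - eta)^2)         # (1/N) * sum_i l(y_i, eta_i)
  pen  <- lambda * ((1 - alpha) * sum(beta^2) / 2 + alpha * sum(abs(beta)))
  loss + pen
}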

1.3 Comparison of Glmnet multiple regression methods

Elastic net regression is controlled by α, which bridges the gap between lasso regression (α=1, default) and ridge regression (α=0). The parameter λ controls the overall strength of the penalty.

α = 1 (default): lasso regression
α = 0: ridge regression
0 < α < 1: elastic net regression

Ridge regression is known to shrink the coefficients of correlated predictors toward each other, while lasso regression tends to pick some of them and discard the others; that is, the lasso performs variable selection.

Elastic net regression blends the two: when predictors are correlated in groups, α = 0.5 tends to select or drop the features of a group together. α is a higher-level hyperparameter, and the user may pre-select a value or try several different values. One practical use of α is numerical stability: an elastic net with α close to 1 behaves much like lasso regression, but removes the degeneracies and erratic behavior caused by extremely correlated predictors.
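When trying several alpha values, a common approach is to cross-validate each on the same folds so the errors are comparable. A minimal sketch, assuming a numeric predictor matrix x and response y are already loaded (cv.glmnet is introduced in section 1.9):

# Sketch: compare several alpha values using the same CV folds
set.seed(1)
foldid <- sample(rep(1:10, length.out = nrow(x)))  # fixed fold assignment
for (a in c(0, 0.25, 0.5, 0.75, 1)) {
  cv <- cv.glmnet(x, y, alpha = a, foldid = foldid)
  cat("alpha =", a, "  min CV error =", min(cv$cvm), "\n")
}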

1.4 Glmnet code principle

The glmnet algorithm uses cyclical coordinate descent: it optimizes the objective function over one parameter at a time, holding the others fixed, and cycles repeatedly until convergence. The package also exploits strong rules to efficiently restrict the active set. Thanks to efficient updates and techniques such as warm starts and active-set convergence, the algorithm is very fast.

The code can handle sparse input-matrix formats, as well as range constraints on the coefficients. The core of glmnet is a set of Fortran subroutines, which makes its execution very fast.
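For example, a sparse dgCMatrix from the Matrix package can be passed to glmnet directly; a sketch with simulated data:

# Sketch: glmnet accepts sparse predictor matrices directly
library(Matrix)
xs <- rsparsematrix(100, 20, density = 0.1)  # ~10% nonzero entries
ys <- rnorm(100)
fit_sparse <- glmnet(xs, ys)                 # exploits the sparsity of x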

1.5 Glmnet installation and loading

Like all other R packages, installing glmnet requires only one line of code.

install.packages("glmnet")

Selecting a nearby CRAN mirror in RStudio can speed up the installation of the package.
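For example, a mirror can be chosen interactively, or a repository URL can be passed to install.packages (the URL below is just one well-known mirror, by way of illustration):

chooseCRANmirror()  # pick a nearby mirror interactively
# install.packages("glmnet", repos = "https://cloud.r-project.org")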

library(glmnet)
1.6 Using Glmnet for regression

Load the example dataset bundled with the package for the Glmnet demonstration:

data("MultinomialExample")

If we do not adjust any parameters, glmnet defaults to the Gaussian family, i.e. least-squares regression. So we first need to decide which regression method to use: lasso regression, ridge regression, elastic net, and so on.

fit = glmnet(data_x, data_y, alpha = 0.5, standardize = TRUE)

alpha = 0: ridge regression, which performs no variable selection
alpha = 1: lasso regression, which performs variable selection
alpha in (0, 1): elastic net regression

fit = glmnet(MultinomialExample$x, MultinomialExample$y, alpha = 0.5, standardize = TRUE)

standardize = TRUE standardizes the predictors so that differences in scale (units) do not influence the penalty.

fit is an object of class "glmnet" that contains all the relevant information about the fitted model for further use. Users are not expected to dig into this object directly for analysis; instead, the glmnet package provides methods on the fit object, such as plot, print, coef, and predict, that let us perform these tasks more conveniently.

1.7 Analysis of Glmnet regression results

Print the fit result:

print(fit)
> # print the fit result
> print(fit)

Call:  glmnet(x = MultinomialExample$x, y = MultinomialExample$y, alpha = 0,      standardize = TRUE) 

    Df  %Dev  Lambda
1   30  0.00 267.300
2   30  0.28 243.600
3   30  0.30 221.900
4   30  0.33 202.200
5   30  0.37 184.200
6   30  0.40 167.900
7   30  0.44 153.000
8   30  0.48 139.400
9   30  0.53 127.000
10  30  0.58 115.700
11  30  0.64 105.400
12  30  0.70  96.060
13  30  0.76  87.530
14  30  0.84  79.750
15  30  0.92  72.670
16  30  1.00  66.210
17  30  1.10  60.330
18  30  1.20  54.970
19  30  1.32  50.090
20  30  1.44  45.640
21  30  1.58  41.580
22  30  1.73  37.890
23  30  1.89  34.520
24  30  2.07  31.460
25  30  2.26  28.660
26  30  2.47  26.120
27  30  2.69  23.800
28  30  2.94  21.680
29  30  3.21  19.760
30  30  3.50  18.000
31  30  3.81  16.400
32  30  4.15  14.940
33  30  4.52  13.620
34  30  4.92  12.410
35  30  5.34  11.300
36  30  5.80  10.300
37  30  6.29   9.385
38  30  6.82   8.552
39  30  7.38   7.792
40  30  7.98   7.100
41  30  8.62   6.469
42  30  9.29   5.894
43  30 10.01   5.371
44  30 10.76   4.893
45  30 11.55   4.459
46  30 12.38   4.063
47  30 13.25   3.702
48  30 14.15   3.373
49  30 15.08   3.073
50  30 16.04   2.800
  • From left to right, it shows the number of nonzero coefficients (Df), the percent of (null) deviance explained (%Dev), and the value of lambda (Lambda).
  • By default, glmnet fits the model for 100 values of lambda (the sequence can be adjusted, as sketched below).
  • If %Dev does not change sufficiently from one lambda to the next, glmnet considers the fit complete and stops early, which saves computation.
  • For brevity, the printout has been truncated here to show only part of the path.
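If needed, the lambda sequence can be controlled explicitly; a sketch using the same example data (nlambda and lambda are documented glmnet arguments; the second, commented line assumes generic x and y):

# Sketch: shorten the lambda sequence, or supply a custom decreasing grid
fit50 <- glmnet(MultinomialExample$x, MultinomialExample$y,
                alpha = 0.5, nlambda = 50)
# fit_user <- glmnet(x, y, lambda = exp(seq(log(100), log(0.01), length.out = 60)))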
1.8 Visualization of Glmnet regression results

We can visualize the regression results with plot()

plot(fit)

Elastic net:

[figure: coefficient paths of the elastic net fit]

Lasso regression:

[figure: coefficient paths of the lasso fit]

Each curve corresponds to one variable, showing the path of its coefficient against the ℓ1-norm of the whole coefficient vector as λ varies. The axis above indicates the number of nonzero coefficients at the current λ, which is the effective degrees of freedom (df) of the lasso. Users may also wish to annotate the curves; this can be done by setting label = TRUE in the plot command.
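The x-axis variable can also be switched through the xvar argument of the plot method, for instance:

plot(fit, xvar = "lambda", label = TRUE)  # coefficients against log(lambda), with labels
plot(fit, xvar = "dev")                   # against the % deviance explained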

Ridge regression:

[figure: coefficient paths of the ridge regression fit]

It can be clearly observed that under ridge regression the coefficients never become exactly zero, i.e. no explanatory variables are screened out; by contrast, as λ changes in the elastic net and lasso fits, some coefficients shrink exactly to zero and those variables drop out of the model.

We can get the model coefficients at one or more λ within the sequence range through the following code:

Get the parameter estimates at the hyperparameter value λ = 0.1:

coef(fit, s = 0.1)
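coef returns a sparse matrix of coefficients; as a quick sketch, the variables retained at λ = 0.1 can be listed like this:

cf <- coef(fit, s = 0.1)                 # sparse matrix of coefficients at s = 0.1
nz <- which(as.vector(cf) != 0)          # rows with nonzero estimates
data.frame(variable = rownames(cf)[nz],  # names include the intercept
           coefficient = cf[nz])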
1.9 Glmnet model evaluation method

The function glmnet returns a sequence of models for the user to choose from. In many cases, users prefer the software to pick one of them, and cross-validation is perhaps the simplest and most widely used method for that task. cv.glmnet is the main cross-validation function here, along with various supporting methods such as plotting and prediction.

Ten-fold cross-validation for model evaluation:

cvfit <- cv.glmnet(data_x, data_y, nfolds = 10)

cv.glmnet returns a cv.glmnet object, a list containing all the ingredients of the cross-validated fit. As with glmnet, using this object directly is discouraged; instead, use the functions the package provides for it.

Plot showing the effect of hyperparameters:

plot(cvfit)

[figure: cross-validation curve from plot(cvfit)]

This plots the cross-validation curve (red dotted line) together with the upper and lower standard-deviation curves along the λ sequence (error bars). Two special values along the λ sequence are marked by vertical dashed lines.

1.10 Glmnet selects the best model

Print the two suggested hyperparameter values to help the user choose:

The cross-validation error is smallest at lambda.min, while lambda.1se gives the most regularized model whose cross-validation error is within one standard error of that minimum.

print(cvfit$lambda.min) 
print(cvfit$lambda.1se)
> print(cvfit$lambda.min) 
[1] 0.02866132
> print(cvfit$lambda.1se)
[1] 0.07266689
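Both special values can be passed straight to coef or predict on the cv.glmnet object, for example:

coef(cvfit, s = "lambda.min")  # coefficients at the lambda minimizing CV error
coef(cvfit, s = "lambda.1se")  # coefficients at the more regularized choice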
1.11 Glmnet prediction

Set the seed and randomly generate new data:

set.seed(29)
nx <- matrix(rnorm(500 * 30), 500, 30)

Predict with the trained model:

predict(fit, newx = nx, s = c(0.1, 0.05))
> predict(fit, newx = nx, s = c(0.1, 0.05))
              s1        s2
  [1,] 2.7309311 2.7732605
  [2,] 2.2859731 2.2954917
  [3,] 2.2992034 2.3208074
  [4,] 1.6894255 1.6735020
  [5,] 2.5623190 2.5941302
  [6,] 1.5733710 1.5388561
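Prediction also works directly from the cross-validated object, where the string "lambda.min" or "lambda.1se" selects λ; a sketch reusing the same nx:

predict(cvfit, newx = nx, s = "lambda.min")   # predictions at lambda.min
# predict(cvfit, newx = nx, s = "lambda.1se") # or at the 1-SE choice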


Origin blog.csdn.net/yt266666/article/details/127377217