Multiple regression extends linear regression to model the relationship between a response variable and two or more predictors. Simple linear regression has one predictor and one response variable; multiple regression has several predictors and one response variable.
The general mathematical equation for multiple regression is −
y = a + b1*x1 + b2*x2 + ... + bn*xn
Following is the description of the parameters used −
y is the response variable.
a is the intercept and b1, b2, ..., bn are the coefficients of the predictors.
x1, x2, ..., xn are the predictor variables.
In R, the lm() function creates regression models. It uses the input data to estimate the values of the coefficients, which we can then use to predict the value of the response variable for a given set of predictor values.
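To make the estimation step concrete, here is a minimal sketch on made-up data. The data frame `toy` and its coefficients (2, 3 and -1.5) are assumptions chosen for illustration. Because y is built from the equation without any noise, lm() should recover the same coefficients.

```r
# Hypothetical data: y is built exactly as y = 2 + 3*x1 - 1.5*x2 (no noise).
x1 <- c(1, 2, 3, 4, 5, 6)
x2 <- c(2, 1, 4, 3, 6, 5)
y  <- 2 + 3 * x1 - 1.5 * x2
toy <- data.frame(y = y, x1 = x1, x2 = x2)

# Fit the model; lm() estimates the intercept a and the coefficients b1, b2.
fit <- lm(y ~ x1 + x2, data = toy)
est <- coef(fit)
print(round(est, 6))

# Use the estimated coefficients to predict y for x1 = 7, x2 = 8.
y_new <- unname(est[1] + est[2] * 7 + est[3] * 8)
print(y_new)
```

With real data the fit is not exact, so the estimated coefficients only approximate the underlying relationship.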
lm() function
This function creates a model of the relationship between the predictors and the response variable.
Syntax
The basic syntax of the lm() function in multiple regression is −
lm(y ~ x1 + x2 + x3 ..., data)
Following is the description of the parameters used −
formula is a symbolic description of the relationship between the response variable and the predictor variables.
data is the data frame containing the variables to which the formula is applied.
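The two arguments can be inspected separately. The sketch below stores the formula in a variable named `f` (a name chosen here for illustration) and passes the built-in mtcars data frame as data. The variables named in the formula are looked up as columns of that data frame.

```r
# A formula is an ordinary R object: response on the left of ~, predictors on the right.
f <- mpg ~ disp + hp + wt
print(class(f))      # the class of a formula object
print(all.vars(f))   # the variable names the formula refers to

# The data argument must contain every variable named in the formula.
print(is.data.frame(mtcars))
print(all(all.vars(f) %in% names(mtcars)))

# Fit the model with the stored formula.
model <- lm(f, data = mtcars)
n_coef <- length(coef(model))  # one intercept plus one coefficient per predictor
print(n_coef)
```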
Example
Input data
Consider the dataset "mtcars", which is available in the R environment. It compares different car models in terms of mileage in miles per gallon ("mpg"), cylinder displacement ("disp"), horsepower ("hp"), weight of the car ("wt") and some other parameters.
The goal of the model is to establish the relationship between "mpg" as the response variable and "disp", "hp" and "wt" as the predictor variables. To do this, we create a subset of the mtcars dataset that contains only these variables.
input <- mtcars[,c("mpg","disp","hp","wt")]
print(head(input))
When we execute the above code, it produces the following result −
                   mpg disp  hp    wt
Mazda RX4         21.0  160 110 2.620
Mazda RX4 Wag     21.0  160 110 2.875
Datsun 710        22.8  108  93 2.320
Hornet 4 Drive    21.4  258 110 3.215
Hornet Sportabout 18.7  360 175 3.440
Valiant           18.1  225 105 3.460
Create the relationship model and get the coefficients
input <- mtcars[,c("mpg","disp","hp","wt")]
# Create the relationship model.
model <- lm(mpg~disp+hp+wt, data = input)
# Show the model.
print(model)
# Get the Intercept and coefficients as vector elements.
cat("# # # # The Coefficient Values # # # ","\n")
a <- coef(model)[1]
print(a)
Xdisp <- coef(model)[2]
Xhp <- coef(model)[3]
Xwt <- coef(model)[4]
print(Xdisp)
print(Xhp)
print(Xwt)
When we execute the above code, it produces the following result −
Call:
lm(formula = mpg ~ disp + hp + wt, data = input)
Coefficients:
(Intercept)         disp           hp           wt
  37.105505    -0.000937    -0.031157    -3.800891
# # # # The Coefficient Values # # #
(Intercept)
   37.10551
         disp
-0.0009370091
         hp
-0.03115655
       wt
-3.800891
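The code above picks coefficients by position. Since coef() returns a named vector, they can also be picked by name, which keeps working if the order of the predictors in the formula changes. The sketch below assumes the same model, fitted directly on mtcars. The variable names `coefs` and `b_wt` are chosen for illustration.

```r
# Fit the same model; mtcars already contains the four columns used.
model <- lm(mpg ~ disp + hp + wt, data = mtcars)

# coef() returns a named numeric vector.
coefs <- coef(model)
print(names(coefs))

# Select by name; [[ ]] drops the name and returns a plain number.
a    <- coefs[["(Intercept)"]]
b_wt <- coefs[["wt"]]
print(a)
print(b_wt)
```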
Create the equation for the regression model
Based on the intercept and coefficient values above, we can write the regression equation, where x1, x2 and x3 stand for disp, hp and wt.
Y = a + Xdisp*x1 + Xhp*x2 + Xwt*x3
or
Y = 37.105505 + (-0.000937)*x1 + (-0.031157)*x2 + (-3.800891)*x3
Apply the equation to predict new values
Given a new set of displacement, horsepower and weight values, we can use the regression equation above to predict the mileage.
For a car with disp = 221, hp = 102 and wt = 2.91, the predicted mileage is −
Y = 37.105505 + (-0.000937)*221 + (-0.031157)*102 + (-3.800891)*2.91 = 22.6598
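The same prediction can be obtained without writing the equation by hand. This sketch uses R's predict() function with a one-row data frame (named `new_car` here for illustration) and compares the result with a manual calculation from the unrounded coefficients. A hand calculation with rounded coefficients may differ slightly in the later decimal places.

```r
# Fit the model on the built-in mtcars data.
model <- lm(mpg ~ disp + hp + wt, data = mtcars)

# New predictor values; column names must match the predictors in the formula.
new_car <- data.frame(disp = 221, hp = 102, wt = 2.91)

# Let R apply the regression equation.
pred <- predict(model, newdata = new_car)
print(pred)

# Manual check: intercept*1 + b_disp*221 + b_hp*102 + b_wt*2.91.
manual <- sum(coef(model) * c(1, 221, 102, 2.91))
print(manual)
```

Using predict() avoids rounding the coefficients and also accepts several rows in newdata at once.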