Regression Analysis - Linear Regression

Regression analysis

In statistics, regression analysis is a method for determining the quantitative relationship between two or more interdependent variables. According to the number of independent variables involved, it can be divided into univariate and multivariate regression analysis; according to the number of dependent variables, into single-response and multiple-response regression analysis; and according to the type of relationship between the independent and dependent variables, into linear and nonlinear regression analysis.

What is regression?

Regression was first proposed by the British biostatistician Francis Galton, working with his student Karl Pearson, while studying how height is inherited between parents and children. In his 1886 paper "Regression Towards Mediocrity in Hereditary Stature", Galton observed that the children of unusually tall or unusually short parents tend to be closer to the average height than their parents are, and he called this tendency regression. Modern regression analysis has little to do with this trend effect; the term now refers to the mathematical method derived from Galton's work, which uses one or more independent variables to predict a dependent variable.

The figure shows a simple regression model: the X axis is product quality and the Y axis is user satisfaction. The higher the quality of the product, the better the user rating, so a straight line can be fitted to the points and used to predict user satisfaction for a new product.

In a regression model, the variable we want to predict is called the dependent variable, such as user satisfaction; the variable chosen to explain changes in the dependent variable is called the independent variable, such as product quality. The purpose of regression is to establish a regression equation that predicts the target value, and solving the regression means finding the regression coefficients of that equation.

In short, the simplest definition of regression is:

Given a set of points, construct a function that fits them so that the error between the points and the fitted function is as small as possible. If the fitted function is a straight line, this is called linear regression; if it is a cubic curve, it is called cubic polynomial regression.
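As a minimal sketch of this definition, the snippet below fits both a straight line and a cubic polynomial to the same point set with NumPy's `polyfit`, which chooses the coefficients that minimize the sum of squared errors. The sample points and the helper name `fit_and_error` are made up for illustration and do not come from the original post.

```python
import numpy as np

# Hypothetical point set (made up for illustration), roughly following y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1, 11.0])

def fit_and_error(x, y, degree):
    """Least-squares polynomial fit; returns coefficients and sum of squared errors."""
    coeffs = np.polyfit(x, y, degree)        # coefficients, highest power first
    residuals = np.polyval(coeffs, x) - y    # fitted value minus observed value
    return coeffs, float(np.sum(residuals ** 2))

line_coeffs, line_sse = fit_and_error(x, y, 1)     # linear regression
cubic_coeffs, cubic_sse = fit_and_error(x, y, 3)   # cubic polynomial regression

print("line :", line_coeffs, "SSE:", line_sse)
print("cubic:", cubic_coeffs, "SSE:", cubic_sse)
```

On the training points the cubic fit can never have a larger squared error than the line, because a line is a special case of a cubic. That does not by itself mean it predicts new points better.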

In machine learning, the six most commonly used regression methods are:

  • linear regression

  • polynomial regression

  • Ridge regression

  • Lasso regression

  • Elastic Net regression

  • logistic regression

This post focuses on linear regression.

Linear regression

Introduction

Linear regression refers to a regression model in which the output is a linear combination of the input variables, such as:

Univariate linear regression model:

\[y = a x + b \]

Multivariate linear regression model:

\[y = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n + b \]

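Here a denotes the coefficients, x the variables, and b the bias. Because this function contains only linear terms, it is only suitable for modeling data in which the relationship between the variables is approximately linear. The coefficients simply weight the importance of each feature variable. We use stochastic gradient descent (SGD) to determine the weights a and the bias b, as the figure illustrates.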
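The SGD step mentioned above can be sketched as follows for the univariate model y = a x + b. This is a minimal illustration under stated assumptions, not the procedure from the original figure: the function name `sgd_linear_regression`, the learning rate, the epoch count, and the noise-free sample data are all made up here.

```python
import numpy as np

def sgd_linear_regression(x, y, lr=0.1, epochs=500, seed=0):
    """Fit y = a * x + b (approximately) by stochastic gradient descent."""
    rng = np.random.default_rng(seed)
    a, b = 0.0, 0.0                          # start from zero weights
    for _ in range(epochs):
        for i in rng.permutation(len(x)):    # visit samples in random order
            error = (a * x[i] + b) - y[i]    # prediction minus target
            a -= lr * error * x[i]           # gradient of 0.5 * error**2 w.r.t. a
            b -= lr * error                  # gradient of 0.5 * error**2 w.r.t. b
    return a, b

# Hypothetical noise-free data generated from y = 3x + 2
x = np.linspace(0.0, 1.0, 20)
y = 3.0 * x + 2.0

a, b = sgd_linear_regression(x, y)
print(a, b)   # should end up close to 3 and 2
```

Each update uses a single sample, which is what makes the method stochastic. With noisy data or unscaled features the learning rate would typically need tuning.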

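Some characteristics of linear regression:

  • Fast to fit: it needs no complicated computation and remains fast on large datasets

  • Each variable can be understood and interpreted through its coefficient

  • Very sensitive to outliers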
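The last point, sensitivity to outliers, is easy to see in a small experiment. The sketch below fits a least-squares line to made-up data lying exactly on y = 2x, then corrupts a single observation and refits. The data values are hypothetical and chosen only to make the effect visible.

```python
import numpy as np

# Hypothetical data lying exactly on y = 2x
x = np.arange(10, dtype=float)
y = 2.0 * x

slope_clean, intercept_clean = np.polyfit(x, y, 1)

# Corrupt a single observation and refit
y_outlier = y.copy()
y_outlier[-1] = 100.0                        # the uncorrupted value would be 18
slope_out, intercept_out = np.polyfit(x, y_outlier, 1)

print("clean fit:    slope=%.3f intercept=%.3f" % (slope_clean, intercept_clean))
print("with outlier: slope=%.3f intercept=%.3f" % (slope_out, intercept_out))
```

Because the errors are squared, one extreme point contributes a very large penalty and pulls the whole line toward itself.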

How do we solve it?

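Here we use the linear regression example from Stanford's open machine learning course. Suppose we have the dataset in Table 1, which records the costs and profits of a company. The data from 2002 to 2016 form the training set, which contains 15 samples in total. The two variables of interest are cost and profit: cost is the input variable, or feature, and profit is the output or target variable. The overall regression model is shown in the figure below.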

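Now we build the model. Let x denote the company's cost and y its profit, and let h (for hypothesis) denote the function that maps the input variable x to the output variable y. The formula for linear regression with a single independent variable (univariate linear regression) is: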

\[h_{\theta}(x) = \theta_0 + \theta_1x \]

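The problem now is how to find the two parameters \(\theta_0\) and \(\theta_1\). The idea is to choose \(\theta_0\) and \(\theta_1\) so that \(h_{\theta}(x)\) is as close as possible to y. This leads to the squared error function over the training set (x, y), also known as the least squares method.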

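In a regression equation, minimizing the sum of squared errors is the standard way to find the regression coefficient of each feature. The error is the difference between the predicted y value and the true y value. Simply summing the errors would let positive and negative differences cancel out, so the squared error (least squares) is used instead: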

\[\sum^m_{i = 1} (h_\theta(x_i) - y_i)^2 \]
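To make the squared error function concrete, here is a small sketch that evaluates it for given values of \(\theta_0\) and \(\theta_1\). The function names and the cost/profit numbers are hypothetical; they are not the values from Table 1.

```python
import numpy as np

def hypothesis(theta0, theta1, x):
    """h_theta(x) = theta0 + theta1 * x"""
    return theta0 + theta1 * x

def squared_error(theta0, theta1, x, y):
    """Sum over the training set of (h_theta(x_i) - y_i)^2."""
    return float(np.sum((hypothesis(theta0, theta1, x) - y) ** 2))

# Hypothetical cost/profit pairs (not the article's Table 1)
cost = np.array([1.0, 2.0, 3.0, 4.0])
profit = np.array([3.0, 5.0, 7.0, 9.0])      # exactly profit = 1 + 2 * cost

print(squared_error(1.0, 2.0, cost, profit))  # perfect fit -> 0.0
print(squared_error(0.0, 2.0, cost, profit))  # each prediction off by 1 -> 4.0
```

The better the parameters match the data, the smaller this value becomes, which is why fitting reduces to minimizing it.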

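Choosing the parameters that minimize this sum completes the fitting process. From this example, the linear regression model can be defined as follows: given the sample coordinates x and y, estimate the function h that approximates the relationship between the variables: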

\[h_{\theta}(x) = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n = \sum_{i=0}^{n} \theta_i x_i \]

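Here n is the number of features and \(x_i\) is the i-th feature value of a training sample, with \(x_0 = 1\) so that the sum includes \(\theta_0\). When there is only one independent variable x, the model is called univariate linear regression and has the form \(h_{\theta}(x) = \theta_0 + \theta_1 x\); when there are several independent variables, it is called multivariate linear regression. Our goal is to minimize the sum of squared errors over the training set, so that the model fits the sample data as well as possible and predicts new data better.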
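For the general case with several features, the minimization can be sketched with NumPy's least-squares solver. Note that this solves the problem directly rather than with the SGD approach mentioned earlier. The function names and the two-feature data are assumptions made up for illustration. A column of ones is prepended to play the role of \(x_0 = 1\).

```python
import numpy as np

def fit_linear_regression(X, y):
    """Return theta (theta_0 first) minimizing the sum of squared errors."""
    X = np.asarray(X, dtype=float)
    ones = np.ones((X.shape[0], 1))           # x_0 = 1 for the intercept theta_0
    design = np.hstack([ones, X])             # one row per sample: [1, x_1, ..., x_n]
    theta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return theta

def predict(theta, X):
    """h_theta(x) = theta_0 + theta_1 * x_1 + ... + theta_n * x_n"""
    X = np.asarray(X, dtype=float)
    return theta[0] + X @ theta[1:]

# Hypothetical data with two features, generated from y = 1 + 2*x1 + 3*x2
X = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [4.0, 3.0], [5.0, 5.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

theta = fit_linear_regression(X, y)
print(theta)                                   # close to [1, 2, 3]
print(predict(theta, np.array([[6.0, 1.0]])))  # close to [16]
```

With a single feature column this reduces to the univariate case \(h_{\theta}(x) = \theta_0 + \theta_1 x\).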



Origin blog.csdn.net/m0_59161987/article/details/129484194