Hand-Coding LinearRegression

Linear Regression

Closed-form (analytical) solution:

$$\hat{w} = (X^T X)^{-1} X^T y$$

Python implementation of the closed-form solution for linear regression:

from numpy import *

def loadDataSet(fileName):
    # number of feature columns; the last column is the target value
    numFeat = len(open(fileName).readline().split('\t')) - 1
    dataMat = []; labelMat = []
    fr = open(fileName)
    for line in fr.readlines():
        lineArr = []
        curLine = line.strip().split('\t')
        for i in range(numFeat):
            lineArr.append(float(curLine[i]))
        dataMat.append(lineArr)
        labelMat.append(float(curLine[-1]))
    return dataMat, labelMat

def standRegres(xArr, yArr):
    xMat = mat(xArr); yMat = mat(yArr).T
    xTx = xMat.T * xMat
    if linalg.det(xTx) == 0.0:
        print("This matrix is singular, cannot do inverse")
        return
    ws = xTx.I * (xMat.T * yMat)   # closed form: w = (X^T X)^{-1} X^T y
    return ws
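
As a quick sanity check, the closed-form weights can be compared against NumPy's built-in least-squares solver. A minimal sketch, assuming a tab-separated data file ex0.txt in the format loadDataSet expects:

xArr, yArr = loadDataSet('ex0.txt')             # assumed data file
ws = standRegres(xArr, yArr)
print(ws)                                       # fitted coefficients

# cross-check against NumPy's least-squares routine
wNp, *_ = linalg.lstsq(array(xArr), array(yArr), rcond=None)
print(wNp)                                      # should agree closely with ws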

Locally Weighted Linear Regression

In locally weighted linear regression (LWLR), each point near the point to be predicted is given a certain weight, and ordinary minimum-mean-squared-error regression is then performed on this weighted subset. Like kNN, the algorithm must select the corresponding subset of the data anew for every prediction.

The algorithm solves for the regression coefficients w in the following form:

$$\hat{w} = (X^T W X)^{-1} X^T W y$$

Here W is a matrix that assigns a weight to each data point.

LWLR uses a "kernel" (similar to the kernels in support vector machines) to give nearby points higher weight. The type of kernel can be chosen freely; the most commonly used is the Gaussian kernel, which assigns weights as follows:

$$w(i, i) = \exp\left(\frac{\lvert x^{(i)} - x \rvert^2}{-2k^2}\right)$$

This builds a weight matrix W that contains only diagonal elements: the closer a point x^(i) is to the query point x, the larger w(i, i) becomes. The parameter k determines how much weight is given to nearby points, and it is the only parameter that needs tuning when using LWLR.

(Figure: the relationship between the parameter k and the weights.)
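
A minimal numerical sketch of that relationship (the distance d and the k values below are illustrative, not from the original post):

d = 0.5                                   # illustrative distance from the query point
for k in [0.1, 0.5, 1.0]:
    w = exp(-(d ** 2) / (2.0 * k ** 2))   # Gaussian kernel weight
    print(k, w)                           # the weight falls off sharply as k shrinks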

Python implementation of the closed-form solution for locally weighted linear regression:

def lwlr(testPoint, xArr, yArr, k=1.0):
    xMat = mat(xArr); yMat = mat(yArr).T
    m = shape(xMat)[0]
    weights = mat(eye((m)))
    for j in range(m):                      # build the diagonal weight matrix
        diffMat = testPoint - xMat[j, :]
        weights[j, j] = exp(diffMat * diffMat.T / (-2.0 * k ** 2))
    xTx = xMat.T * (weights * xMat)
    if linalg.det(xTx) == 0.0:
        print("This matrix is singular, cannot do inverse")
        return
    ws = xTx.I * (xMat.T * (weights * yMat))
    return testPoint * ws

def lwlrTest(testArr, xArr, yArr, k=1.0):   # apply lwlr to every point in testArr
    m = shape(testArr)[0]
    yHat = zeros(m)
    for i in range(m):
        yHat[i] = lwlr(testArr[i], xArr, yArr, k)
    return yHat
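
A minimal usage sketch, again assuming the ex0.txt data file: refit at every training point with three kernel widths and compare the training error.

xArr, yArr = loadDataSet('ex0.txt')             # assumed data file
for k in [1.0, 0.01, 0.003]:
    yHat = lwlrTest(xArr, xArr, yArr, k)
    err = ((array(yArr) - yHat) ** 2).sum()     # residual sum of squares on training data
    print(k, err)                               # smaller k hugs the training data more closely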

The problem with locally weighted linear regression is that it must run over the entire dataset every time: to make a single prediction, all of the training data must be kept available.

Ridge Regression and Stepwise Linear Regression

Ridge regression was first used to handle the case where there are more features than samples; it is now also used to introduce bias into the estimate and thereby obtain a better estimate. A constant lambda is introduced to limit the sum of all the w's; this penalty term shrinks unimportant parameters, a technique that statisticians also call shrinkage. The ridge estimate of the regression coefficients is:

$$\hat{w} = (X^T X + \lambda I)^{-1} X^T y$$

What is the "ridge" in ridge regression?

Ridge regression adds the identity matrix multiplied by the constant lambda; it was originally devised to handle the case of more features than samples. Looking at the identity matrix I, the value 1 runs down the entire diagonal while every other element is 0. Figuratively, a "ridge" of 1s rises out of a plane of 0s, which is where the "ridge" in ridge regression comes from.
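
For a concrete look at that picture (the dimension 4 and lambda = 0.2 below are arbitrary):

print(eye(4) * 0.2)   # a diagonal "ridge" of lambda values on a plane of 0s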

Python implementation of the closed-form solution for ridge regression:

def ridgeRegres(xMat, yMat, lam=0.2):
    xTx = xMat.T * xMat
    denom = xTx + eye(shape(xMat)[1]) * lam
    if linalg.det(denom) == 0.0:
        print("This matrix is singular, cannot do inverse")
        return
    ws = denom.I * (xMat.T * yMat)
    return ws

def ridgeTest(xArr, yArr):
    xMat = mat(xArr); yMat = mat(yArr).T
    yMean = mean(yMat, 0)
    yMat = yMat - yMean              # center y to eliminate the intercept term
    # standardize the X's
    xMeans = mean(xMat, 0)           # subtract the mean of each feature
    xVar = var(xMat, 0)              # then divide by its variance
    xMat = (xMat - xMeans) / xVar
    numTestPts = 30
    wMat = zeros((numTestPts, shape(xMat)[1]))
    for i in range(numTestPts):      # try 30 lambdas spaced exponentially
        ws = ridgeRegres(xMat, yMat, exp(i - 10))
        wMat[i, :] = ws.T
    return wMat
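
A minimal sketch of how the coefficient plot below can be produced with matplotlib (abalone.txt is an assumed data file in loadDataSet's format):

import matplotlib.pyplot as plt

abX, abY = loadDataSet('abalone.txt')   # assumed data file
ridgeWeights = ridgeTest(abX, abY)
plt.plot(ridgeWeights)                  # one curve per coefficient; x-axis indexes log(lambda)
plt.xlabel('log(lambda)'); plt.ylabel('regression coefficient')
plt.show()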

(Figure: how the ridge regression coefficients change as lambda varies.)

Lambda should vary on an exponential scale here, so that we can see how both very small and very large values of lambda affect the result.

There are other shrinkage methods as well, such as the lasso, LAR, PCA regression, and subset selection. Like ridge regression, these methods can not only improve prediction accuracy but also help interpret the regression coefficients.

Ordinary least-squares regression yields the same solution as ridge regression when the following constraint is imposed:

$$\sum_{k=1}^{n} w_k^2 \le \lambda$$

This constraint requires that the sum of the squares of all the regression coefficients be no greater than lambda. When two or more features are correlated, ordinary least squares may produce a very large positive coefficient on one and a very large negative coefficient on the other. It is precisely because of this constraint that ridge regression avoids the problem.

Similar to ridge regression, another shrinkage method, the lasso, also constrains the regression coefficients, with the corresponding constraint:

$$\sum_{k=1}^{n} \lvert w_k \rvert \le \lambda$$

The only difference is that this constraint uses absolute values instead of squares. Although the form of the constraint changes only slightly, the results differ dramatically: when lambda is small enough, some coefficients are forced to shrink to exactly 0, a property that helps us understand the data better. The two constraints look nearly identical on paper, but the small change greatly increases the computational complexity (solving for the regression coefficients under the new constraint requires a quadratic programming algorithm).
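
A minimal illustration of that difference using scikit-learn's off-the-shelf Ridge and Lasso estimators, on synthetic data invented here for the purpose:

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(0)
X = rng.randn(100, 10)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.randn(100)   # only 2 informative features

print(Ridge(alpha=1.0).fit(X, y).coef_)   # all 10 coefficients shrunk, none exactly 0
print(Lasso(alpha=0.1).fit(X, y).coef_)   # the 8 uninformative coefficients driven to exactly 0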

Forward Stagewise Regression

Forward stagewise regression is a greedy algorithm: at every step it tries to reduce the error as much as possible. All weights start at 0, and the decision made at each step is to increase or decrease some weight by a small amount.

The pseudocode for the algorithm is as follows:

Standardize the data to zero mean and unit variance
For every iteration:
    Set the current lowest error, lowestError, to +infinity
    For each feature:
        For a step in each direction (increase or decrease):
            Change one coefficient to get a new W
            Compute the error under the new W
            If the error is lower than lowestError: set Wbest to this W
    Set W to the new Wbest

Python implementation of forward stagewise regression:

def rssError(yArr, yHatArr):                 # yArr and yHatArr both need to be arrays
    return ((yArr - yHatArr) ** 2).sum()

def regularize(xMat):                        # standardize columns to zero mean, unit variance
    inMat = xMat.copy()
    inMeans = mean(inMat, 0)
    inVar = var(inMat, 0)
    inMat = (inMat - inMeans) / inVar
    return inMat

def stageWise(xArr, yArr, eps=0.01, numIt=100):
    xMat = mat(xArr); yMat = mat(yArr).T
    yMean = mean(yMat, 0)
    yMat = yMat - yMean                      # could also regularize ys, but coefs would shrink
    xMat = regularize(xMat)
    m, n = shape(xMat)
    returnMat = zeros((numIt, n))            # record the weights at every iteration
    ws = zeros((n, 1)); wsTest = ws.copy(); wsMax = ws.copy()
    for i in range(numIt):                   # could change this to a while loop
        lowestError = inf
        for j in range(n):
            for sign in [-1, 1]:
                wsTest = ws.copy()
                wsTest[j] += eps * sign      # nudge one coefficient up or down
                yTest = xMat * wsTest
                rssE = rssError(yMat.A, yTest.A)
                if rssE < lowestError:
                    lowestError = rssE
                    wsMax = wsTest
        ws = wsMax.copy()
        returnMat[i, :] = ws.T
    return returnMat
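
A minimal usage sketch (abalone.txt is again an assumed data file): run many small steps and inspect the final weights.

xArr, yArr = loadDataSet('abalone.txt')       # assumed data file
weights = stageWise(xArr, yArr, eps=0.001, numIt=5000)
print(weights[-1])                            # coefficient vector after the final iteration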

The main advantage of stepwise linear regression is that it helps people understand the existing model and improve it. When building a model, you can run the algorithm to identify the important features, which makes it possible to stop collecting data for the unimportant ones in time.

When shrinkage methods (such as stepwise linear regression or ridge regression) are applied, the model's bias increases, but its variance decreases at the same time.

The Bias/Variance Trade-off

Error consists of three parts: bias, measurement error, and random noise.
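
The closely related textbook decomposition (an addition here, not spelled out in the original post): the expected squared error of a fitted model $\hat{f}$ at a point x splits as

$$E\big[(y - \hat{f}(x))^2\big] = \big(E[\hat{f}(x)] - f(x)\big)^2 + E\big[(\hat{f}(x) - E[\hat{f}(x)])^2\big] + \sigma^2$$

that is, squared bias plus variance plus irreducible noise. Shrinkage methods raise the first term while lowering the second.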


Original post: www.cnblogs.com/chinatrump/p/11589128.html