Locally weighted linear regression (with practical Python code)

1. Introduction:

    We know that a problem with linear regression is underfitting, which keeps it from achieving good prediction results: ordinary least squares is already the unbiased estimate with the minimum mean squared error, so the only way to reduce the prediction error further is to allow some bias into the estimate, trading bias for variance. One very effective method of doing this is locally weighted linear regression (LWLR).

2. Algorithm idea:

    2.1. Comparing linear regression:

            Standard linear regression fits a single set of coefficients to the entire dataset by minimizing the squared error, which gives the normal-equation solution:

            $\hat{w} = (X^T X)^{-1} X^T y$
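            For comparison, a minimal NumPy sketch of this normal equation (the function name standRegres is an assumption for illustration):

import numpy as np

def standRegres(xArr, yArr):
    xMat = np.mat(xArr); yMat = np.mat(yArr).T
    xTx = xMat.T * xMat                       # X^T X
    if np.linalg.det(xTx) == 0.0:             # singular: no unique least-squares solution
        print("This matrix is singular, cannot do inverse")
        return
    ws = xTx.I * (xMat.T * yMat)              # w = (X^T X)^-1 X^T y
    return ws

            One fit serves every prediction point, which is exactly why the model can underfit: it is a single straight line (or hyperplane) for all of the data.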

    2.2. Locally weighted linear regression: (using Gaussian kernel weights)

            LWLR assigns each training sample a weight according to its distance from the point being predicted, then fits the regression coefficients by weighted least squares. With a Gaussian kernel, the weight of sample $x^{(i)}$ relative to the prediction point $x$ is:

            $w(i, i) = \exp\left( \frac{|x^{(i)} - x|^2}{-2k^2} \right)$

                explain:

                        When the sample point $x^{(i)}$ is close to the prediction point $x$, the weight is large.

                        When the sample point $x^{(i)}$ is far from the prediction point $x$, the weight is small.

                        The weight decays exponentially with distance; the parameter $k$ is the decay factor, i.e. it sets the rate at which the weight decays.

                        The smaller $k$ is, the faster the weights decay. (This is easy to see from the weight function.)

                        The practice section below gives a better feel for the parameter $k$; the short sketch right after this list shows the decay numerically.
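A minimal sketch of that decay (the sample values here are made up purely for demonstration):

import numpy as np

# Gaussian kernel weight of sample x_i relative to prediction point x
def kernel_weight(x_i, x, k):
    return np.exp(-np.sum((x_i - x) ** 2) / (2.0 * k ** 2))

x = np.array([1.0])
x_i = np.array([1.5])                 # a sample 0.5 away from the prediction point
for k in (1.0, 0.1, 0.01):
    print(k, kernel_weight(x_i, x, k))
# k = 1.0  -> ~0.88   (decays slowly: distant points still matter)
# k = 0.1  -> ~3.7e-6 (decays fast)
# k = 0.01 -> ~0.0    (only the nearest samples contribute)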

      Least squares method to find the best regression coefficients.

                LWLR minimizes the weighted squared error:

                $J(\hat{w}) = (y - X\hat{w})^T W (y - X\hat{w})$

                where $W$ is the weight matrix (diagonal):

                $W = \mathrm{diag}\big(w(1,1),\, w(2,2),\, \dots,\, w(m,m)\big)$

                Solving (the detailed process is similar to the linear regression derivation) gives:

                $\hat{w} = (X^T W X)^{-1} X^T W y$

3. In practice: (with complete project code)

        3.1. Core code: the locally weighted linear regression function

import numpy as np

def lwlr(testPoint, xArr, yArr, k=1.0):
    xMat = np.mat(xArr); yMat = np.mat(yArr).T
    m = np.shape(xMat)[0]
    weights = np.mat(np.eye(m))               # diagonal weight matrix, one weight per sample
    for j in range(m):                        # fill in the Gaussian kernel weights
        diffMat = testPoint - xMat[j, :]      # difference between prediction point and sample j
        weights[j, j] = np.exp(diffMat * diffMat.T / (-2.0 * k ** 2))
    xTx = xMat.T * (weights * xMat)
    if np.linalg.det(xTx) == 0.0:             # singular matrix: cannot invert
        print("This matrix is singular, cannot do inverse")
        return
    ws = xTx.I * (xMat.T * (weights * yMat))  # weighted normal equation
    return testPoint * ws                     # prediction at testPoint
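
        lwlr solves one weighted fit per query point; to predict over a whole dataset, call it once per point. A minimal sketch (the helper name lwlrTest is an assumption):

def lwlrTest(testArr, xArr, yArr, k=1.0):
    m = np.shape(testArr)[0]
    yHat = np.zeros(m)
    for i in range(m):                        # one full weighted fit per prediction point
        yHat[i] = lwlr(testArr[i], xArr, yArr, k).item()   # 1x1 matrix -> scalar
    return yHat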

        3.2. Results:

             When k = 1, the result is similar to linear regression: underfitting.

             [figure: LWLR fit with k = 1]

             When k = 0.01, the fit follows the data well:

             [figure: LWLR fit with k = 0.01]

             When k = 0.003, the model overfits:

             [figure: LWLR fit with k = 0.003]

    From the results, locally weighted linear regression solves the under-fitting problem of linear regression very well, but over-fitting may occur when k is too small.

So the parameter k controls the generalization ability of the model, and choosing the right value is crucial.

    Although locally weighted linear regression can enhance the generalization ability of the model, it also has its own flaw: the prediction for each point must use the entire dataset, running a full weighted fit per query. This greatly increases the amount of computation.


4. A remaining problem:

    Consider the case where the data has more features than the training set has sample points. Then $X^T X$ is not invertible and the normal-equation derivation above can do nothing. In this situation we need shrinkage methods (such as ridge regression) to "understand" the data and obtain the regression coefficients.
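
    To give a taste of such a shrinkage method, here is a minimal ridge-regression sketch (the name ridgeRegres and the default lam are assumptions); adding $\lambda I$ to $X^T X$ makes the matrix invertible even when there are more features than samples:

import numpy as np

def ridgeRegres(xMat, yMat, lam=0.2):
    xTx = xMat.T * xMat
    denom = xTx + np.eye(np.shape(xMat)[1]) * lam   # X^T X + lambda*I, positive definite for lam > 0
    if np.linalg.det(denom) == 0.0:                 # can only happen when lam == 0
        print("This matrix is singular, cannot do inverse")
        return
    ws = denom.I * (xMat.T * yMat)                  # ridge regression coefficients
    return ws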


Now let's apply linear regression and locally weighted linear regression to predict the age of abalone.

Sample Code: Predicting Abalone Age

    

            
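A minimal sketch of such an experiment, assuming a tab-separated abalone.txt with numeric feature columns and the rings (age) value in the last column; the file name, the loadDataSet helper, and the chosen kernel widths are assumptions, and lwlrTest is the helper sketched in section 3.1:

import numpy as np

def loadDataSet(fileName):
    dataMat = []; labelMat = []
    with open(fileName) as fr:
        for line in fr:
            curLine = line.strip().split('\t')
            dataMat.append([float(v) for v in curLine[:-1]])   # features
            labelMat.append(float(curLine[-1]))                # rings, a proxy for age
    return dataMat, labelMat

abX, abY = loadDataSet('abalone.txt')
for k in (0.1, 1.0, 10.0):                # compare kernel widths on the first 100 samples
    yHat = lwlrTest(abX[0:100], abX[0:100], abY[0:100], k)
    print(k, np.sum((np.array(abY[0:100]) - yHat) ** 2))       # squared training error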

