Logistic Regression - Sigmoid - Gradient Ascent

The probability of an event is a number describing the chance that the event will happen. An event that is certain to happen has a probability of 1. An event that cannot possibly happen has a probability of zero. If there is a chance that an event will happen, then its probability is between zero and 1.

X: the event (random variable)

x: an observed result (outcome) of the event

If p is a probability, then p/(1 − p) is the corresponding odds;

Then let Z be the log-odds:

    Z = ln(odds) = ln( p / (1-p) )

    e^Z = p / (1-p)

    (1-p) e^Z = p

    e^Z - p e^Z = p

    e^Z = (1 + e^Z) p

    p = e^Z / (1 + e^Z) = 1 / (1 + e^(-Z))

This function of Z is called the sigmoid.

At 0 the value of the sigmoid is 0.5. For increasing values of z, the sigmoid approaches 1, and for decreasing values of z, the sigmoid approaches 0. On a large enough scale (the bottom frame of figure 5.1), the sigmoid looks like a step function.
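
As a quick numeric check of those limits, here is a minimal sketch (assuming only NumPy; the values in the comments are approximate):

import numpy as np

def sigmoid(z):
    return 1.0/(1 + np.exp(-z))

print(sigmoid(0))     # 0.5
print(sigmoid(10))    # ~0.99995, approaching 1
print(sigmoid(-10))   # ~0.000045, approaching 0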

For the logistic regression classifier we’ll take our features and multiply each one by a weight and then add them up. This result will be put into the sigmoid, and we’ll get a number between 0 and 1. Anything above 0.5 we’ll classify as a 1, and anything below 0.5 we’ll classify as a 0. You can also think of logistic regression as a probability estimate.

The input to the sigmoid function described above will be z, where z is given by the following:

    z = w0*x0 + w1*x1 + w2*x2 + ... + wn*xn

In vector notation we can write this as z = w^T x. All that means is that we have two vectors of numbers and we’ll multiply each pair of corresponding elements and add them all up to get one number.
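
A minimal sketch of that pipeline follows; the weight vector and the feature vector are made-up values used only for illustration:

import numpy as np

w = np.array([4.0, 0.5, -0.6])     # hypothetical weights [w0, w1, w2]
x = np.array([1.0, -0.02, 14.05])  # one example: constant x0=1.0, then X1, X2

z = np.dot(w, x)                   # z = w0*x0 + w1*x1 + w2*x2
prob = 1.0/(1 + np.exp(-z))        # sigmoid(z): a number between 0 and 1
label = 1 if prob > 0.5 else 0     # above 0.5 -> class 1, otherwise class 0
print(z, prob, label)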

Gradient ascent

The first optimization algorithm we’re going to look at is called gradient ascent. Gradient ascent is based on the idea that if we want to find the maximum point on a function, then the best way to move is in the direction of the gradient. We write the gradient with the symbol ∇, and the gradient of a function f(x,y) is given by the equation

    ∇f(x,y) = ( ∂f(x,y)/∂x , ∂f(x,y)/∂y )

This gradient means that we’ll move in the x direction by amount ∂f(x,y)/∂x and in the y direction by amount ∂f(x,y)/∂y. The function f(x,y) needs to be defined and differentiable around the points where it’s being evaluated. An example of this is shown in figure 5.2.

The gradient ascent algorithm shown in figure 5.2 takes a step in the direction given by the gradient. The gradient operator will always point in the direction of the greatest increase. We’ve talked about direction, but I haven’t yet mentioned anything about the magnitude of movement. The magnitude, or step size, we’ll take is given by the parameter α (alpha). In vector notation we can write the gradient ascent algorithm as

    w := w + α ∇w f(w)
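
As an illustrative sketch of this update rule (not part of the logistic regression example itself), here is gradient ascent applied to the simple concave function f(x,y) = -(x-1)^2 - (y-2)^2, whose maximum is at (1, 2):

import numpy as np

def grad_f(w):
    # gradient of f(x,y) = -(x-1)**2 - (y-2)**2
    return np.array([-2.0*(w[0]-1.0), -2.0*(w[1]-2.0)])

w = np.array([0.0, 0.0])   # starting point
alpha = 0.1                # step size

for _ in range(100):
    w = w + alpha*grad_f(w)   # w := w + alpha * gradient

print(w)   # close to [1. 2.], the location of the maximum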

There are 100 data points in figure 5.3. Each point has two numeric features: X1 and X2. We’ll try to use gradient ascent to fit the best parameters for the logistic regression model to our data. We’ll do this by finding the best weights for this given dataset. Pseudocode for the gradient ascent would look like this:


Start with the weights all set to 1
Repeat R number of times:
    Calculate the gradient of the entire dataset
    Update the weights vector by alpha*gradient
Return the weights vector

# -*- coding: utf-8 -*-
"""
Created on Thu Dec 27 17:37:31 2018

@author: LlQ
"""

import numpy as np

def loadDataSet():
    dataMatrixList = [];
    labelMatrixList = [];
    
    fileRead = open('testSet.txt')
    
    #in testSet.txt file
    # X1         X2        Label
    #-0.017612    14.053064    0
    #-1.395634    4.662541    1
    #-0.752157    6.538620    0
    #-1.322371    7.152853    0
    #0.423363       11.054677    0
    #0.406704       7.067335    1
    #...
    
    for line in fileRead.readlines():
        lineArray = line.strip().split()
        dataMatrixList.append([1.0, float(lineArray[0]), float(lineArray[1])])
        labelMatrixList.append(int(lineArray[2]))
        
    return dataMatrixList, labelMatrixList

def sigmoid(intZ):
    
    return 1.0/(1+np.exp(-intZ))

#(for comparison: in R, glm() would give you these coefficient values directly)
                #dataMatrixList--features:
                    #X0  X1           X2
                    #1  -0.017612    14.053064
                    #1  -1.395634     4.662541
                    #1   ...
                #classLabelList---labels: ([0, 1, ...])
                
def gradientAscent(dataMatrixList, classLabelList):
    #dataMatrix([[1, X1, X2],[1, X1,X2]...])
    dataMatrix=np.mat(dataMatrixList)# convert to NumPy matrices
    labelMatrix=np.mat(classLabelList).transpose()#100columns-to-100rows
    # LabelMatrix([
    # 0,
    # 1,
    # 0,
    # 0,
    # 0,
    # 1,
    # ...
    # ])
    rows, columns = np.shape(dataMatrix) # rows=100, columns=3
    
    alpha = 0.001
    maxCycles = 500
    weights = np.ones((columns,1))#the best coefficients w: n-rows and 1 column
    # initialize every coefficient to 1.0
    #weight 3x1
    #weights([
    #   1,
    #   1,
    #   1
    #])
    #print(type(weights)) #ndarray########
    for k in range(maxCycles):
        #intZ = x * w: we have two vectors of numbers and we multiply
        #each element pair and add them up to get one number
        #x: features of one example, w: coefficients
        #Z = x0 * w0 + x1 * w1 + x2 * w2 + ... + xn * wn
        intZ=dataMatrix*weights  #matrix product
        #not a single multiplication but a 100x3 * 3x1 matrix multiplication
        #print(len(intZ)) #100 rows
        # print(intZ)
        
        #print("**********************")
        sig = sigmoid(intZ)
        #print(len(sig))  #100
        #print(sig)     #0<sig<1
        error = (labelMatrix-sig)#matrix
        
        #weights + alpha * (3x100 * 100x1)  #matrix multiplication -> 3x1
        weights = weights + alpha * dataMatrix.transpose()*error

    return weights #3x1 matrix of fitted coefficients

def plotBestFit(coefficientList):
    import matplotlib.pyplot as plt
    import numpy as np
    weights = coefficientList
    
    dataArrList, labelList = loadDataSet()
    dataArrList = np.array(dataArrList)
    rows = np.shape(dataArrList)[0]  #100 data points (rows)

    xcord1 = []; ycord1 = []
    xcord2 = []; ycord2 = []
    for i in range(rows):
        if int(labelList[i]) == 1:
            xcord1.append(dataArrList[i,1]) #X1
            ycord1.append(dataArrList[i,2]) #X2
            
        else:
            xcord2.append(dataArrList[i,1]) #X1
            ycord2.append(dataArrList[i,2]) #x2
            
    fig=plt.figure()
    ax = fig.add_subplot(111)
    
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    
    x=np.arange(-3.0, 3.0, 0.1)
    y=(-weights[0]-weights[1]*x) / weights[2] #decision boundary 0 = w0 + w1*x1 + w2*x2, so x2 = -(w0 + w1*x1)/w2
    ax.plot(x,y)    #plot best-fit line
    
    plt.xlabel('X1')
    plt.ylabel('X2')
    plt.show()
    
###############################################################################
#   import logisticRegression
#   from imp import reload
#   reload(logisticRegression)  
    
#   dataArrList, labelList = logisticRegression.loadDataSet()
#   coefficient = logisticRegression.gradientAscent(dataArrList, labelList)
#   coefficient 
#   output:
#           matrix([[ 4.12414349],
#                   [ 0.48007329],
#                   [-0.6168482 ]])

#   logisticRegression.plotBestFit(coefficient.getA())    
############################################################################### 
    
def stochasticGradientAscent0(dataMatrixArray, classLabelList):
    rows, columns = np.shape(dataMatrixArray)
    alpha = 0.01
    weightList=np.ones(columns)
    
    for i in range(rows):
        sig = sigmoid(sum(dataMatrixArray[i]*weightList))  #float64
        error = classLabelList[i] - sig
        weightList=weightList + alpha * error * dataMatrixArray[i]
        
    #print(type(weightList)) #ndarray
    return weightList

###############################################################################
#   import logisticRegression
#   from imp import reload
#   reload(logisticRegression)
    
#   from numpy import *
    
#   dataArrList, labelList = logisticRegression.loadDataSet()
#   coefficientList = logisticRegression.stochasticGradientAscent0(array(dataArrList), labelList)    
###############################################################################  
    
def stochasticGradientAscent1(dataMatrixArray, classLabelList, numIter=150):

    rows, columns = np.shape(dataMatrixArray)
    #alpha=0.01
    weightList = np.ones(columns)
    
    for j in range(numIter):
        dataIndex = list(range(rows))
        
        for i in range(rows):
            #This will improve the oscillations that occur in the dataset
            #Alpha decreases as the number of iterations increases, 
            #but it never reaches 0 because there’s a constant term(0.0001). 
            
            #You need to do this so that after a large number of cycles, 
            #new data still has some impact. 
            
            #Perhaps you’re dealing with something that’s changing with time. 
            #Then you may want to let the constant term(>0.0001) be larger 
            #to give more weight to new values.
            
            #The second thing about the decreasing alpha function is that 
            #it decreases by 1/(j+i); j is the index of the number of times 
            #you go through the dataset,
            #and i is the index of the example in the training set. 
            #This gives an alpha that isn’t strictly decreasing when j<<max(i). 
            #The avoidance of a strictly decreasing weight is shown to
            #work in other optimization algorithms, such as simulated annealing.
            alpha = 4/(1.0+j+i)+0.0001 #alpha changes on each iteration.
            
            #you’re randomly selecting each instance to use in updating the 
            #weights. This will reduce the periodic variations
            randIndex = int(np.random.uniform(0,len(dataIndex)))
            sampleIndex = dataIndex[randIndex] #map into the remaining examples
            #so that each example is used exactly once per pass through the data
            
            sig = sigmoid(sum(dataMatrixArray[sampleIndex]*weightList))
            error = classLabelList[sampleIndex] - sig
            weightList = weightList + alpha*error*dataMatrixArray[sampleIndex]
            del(dataIndex[randIndex])
    
    return weightList
#Coefficient convergence in stochasticGradientAscent1() with random example selection
#and decreasing alpha: this method converges much faster than using a fixed alpha.

#dealing with missing values in the data
#Use the feature’s mean value from all the available data.
#■ Fill in the unknown with a special value like -1.
#■ Ignore the instance.
#■ Use a mean value from similar items.
#■ Use another machine learning algorithm to predict the value.

#replace all the unknown values with a real number (0) because we’re using NumPy,
#and NumPy arrays can’t contain a missing value;
#we want a value (0) that won’t impact the weight (coefficient) during the update
#The weights are updated according to
#                   weights = weights + alpha * error * dataMatrix[randIndex]    
#1) If dataMatrix is 0 for any feature, then the weight for that feature will 
#   simply be       weights = weights
#2) the error term will not be impacted by this because sigmoid(0)=0.5, 
#    which is totally neutral for predicting the class.
#3) none of the features take on 0 in the data, so in some sense 
#   it’s a special value. 
    
#Second, there was a missing class label in the test data. I simply threw it 
#out. It’s hard to replace a missing class label. This solution makes sense 
#given that we’re using logistic regression, but it may not make sense with 
#something like kNN(prediction).
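
#A minimal sketch of that fill-in step (the small array and the use of np.nan as
#the missing-value marker are assumptions for illustration only; the point is
#that a missing feature becomes 0, so it leaves its weight unchanged during the
#update):
#
#   import numpy as np
#   features = np.array([[2.0, np.nan, 38.5],
#                        [1.0, 132.0,  np.nan]])
#   features[np.isnan(features)] = 0.0   #fill missing feature values with 0
#   print(features)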
    
###############################################################################
#   import logisticRegression
#   import numpy as np    
#   from imp import reload
#   reload(logisticRegression)
    
#   dataArrList, labelList = logisticRegression.loadDataSet()
#   coefficientList = logisticRegression.stochasticGradientAscent0(np.array(dataArrList), labelList)  

#    logisticRegression.plotBestFit(coefficientList)
############################################################################### 
 
#This takes the weights and an input vector and calculates the sigmoid. 
#If the value of the sigmoid is more than 0.5,
#it’s considered a 1; otherwise, it’s a 0  

def classifyVector(inXArray, weightList):
    prob = sigmoid(sum(inXArray*weightList))
    
    if prob > 0.5:
        return 1.0
    else:
        return 0.0
    
def colicTest():
    frTrain = open('horseColicTraining.txt')
    frTest = open('horseColicTest.txt')
    
    trainingSet =[]
    trainingLabels=[]
    
    for line in frTrain.readlines():
        currentLine = line.strip().split('\t')
        
        lineArray = []
        for i in range(21):
            lineArray.append(float(currentLine[i]))            
        trainingSet.append(lineArray)
        trainingLabels.append(float(currentLine[21]))
    
    #coefficients
    trainWeights = stochasticGradientAscent1(np.array(trainingSet), trainingLabels, 500)
    
    errorCount=0
    numTestVec = 0.0
    
    for line in frTest.readlines():
        numTestVec += 1.0
        
        currentLine=line.strip().split('\t')
        lineArray=[]
        for i in range(21):
            lineArray.append(float(currentLine[i]))

        if int(classifyVector(np.array(lineArray), trainWeights)) != \
                int(currentLine[21]):
            errorCount +=1
            
    errorRate = (float(errorCount)/numTestVec)
    print ("the error rate of this test is: %f" % errorRate)
    return errorRate

def multiTest():
    numTests = 10;
    errorSum = 0.0
    
    for k in range(numTests):
        errorSum += colicTest()
    print("after %d iterations the average error rate is: %f" % (numTests, \
                                                    errorSum/float(numTests)))
#This wasn’t bad with over 30% of the values missing. You can alter the number
#of iterations in colicTest() and the constant term added to alpha (from 0.0001
#up to around 0.01) in stochasticGradientAscent1() to get results approaching a
#20% error rate.
    
###############################################################################
#   import logisticRegression
#   import numpy as np    
#   from imp import reload
#   reload(logisticRegression)

#   logisticRegression.multiTest()
#   output:
#       the error rate of this test is: 0.253731
#       the error rate of this test is: 0.373134
#       the error rate of this test is: 0.373134
#       the error rate of this test is: 0.402985
#       the error rate of this test is: 0.388060
#       the error rate of this test is: 0.417910
#       the error rate of this test is: 0.417910
#       the error rate of this test is: 0.283582
#       the error rate of this test is: 0.328358
#       the error rate of this test is: 0.358209
#       after 10 iterations the average error rate is: 0.359701   
    
#   #   logisticRegression.multiTest()
#   output:
#       the error rate of this test is: 0.358209
#       the error rate of this test is: 0.417910
#       the error rate of this test is: 0.358209
#       the error rate of this test is: 0.253731
#       the error rate of this test is: 0.298507
#       the error rate of this test is: 0.388060
#       the error rate of this test is: 0.373134
#       the error rate of this test is: 0.313433
#       the error rate of this test is: 0.313433
#       the error rate of this test is: 0.328358
#       after 10 iterations the average error rate is: 0.340299 
###############################################################################    
