python 实现 softmax分类器(MNIST数据集)

版权声明:本文为博主原创文章,未经博主允许不得转载。 https://blog.csdn.net/wds2006sdo/article/details/53699778

最近一直在外面,李航那本书没带在身上,所以那本书的算法实现估计要拖后了。
这几天在看Andrew Ng 机器学习的课程视频,正好看到了Softmax分类器那块,发现自己之前理解perceptron与logistic regression是有问题的。这两个算法真正核心的不同在于其分类函数的不同,perceptron采用一个分段函数作为分类器,logistic regression采用sigmod函数作为分类器,这才是这两个函数真正的不同。

废话不多说了,今天打算实现softmax分类器。

算法

算法参考的是Andrew 的课件与这篇文章
具体实现的时候发现加入权重衰减效果会更好。

这里为了防止大家看不懂我的程序,我在这里做一些定义

ΘjJ(Θ)=x(i)(1{y(i)=j}p(y(i)=j|x(i);Θ))+λΘj(1)

p(y(i)=j|x(i);Θ)=eΘTjx(i)kl=1eΘTlx(i)(2)

eΘTlx(i)(3)

数据集

数据集和KNN那个博文用的是同样的数据集。
数据地址:https://github.com/WenDesi/lihang_book_algorithm/blob/master/data/train.csv

特征

将整个图作为特征

代码

代码已上传GitHub

这次的代码是python3的,有可能需要稍微改一改,不好意思了,我要背叛python2了。

# encoding=utf8

import math
import pandas as pd
import numpy as np
import random
import time

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score


class Softmax(object):

    def __init__(self):
        self.learning_step = 0.000001           # 学习速率
        self.max_iteration = 100000             # 最大迭代次数
        self.weight_lambda = 0.01               # 衰退权重

    def cal_e(self,x,l):
        '''
        计算博客中的公式3
        '''

        theta_l = self.w[l]
        product = np.dot(theta_l,x)

        return math.exp(product)

    def cal_probability(self,x,j):
        '''
        计算博客中的公式2
        '''

        molecule = self.cal_e(x,j)
        denominator = sum([self.cal_e(x,i) for i in range(self.k)])

        return molecule/denominator


    def cal_partial_derivative(self,x,y,j):
        '''
        计算博客中的公式1
        '''

        first = int(y==j)                           # 计算示性函数
        second = self.cal_probability(x,j)          # 计算后面那个概率

        return -x*(first-second) + self.weight_lambda*self.w[j]

    def predict_(self, x):
        result = np.dot(self.w,x)
        row, column = result.shape

        # 找最大值所在的列
        _positon = np.argmax(result)
        m, n = divmod(_positon, column)

        return m

    def train(self, features, labels):
        self.k = len(set(labels))

        self.w = np.zeros((self.k,len(features[0])+1))
        time = 0

        while time < self.max_iteration:
            print('loop %d' % time)
            time += 1
            index = random.randint(0, len(labels) - 1)

            x = features[index]
            y = labels[index]

            x = list(x)
            x.append(1.0)
            x = np.array(x)

            derivatives = [self.cal_partial_derivative(x,y,j) for j in range(self.k)]

            for j in range(self.k):
                self.w[j] -= self.learning_step * derivatives[j]

    def predict(self,features):
        labels = []
        for feature in features:
            x = list(feature)
            x.append(1)

            x = np.matrix(x)
            x = np.transpose(x)

            labels.append(self.predict_(x))
        return labels


if __name__ == '__main__':

    print('Start read data')

    time_1 = time.time()

    raw_data = pd.read_csv('../data/train.csv', header=0)
    data = raw_data.values

    imgs = data[0::, 1::]
    labels = data[::, 0]

    # 选取 2/3 数据作为训练集, 1/3 数据作为测试集
    train_features, test_features, train_labels, test_labels = train_test_split(
        imgs, labels, test_size=0.33, random_state=23323)
    # print train_features.shape
    # print train_features.shape

    time_2 = time.time()
    print('read data cost '+ str(time_2 - time_1)+' second')

    print('Start training')
    p = Softmax()
    p.train(train_features, train_labels)

    time_3 = time.time()
    print('training cost '+ str(time_3 - time_2)+' second')

    print('Start predicting')
    test_predict = p.predict(test_features)
    time_4 = time.time()
    print('predicting cost ' + str(time_4 - time_3) +' second')

    score = accuracy_score(test_labels, test_predict)
    print("The accruacy socre is " + str(score))

运行结果

这里写图片描述

速度挺快,正确率一般吧,比决策树之类的要高。

猜你喜欢

转载自blog.csdn.net/wds2006sdo/article/details/53699778