Softmax and classification model study notes

Contents: softmax and classification models

The basic concepts of softmax regression
Obtaining the Fashion-MNIST data set and reading the data
Implementing softmax regression from scratch, and using the model to classify the image data of the Fashion-MNIST training set
Implementing softmax regression with PyTorch

The basic concepts of softmax regression

Classification problem
Consider a simple image classification problem, where the input image has a height and a width of 2 pixels and the color is grayscale.
The 4 pixels in the image are denoted x1, x2, x3, x4.
Assume the true labels are dog, cat, or chicken; these labels correspond to the discrete values y1, y2, y3.
We typically use discrete values to represent the categories, for example y1 = 1, y2 = 2, y3 = 3.

Weight vectors
With 4 input pixels and 3 output categories, each output is an affine function of the inputs, giving 12 weights and 3 biases:

o1 = x1*w11 + x2*w21 + x3*w31 + x4*w41 + b1
o2 = x1*w12 + x2*w22 + x3*w32 + x4*w42 + b2
o3 = x1*w13 + x2*w23 + x3*w33 + x4*w43 + b3
Neural network diagram
The figure below depicts the computation above as a neural network diagram. Like linear regression, softmax regression is a single-layer neural network. Since the computation of each output o1, o2, o3 depends on all of the inputs x1, x2, x3, x4, the output layer of softmax regression is a fully connected layer.
[Figure: softmax regression as a single-layer, fully connected neural network]
Since a classification problem requires a discrete prediction as output, a simple approach is to treat the output value oi as the confidence that the image belongs to category i, and take the category with the largest output value as the prediction, i.e. argmax_i oi. For example, if o1, o2, o3 are 0.1, 10, 0.1 respectively, then o2 is the largest, so the predicted category is 2, which represents the cat.
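A quick check of this rule with PyTorch (an illustrative sketch, not part of the original notes):

import torch

o = torch.tensor([0.1, 10.0, 0.1])   # the output values o1, o2, o3 from the example
print(o.argmax().item())             # prints 1, i.e. the second category (the cat)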

Problems with using the output directly
Using the output of the output layer directly has two problems:
On the one hand, since the range of the output values of the output layer is uncertain, it is hard to judge intuitively what these values mean. For example, the output value of 10 in the example just given suggests the image is "very confidently" a cat, because that output is 100 times larger than the other two. But if o1 = o3 = 10^3, then an output value of 10 would instead indicate that the image is very unlikely to be a cat.
On the other hand, since the true labels are discrete values, the error between these discrete values and output values of uncertain range is difficult to measure.
The softmax operator solves both of these problems. It transforms the output values into positive values that sum to 1, i.e. a valid probability distribution, via the following formula:
ŷ1, ŷ2, ŷ3 = softmax(o1, o2, o3), where

ŷj = exp(oj) / (exp(o1) + exp(o2) + exp(o3)),  j = 1, 2, 3

It is easy to see that all three values lie in (0, 1) and that they sum to 1, so together they form a valid probability distribution. Moreover, softmax does not change the relative order of the outputs, and therefore does not change which category is predicted: argmax_i oi = argmax_i ŷi.
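A small numeric check of these properties (an illustrative sketch, not part of the original notes):

import torch

o = torch.tensor([[0.1, 10.0, 0.1]])
y_hat = o.exp() / o.exp().sum(dim=1, keepdim=True)
print(y_hat)                                   # every entry lies in (0, 1)
print(y_hat.sum(dim=1))                        # the entries sum to 1
print(o.argmax(dim=1), y_hat.argmax(dim=1))    # same index: the relative order is preserved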

Cross-entropy loss function

Compared with the squared loss function, which is too strict for this purpose, we need a loss function that measures the difference between two probability distributions. Cross entropy is a commonly used measure:

H(y(i), ŷ(i)) = − Σj yj(i) log ŷj(i)

Cross entropy only cares about the probability predicted for the correct category: as long as that probability is large enough, the classification result will be correct. Of course, when one sample has several labels, for example when the image contains more than one object, we cannot make this simplification; but even in that case, cross entropy is still only concerned with the predicted probabilities of the object categories that actually appear in the image.
Suppose the number of samples in the training data set is n. The cross-entropy loss function is then defined as

ℓ(Θ) = (1/n) Σ(i=1..n) H(y(i), ŷ(i))

where Θ denotes the collection of model parameters.
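As a concrete illustration (a toy example with made-up numbers, not part of the original notes):

import torch

y_hat = torch.tensor([[0.1, 0.6, 0.3]])   # predicted distribution for one sample
y = torch.tensor([1])                     # index of the true category
loss = -torch.log(y_hat[0, y[0]])         # with a one-hot label, cross entropy reduces to
print(loss)                               # -log of the probability of the true class (about 0.51)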

Model training and prediction
After the softmax regression model has been trained, given the features of any sample we can predict the probability of each output category. Usually, we take the category with the largest predicted probability as the output category. If it agrees with the true category (the label), the prediction is correct. We will use accuracy to evaluate the performance of the model; it is equal to the number of correct predictions divided by the total number of predictions.

Obtaining and reading the Fashion-MNIST data set
Before implementing softmax regression, we introduce a multi-class image classification data set. It will be used repeatedly in later sections, so that we can observe the differences between algorithms in model accuracy and computational efficiency. The most commonly used image classification data set is the handwritten digit recognition data set MNIST [1]. However, most models achieve a classification accuracy above 95% on MNIST. To make the differences between algorithms easier to observe, we will instead use the Fashion-MNIST data set [2], whose images have more complex content.
Here we use the torchvision package, which serves the PyTorch deep learning framework and is mainly used to build computer vision models. torchvision mainly consists of the following parts (a small loading example follows the list):

torchvision.datasets: data-loading functions and interfaces to commonly used data sets;
torchvision.models: commonly used model architectures (including pre-trained models), e.g. AlexNet, VGG, ResNet, and so on;
torchvision.transforms: common image transformations, e.g. cropping, rotation, and so on;
torchvision.utils: other useful methods.
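For example, Fashion-MNIST can be downloaded and read directly through torchvision (a minimal sketch; the root path is an assumption, and d2l.load_data_fashion_mnist used below wraps essentially these steps plus DataLoader creation):

import torchvision
import torchvision.transforms as transforms

# Download the training split and turn each PIL image into a (1, 28, 28) float tensor in [0, 1]
mnist_train = torchvision.datasets.FashionMNIST(
    root='./data', train=True, download=True, transform=transforms.ToTensor())
print(len(mnist_train))          # 60000 training samples
feature, label = mnist_train[0]
print(feature.shape, label)      # torch.Size([1, 28, 28]) and an integer class index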

Implementing the softmax regression model from scratch

import torch
import torchvision
import numpy as np
import sys
sys.path.append("/home/kesci/input")
import d2lzh1981 as d2l

print(torch.__version__)
print(torchvision.__version__)

# Get the training and test data sets
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)

# Initialize the model parameters
num_inputs = 784   # each 28x28 image is flattened into a vector of 784 pixels
print(28*28)
num_outputs = 10   # 10 clothing categories

W = torch.tensor(np.random.normal(0, 0.01, (num_inputs, num_outputs)), dtype=torch.float)
b = torch.zeros(num_outputs, dtype=torch.float)

W.requires_grad_(requires_grad=True)
b.requires_grad_(requires_grad=True)

# Summing a multi-dimensional Tensor along a dimension
X = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(X.sum(dim=0, keepdim=True))  # dim=0: sum the entries in each column, keep the reduced dimension
print(X.sum(dim=1, keepdim=True))  # dim=1: sum the entries in each row, keep the reduced dimension
print(X.sum(dim=0, keepdim=False)) # dim=0: sum the entries in each column, drop the reduced dimension
print(X.sum(dim=1, keepdim=False)) # dim=1: sum the entries in each row, drop the reduced dimension

# Define softmax
def softmax(X):
    X_exp = X.exp()
    partition = X_exp.sum(dim=1, keepdim=True)
    # print("X size is ", X_exp.size())
    # print("partition size is ", partition, partition.size())
    return X_exp / partition  # broadcasting is applied here

X = torch.rand((2, 5))
X_prob = softmax(X)
print(X_prob, '\n', X_prob.sum(dim=1))

# The softmax regression model
def net(X):
    return softmax(torch.mm(X.view((-1, num_inputs)), W) + b)

# Define the loss function
y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = torch.LongTensor([0, 2])
# gather picks out, for each sample, the probability predicted for its true label
print(y_hat.gather(1, y.view(-1, 1)))

def cross_entropy(y_hat, y):
    return - torch.log(y_hat.gather(1, y.view(-1, 1)))

# Define the accuracy
def accuracy(y_hat, y):
    return (y_hat.argmax(dim=1) == y).float().mean().item()
print(accuracy(y_hat, y))

def evaluate_accuracy(data_iter, net):
    acc_sum, n = 0.0, 0
    for X, y in data_iter:
        acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
        n += y.shape[0]
    return acc_sum / n
print(evaluate_accuracy(test_iter, net))  # about 0.1, since the weights are still random and there are 10 classes

# Train the model
num_epochs, lr = 5, 0.1

# This function is also saved in the d2lzh_pytorch package for later use
def train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size,
              params=None, lr=None, optimizer=None):
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n = 0.0, 0.0, 0
        for X, y in train_iter:
            y_hat = net(X)
            l = loss(y_hat, y).sum()
            
            # Zero the gradients
            if optimizer is not None:
                optimizer.zero_grad()
            elif params is not None and params[0].grad is not None:
                for param in params:
                    param.grad.data.zero_()
            
            l.backward()
            if optimizer is None:
                d2l.sgd(params, lr, batch_size)
            else:
                optimizer.step() 
            
            
            train_l_sum += l.item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
            n += y.shape[0]
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f'
              % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))

train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, batch_size, [W, b], lr)

Model predictions

X, y = next(iter(test_iter))  # fetch one batch of test images and labels

true_labels = d2l.get_fashion_mnist_labels(y.numpy())
pred_labels = d2l.get_fashion_mnist_labels(net(X).argmax(dim=1).numpy())
titles = [true + '\n' + pred for true, pred in zip(true_labels, pred_labels)]

d2l.show_fashion_mnist(X[0:9], titles[0:9])

Implementing the softmax regression model with PyTorch

# Load packages and modules
import torch
from torch import nn
from torch.nn import init
import numpy as np
import sys
sys.path.append("/home/kesci/input")
import d2lzh1981 as d2l

print(torch.__version__)

# Initialize parameters and get the data
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)

# Define the network model
num_inputs = 784
num_outputs = 10

class LinearNet(nn.Module):
    def __init__(self, num_inputs, num_outputs):
        super(LinearNet, self).__init__()
        self.linear = nn.Linear(num_inputs, num_outputs)
    def forward(self, x): # x shape: (batch, 1, 28, 28)
        y = self.linear(x.view(x.shape[0], -1))
        return y
    
# net = LinearNet(num_inputs, num_outputs)

class FlattenLayer(nn.Module):
    def __init__(self):
        super(FlattenLayer, self).__init__()
    def forward(self, x): # x shape: (batch, *, *, ...)
        return x.view(x.shape[0], -1)

from collections import OrderedDict
net = nn.Sequential(
        # FlattenLayer(),
        # LinearNet(num_inputs, num_outputs) 
        OrderedDict([
           ('flatten', FlattenLayer()),
           ('linear', nn.Linear(num_inputs, num_outputs))]) # the custom LinearNet(num_inputs, num_outputs) defined above would also work here
        )

# Initialize the model parameters
init.normal_(net.linear.weight, mean=0, std=0.01)
init.constant_(net.linear.bias, val=0)

# Define the loss function
# nn.CrossEntropyLoss combines log-softmax and the negative log-likelihood loss,
# so the network only needs to output the raw, unnormalized scores o
loss = nn.CrossEntropyLoss()
# Signature:
# class torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')

# Define the optimizer
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
# Signature:
# class torch.optim.SGD(params, lr=<required>, momentum=0, dampening=0, weight_decay=0, nesterov=False)

# Train the model
num_epochs = 5
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, None, None, optimizer)
