Multi-class classification practice with Jupyter

Exercise 3: Multi-class Classification


Introduction

In this exercise, we will use logistic regression to recognize handwritten digits (0 to 9). We will extend the logistic regression implementation from Exercise 2 and apply it to a one-vs-all classification problem.

Before starting the exercise, you need to download the following file for the data:

  • ex3data1.mat - training set of handwritten digits

Throughout the exercise, the following required tasks are involved:

1 Multi-class Classification

In this part of the exercise, you will extend the logistic regression algorithm you implemented earlier and apply it to a multi-class classification problem.

1.1 Dataset

The file ex3data1.mat contains a training set of 5000 handwritten digits. Each sample is a 20 pixel by 20 pixel grayscale image, and each pixel is represented by a floating-point number giving the grayscale intensity at that position.
Unrolling the 20x20 pixel grid into a 400-dimensional vector, each training sample becomes a row vector in the data matrix. The file therefore gives us a 5000x400 matrix in which each row is a sample image of a handwritten digit.


The second part of the training set is a 5000-dimensional vector y containing the training set labels.

1.2 Data Visualization


Images are represented in matrix X as 400-dimensional vectors (of which there are 5,000). The 400-dimensional "features" are the grayscale intensities of each pixel in the original 20 x 20 image. The class labels are in the vector y as the numeric classes representing the digits in the image.

Next, we need to load the data.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.io import loadmat

data = loadmat('/home/jovyan/work/ex3data1.mat')
data
{'__header__': b'MATLAB 5.0 MAT-file, Platform: GLNXA64, Created on: Sun Oct 16 13:09:09 2011',
 '__version__': '1.0',
 '__globals__': [],
 'X': array([[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]]),
 'y': array([[10],
        [10],
        [10],
        ...,
        [ 9],
        [ 9],
        [ 9]], dtype=uint8)}

We can check the shapes of the data matrix X and the label vector y using the shape attribute:

data['X'].shape, data['y'].shape
((5000, 400), (5000, 1))
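The figure in section 1.2 did not survive the export, so here is a minimal sketch (my own addition, not part of the original exercise) of how a random sample of the digits can be displayed with matplotlib. The 10x10 grid size is an arbitrary choice, and the order='F' reshape assumes the pixels are stored column-major as MATLAB saves them; drop or adjust it if the digits appear transposed.

### Optional: visualize a random sample of the training images ###
def plot_sample_digits(X, n=10):
    # pick n*n random rows from X and show them as a grid of 20x20 grayscale images
    idx = np.random.choice(X.shape[0], n * n, replace=False)
    fig, axes = plt.subplots(n, n, figsize=(8, 8))
    for ax, img in zip(axes.flat, X[idx]):
        ax.imshow(img.reshape(20, 20, order='F'), cmap='gray')
        ax.axis('off')
    plt.show()

plot_sample_digits(data['X'])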

1.3 Vectorization of Logistic Regression

In this part of the exercise, you need to make the logistic regression implementation fully vectorized (i.e. without any for loops). Vectorized code, in addition to being concise, can take advantage of linear algebra optimizations and is usually much faster than iterative code. As we saw in Exercise 2, our cost function already has a fully vectorized implementation, so we can reuse that implementation here.

1.3.1 Vectorization of cost function

You need to write code to vectorize the cost function. Recall that the cost function is:

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log\left(h_\theta(x^{(i)})\right)-\left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right]$$

To compute each element in the sum, we need to compute $h_\theta(x^{(i)})$ for each sample $i$, where

$$h_\theta(x) = g(\theta^T x)$$
And the sigmoid function is:
$$g(z) = \frac{1}{1+e^{-z}}$$
It turns out that we can compute this quickly for all of our examples with matrix multiplication. We define $X$ and $\theta$ as

$$X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}, \qquad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}$$

Then the matrix product $X\theta$ gives

$$X\theta = \begin{bmatrix} (x^{(1)})^T\theta \\ (x^{(2)})^T\theta \\ \vdots \\ (x^{(m)})^T\theta \end{bmatrix} = \begin{bmatrix} \theta^T x^{(1)} \\ \theta^T x^{(2)} \\ \vdots \\ \theta^T x^{(m)} \end{bmatrix}$$

In the last equality we used the fact that $a^T b = b^T a$ when $a$ and $b$ are both vectors. This lets us compute $\theta^T x^{(i)}$ for all of our examples in one line of code.

### Fill in your code here ###
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, y):
    theta = np.matrix(theta)
    X = np.matrix(X)
    y = np.matrix(y)
    first = np.multiply(-y, np.log(sigmoid(X * theta.T)))
    second = np.multiply((1 - y), np.log(1 - sigmoid(X * theta.T))) 
    return np.sum(first - second) / len(X)
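As a quick sanity check (my own toy example, not part of the original exercise), the cost function can be evaluated on a tiny hand-made dataset. With an all-zero theta the hypothesis is 0.5 for every sample, so the cost should be -log(0.5), about 0.693, regardless of the labels.

### Optional sanity check on a tiny toy dataset ###
X_toy = np.array([[1., 2.0], [1., -1.0], [1., 0.5]])   # 3 samples, intercept column included
y_toy = np.array([[1], [0], [1]])
theta_toy = np.zeros(2)

# with theta = 0, h(x) = 0.5 everywhere, so the cost should be -log(0.5) ≈ 0.6931
print(cost(theta_toy, X_toy, y_toy))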

1.3.2 Gradient Vectorization

We know that this cost function should be minimized using gradient descent. To recap, the gradient of the logistic regression cost function is a vector whose $j$-th element is defined as

$$\frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left[h_\theta(x^{(i)})-y^{(i)}\right]x_j^{(i)}$$
To vectorize this operation, we write out the partial derivatives for all of the $\theta_j$ at once:

$$\nabla J(\theta) = \begin{bmatrix} \dfrac{\partial J}{\partial \theta_0} \\ \dfrac{\partial J}{\partial \theta_1} \\ \vdots \\ \dfrac{\partial J}{\partial \theta_n} \end{bmatrix} = \frac{1}{m}\sum_{i=1}^{m}\left[h_\theta(x^{(i)})-y^{(i)}\right]x^{(i)}$$

Note that $x^{(i)}$ is a vector, while $h_\theta(x^{(i)})-y^{(i)}$ is a scalar (a single number).
Writing $\beta_i = h_\theta(x^{(i)})-y^{(i)}$, the sum above can be understood as a single matrix product:

$$\frac{1}{m}\sum_{i=1}^{m}\beta_i x^{(i)} = \frac{1}{m}X^T\beta, \qquad \beta = \begin{bmatrix}\beta_1 \\ \beta_2 \\ \vdots \\ \beta_m\end{bmatrix}$$

With the operation vectorized in this way, the partial derivatives can be computed without any loops. Next, you need to write code that implements a vectorized version of the gradient.

### Fill in your code here ###
def gradient(theta, X, y):
    theta = np.matrix(theta)
    X = np.matrix(X)
    y = np.matrix(y)
    
    # prediction error for every sample, shape (m, 1)
    error = sigmoid(X * theta.T) - y
    
    # vectorized gradient: (1/m) * X^T * error, returned with shape (1, n+1)
    grad = ((X.T * error) / len(X)).T
    
    return grad
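A common way to verify a vectorized gradient is to compare it against a finite-difference approximation of the cost. The check below is optional, my own addition, and reuses the toy data defined above.

### Optional gradient check against finite differences ###
eps = 1e-5
numeric = np.zeros(2)
for j in range(2):
    step = np.zeros(2)
    step[j] = eps
    # central difference approximation of dJ/dtheta_j
    numeric[j] = (cost(theta_toy + step, X_toy, y_toy) -
                  cost(theta_toy - step, X_toy, y_toy)) / (2 * eps)

print(gradient(theta_toy, X_toy, y_toy))  # analytic gradient, shape (1, 2)
print(numeric)                            # should agree to several decimal places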

1.3.3 Vectorization of regularized logistic regression

In Exercise 2, we implemented the cost function and gradient computation for the regularized logistic regression algorithm. Its cost function is:

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log\left(h_\theta(x^{(i)})\right)-\left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$
Note that $\theta_0$ should not be regularized; it is used to compute the intercept (bias) term.
Correspondingly, the gradient descent update is computed as follows:
$$\begin{aligned}
&\text{Repeat until convergence } \{ \\
&\quad \theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left[h_\theta(x^{(i)})-y^{(i)}\right]x_0^{(i)} \\
&\quad \theta_j := \theta_j - \alpha\left(\frac{1}{m}\sum_{i=1}^{m}\left[h_\theta(x^{(i)})-y^{(i)}\right]x_j^{(i)} + \frac{\lambda}{m}\theta_j\right), \qquad j = 1,\dots,n \\
&\}
\end{aligned}$$
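Equivalently, the regularized gradient itself can be written in a fully vectorized form; this restatement is my own, but it matches what the code below computes:

$$\nabla J(\theta) = \frac{1}{m}X^T\left(g(X\theta) - y\right) + \frac{\lambda}{m}\begin{bmatrix}0 \\ \theta_1 \\ \vdots \\ \theta_n\end{bmatrix}$$

where the first component of the regularization vector is zero because $\theta_0$ is not regularized.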
Next, you need to write a vectorized implementation of the cost function and gradient of the regularized logistic regression algorithm.

### Fill in your code here ###
def costReg(theta, X, y, learningRate):
    # note: despite its name, learningRate plays the role of the regularization parameter lambda
    theta = np.matrix(theta)
    X = np.matrix(X)
    y = np.matrix(y)
    first = np.multiply(-y, np.log(sigmoid(X * theta.T)))
    second = np.multiply((1 - y), np.log(1 - sigmoid(X * theta.T)))
    # regularization term; theta[0] (the intercept) is excluded
    reg = (learningRate / (2 * len(X))) * np.sum(np.power(theta[:,1:theta.shape[1]], 2))
    return np.sum(first - second) / len(X) + reg

def gradientReg(theta, X, y, learningRate):
    theta = np.matrix(theta)
    X = np.matrix(X)
    y = np.matrix(y)
    
    # prediction error for every sample, shape (m, 1)
    error = sigmoid(X * theta.T) - y
    
    # vectorized gradient plus the regularization term (lambda/m) * theta
    grad = ((X.T * error) / len(X)).T + ((learningRate / len(X)) * theta)
    
    # the intercept gradient is not regularized
    grad[0, 0] = np.sum(np.multiply(error, X[:,0])) / len(X)
    
    return np.array(grad).ravel()
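As with the unregularized versions, a small optional check (my own addition, using the toy data from section 1.3.1) can confirm that with lambda = 0 the regularized cost and gradient reduce to the unregularized ones.

### Optional check: lambda = 0 should reproduce the unregularized results ###
print(costReg(theta_toy, X_toy, y_toy, 0), cost(theta_toy, X_toy, y_toy))
print(gradientReg(theta_toy, X_toy, y_toy, 0))
print(gradient(theta_toy, X_toy, y_toy))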

1.4 Multi-class Classifier

Now that we have defined the cost and gradient functions, we need to build a classifier. For handwriting recognition, we have 10 possible classes (0-9), but logistic regression on its own can only separate two classes at a time.

In this exercise, your task is to implement one-vs-all classification: with $k$ different class labels we train $k$ classifiers, where classifier $i$ decides between "class $i$" and "not class $i$". We'll wrap the classifier training in a function that computes the final weights for each of the 10 classifiers and returns the weights as a $[k, n+1]$ array, where $n$ is the number of parameters.

Note that :

  • Add $\theta_0$ (via a column of ones in $X$) to account for the intercept term.
  • Convert $y$ from class labels to a binary indicator for each classifier (either class $i$ or not class $i$).
  • Use the minimize function from scipy.optimize to minimize the cost function of each classifier.
  • Assign the optimal parameters found for each classifier to the parameter array, and return the parameter array with shape $[k, n+1]$.

The most important part of implementing vectorized code is making sure that all the matrices are laid out correctly and that their dimensions agree.

### Fill in your code here ###
from scipy.optimize import minimize

def one_vs_all(X, y, num_labels, learning_rate):
    rows = X.shape[0]
    params = X.shape[1]
    
    # parameters of the k classifiers, shape (k, n+1)
    all_theta = np.zeros((num_labels, params + 1))
    
    # insert a column of ones to account for the intercept term
    X = np.insert(X, 0, values=np.ones(rows), axis=1)
    
    # train one classifier per label, converting the class labels to 0/1 indicators
    for i in range(1, num_labels + 1):
        theta = np.zeros(params + 1)
        y_i = np.array([1 if label == i else 0 for label in y])
        y_i = np.reshape(y_i, (rows, 1))
        
        # minimize the regularized cost function for this classifier
        fmin = minimize(fun=costReg, x0=theta, args=(X, y_i, learning_rate), method='TNC', jac=gradientReg)
        all_theta[i-1,:] = fmin.x
    
    return all_theta

Let's check the variables that need to be initialized, and the shape of the variables:

rows = data['X'].shape[0]
params = data['X'].shape[1]

all_theta = np.zeros((10, params + 1))

X = np.insert(data['X'], 0, values=np.ones(rows), axis=1)

theta = np.zeros(params + 1)

y_0 = np.array([1 if label == 0 else 0 for label in data['y']])
y_0 = np.reshape(y_0, (rows, 1))

X.shape, y_0.shape, theta.shape, all_theta.shape
((5000, 401), (5000, 1), (401,), (10, 401))

Here, theta is a 1-D array, so when it is converted to a matrix in the gradient computation code it becomes a matrix of shape $(1, 401)$. We also need to check the class labels in y:

np.unique(data['y'])  # check how many label classes there are
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10], dtype=uint8)

Note that the labels run from 1 to 10: the digit 0 is stored with label 10 (a holdover from MATLAB's 1-based indexing). Next, to make sure the training function works correctly, run the following code and check whether you get reasonable output.

### Run and test your code ###
all_theta = one_vs_all(data['X'], data['y'], 10, 1)
all_theta
array([[-2.38373823e+00,  0.00000000e+00,  0.00000000e+00, ...,
         1.30440684e-03, -7.49607957e-10,  0.00000000e+00],
       [-3.18277385e+00,  0.00000000e+00,  0.00000000e+00, ...,
         4.46416745e-03, -5.08967467e-04,  0.00000000e+00],
       [-4.79656036e+00,  0.00000000e+00,  0.00000000e+00, ...,
        -2.87471064e-05, -2.47976297e-07,  0.00000000e+00],
       ...,
       [-7.98398219e+00,  0.00000000e+00,  0.00000000e+00, ...,
        -8.95642491e-05,  7.22603652e-06,  0.00000000e+00],
       [-4.57124969e+00,  0.00000000e+00,  0.00000000e+00, ...,
        -1.33504169e-03,  9.98035730e-05,  0.00000000e+00],
       [-5.40535662e+00,  0.00000000e+00,  0.00000000e+00, ...,
        -1.16457336e-04,  7.86968213e-06,  0.00000000e+00]])

1.5 Prediction using classifiers

We are now ready for the final step, where you need to use the trained classifiers to predict a label for each image.

For this step, we will calculate the probability of each class for every training sample (using vectorized code), and output the class with the highest probability as the prediction.

### Fill in your code here ###
def predict_all(X, all_theta):
    rows = X.shape[0]
    params = X.shape[1]
    num_labels = all_theta.shape[0]
    
    # as before, insert a column of ones to match the shape used in training
    X = np.insert(X, 0, values=np.ones(rows), axis=1)
    
    # convert to matrices
    X = np.matrix(X)
    all_theta = np.matrix(all_theta)
    
    # compute the probability of each class for every sample
    h = sigmoid(X * all_theta.T)
    
    # index of the class with the highest probability for each sample
    h_argmax = np.argmax(h, axis=1)
    
    # the array is zero-indexed, so add 1 to map back to the labels 1-10
    h_argmax = h_argmax + 1
    
    return h_argmax

Now we can use the predict_all function to generate class predictions for each instance and see how our classifier works.

### Run and test your code ###
y_pred = predict_all(data['X'], all_theta)
correct = [1 if a == b else 0 for (a, b) in zip(y_pred, data['y'])]
accuracy = (sum(map(int, correct)) / float(len(correct)))
print ('accuracy = {0}%'.format(accuracy * 100))
accuracy = 94.46%
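For a slightly finer-grained view (my own addition, not required by the exercise), the overall accuracy can be broken down per class label; recall that label 10 corresponds to the digit 0.

### Optional: per-class accuracy breakdown ###
y_true = data['y'].ravel()
y_hat = np.array(y_pred).ravel()
for label in range(1, 11):
    mask = (y_true == label)
    print('label {0}: accuracy = {1:.4f}'.format(label, np.mean(y_hat[mask] == y_true[mask])))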

Summary

Multi-class classification is an important task in machine learning, and it usually requires classification algorithms to predict and assign categories to data. In practice, we can use a variety of multi-class algorithms, such as decision trees, neural networks, and support vector machines, to solve real problems. At the same time, attention must be paid to data preprocessing, feature selection, and model evaluation to improve the accuracy and reliability of the algorithm. Finally, in order to apply multi-class algorithms well, it is necessary to keep learning and exploring new algorithms and techniques to cope with changing data and needs.


Origin blog.csdn.net/weixin_53573350/article/details/131071183