Softmax classification function

Author: chen_h
WeChat & QQ: 862251340
WeChat public account: coderpai


This tutorial is a translation of the neural network tutorial written by Peter Roelants. The author has authorized the translation. The original text can be found here.

This five-part tutorial covers how to get started with neural networks. You can find the full content at the link below.

softmax classification function


This part of the tutorial will cover two parts:

  • softmax function
  • Cross entropy loss function

In previous tutorials, we learned how to use the logistic function to solve a binary classification problem. For multi-class problems, we can use multinomial logistic regression, which is also known as the softmax function. Next, let's explain what the softmax function is and how to derive it.

Let's start by importing the packages that the tutorial needs to use.

import numpy as np 
import matplotlib.pyplot as plt  
from matplotlib.colors import colorConverter, ListedColormap 
from mpl_toolkits.mplot3d import Axes3D  
from matplotlib import cm 

Softmax function

In the previous tutorial, we saw that the logistic function can only be used for binary classification problems, but its multinomial generalization, the softmax function ς, can handle multi-class problems. The softmax function takes as input a C-dimensional vector z and outputs a C-dimensional vector y whose values lie between 0 and 1. The softmax function is actually a normalized exponential function, defined as follows:

y_c = \varsigma(\mathbf{z})_c = \frac{e^{z_c}}{\sum_{d=1}^{C} e^{z_d}} \qquad \text{for } c = 1, \ldots, C

The denominator \sum_{d=1}^{C} e^{z_d} acts as a normalization term, which ensures that the outputs sum to 1, i.e. \sum_{c=1}^{C} y_c = 1.

When used as the output layer of a neural network, the softmax function can be represented by C neurons.

For a given input z, the probability that the target belongs to class c, i.e. t = c for c = 1 … C, can be expressed as:

P(t = c \mid \mathbf{z}) = y_c = \frac{e^{z_c}}{\sum_{d=1}^{C} e^{z_d}}

where P(t=c|z) denotes the probability that, given the input z, the sample belongs to class c.
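As a quick sanity check (this snippet is an addition to the translated text, using an arbitrary example vector), we can apply the definition above directly in NumPy and verify that the resulting probabilities lie between 0 and 1 and sum to 1:

import numpy as np

# Apply the softmax definition to a concrete 3-class input vector z
z = np.array([2.0, 1.0, 0.1])
y = np.exp(z) / np.sum(np.exp(z))

print(y)           # approximately [0.659 0.242 0.099]: each value lies between 0 and 1
print(np.sum(y))   # 1.0: the denominator normalizes the outputs to sum to 1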

The figure below shows the output probability P(t=1|z) for a two-class problem (t = 1 and t = 2) as a function of the input vector z = [z1, z2].

# Define the softmax function
def softmax(z):
    return np.exp(z) / np.sum(np.exp(z))
# Plot the softmax output for 2 dimensions for both classes
# Plot the output as a function of the input z
# Define a vector of input values z for which we want to plot the output
nb_of_zs = 200
zs = np.linspace(-10, 10, num=nb_of_zs) # input 
zs_1, zs_2 = np.meshgrid(zs, zs) # generate grid
y = np.zeros((nb_of_zs, nb_of_zs, 2)) # initialize output
# Fill the output matrix for each combination of input z's
for i in range(nb_of_zs):
    for j in range(nb_of_zs):
        y[i,j,:] = softmax(np.asarray([zs_1[i,j], zs_2[i,j]]))
# Plot the probability surfaces for both classes
fig = plt.figure()
# Plot the probability surface for t=1
ax = fig.add_subplot(projection='3d')
surf = ax.plot_surface(zs_1, zs_2, y[:,:,0], linewidth=0, cmap=cm.coolwarm)
ax.view_init(elev=30, azim=70)
cbar = fig.colorbar(surf)
ax.set_xlabel('$z_1$', fontsize=15)
ax.set_ylabel('$z_2$', fontsize=15)
ax.set_zlabel('$y_1$', fontsize=15)
ax.set_title(r'$P(t=1|\mathbf{z})$')
cbar.ax.set_ylabel(r'$P(t=1|\mathbf{z})$', fontsize=15)
plt.grid()
plt.show()

The output probability P(t=1|z) as a function of z1 and z2
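One practical note, added to the translated tutorial: the simple implementation above computes np.exp(z) directly, which can overflow for large inputs. A common, numerically more stable variant subtracts the maximum of z before exponentiating; this does not change the result because the shift cancels between the numerator and the denominator. A minimal sketch:

import numpy as np

def softmax_stable(z):
    # Shifting z by its maximum leaves the output unchanged
    # but keeps np.exp from overflowing.
    z_shifted = z - np.max(z)
    exp_z = np.exp(z_shifted)
    return exp_z / np.sum(exp_z)

z = np.array([1000.0, 1001.0, 1002.0])
print(softmax_stable(z))   # approximately [0.090 0.245 0.665]
# The naive version np.exp(z) / np.sum(np.exp(z)) would overflow for this input.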

Derivative of softmax function

In a neural network, to use the softmax function we need to know its derivative. If we define:

\Sigma_C = \sum_{d=1}^{C} e^{z_d}

then the softmax output can be written as:

y_c = \frac{e^{z_c}}{\Sigma_C}

Therefore, the derivative ∂y_i/∂z_j of the output y of the softmax function with respect to its input z can be calculated as:

\frac{\partial y_i}{\partial z_j} =
\begin{cases}
y_i (1 - y_i) & \text{if } i = j \\
- y_i \, y_j & \text{if } i \neq j
\end{cases}

Note that when i = j, the derivative of the softmax function is the same as the derivative of the logistic function.
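To make the two cases concrete, the sketch below (an addition, not part of the original tutorial) builds the full Jacobian matrix ∂y_i/∂z_j from the formula above and compares it against a finite-difference approximation:

import numpy as np

def softmax(z):
    return np.exp(z) / np.sum(np.exp(z))

def softmax_jacobian(z):
    # Jacobian d y_i / d z_j: y_i*(1-y_i) on the diagonal (i == j),
    # -y_i*y_j off the diagonal (i != j).
    y = softmax(z)
    return np.diag(y) - np.outer(y, y)

# Finite-difference check on a random input
rng = np.random.default_rng(0)
z = rng.normal(size=4)
eps = 1e-6
numeric = np.zeros((4, 4))
for j in range(4):
    dz = np.zeros(4)
    dz[j] = eps
    numeric[:, j] = (softmax(z + dz) - softmax(z - dz)) / (2 * eps)

print(np.allclose(softmax_jacobian(z), numeric, atol=1e-6))  # True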

Cross-entropy loss function for softmax function

Before deriving the loss function for the softmax function, let's start from its maximum likelihood estimate. Given a model with parameter set θ, we want the parameters that maximize the likelihood of producing the correct prediction t for the input sample z. As in the derivation of the logistic loss function, we can write this maximum likelihood estimate as:

\operatorname*{argmax}_{\theta} \, \mathcal{L}(\theta \mid \mathbf{t}, \mathbf{z})

The likelihood L(θ|t,z) can be rewritten as the joint probability P(t,z|θ), and by the definition of conditional probability this factors into:

P(\mathbf{t}, \mathbf{z} \mid \theta) = P(\mathbf{t} \mid \mathbf{z}, \theta)\, P(\mathbf{z} \mid \theta)

Since we are not interested in the probability of z, the likelihood can be reduced to L(θ|t,z) = P(t|z,θ). Moreover, P(t|z,θ) can be written as P(t|z), because θ is treated as a fixed constant. Since each t_c depends on the full vector z, and only one of the t_c will be activated, we can write:

P(\mathbf{t} \mid \mathbf{z}) = \prod_{c=1}^{C} P(t_c \mid \mathbf{z})^{t_c} = \prod_{c=1}^{C} \varsigma(\mathbf{z})_c^{\,t_c} = \prod_{c=1}^{C} y_c^{\,t_c}

Just as in the derivation of the logistic loss function, maximizing the likelihood is equivalent to minimizing its negative log-likelihood function:

-\log \mathcal{L}(\theta \mid \mathbf{t}, \mathbf{z}) = \xi(\mathbf{t}, \mathbf{z}) = -\log \prod_{c=1}^{C} y_c^{\,t_c} = -\sum_{c=1}^{C} t_c \log(y_c)

where ξ is the cross-entropy error function. In a binary classification problem, t2 is defined as t2 = 1 − t1. Similarly, for the softmax function we can define the cross-entropy error function as:

\xi(\mathbf{t}, \mathbf{y}) = -\sum_{c=1}^{C} t_c \log(y_c)
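For the two-class case with t2 = 1 − t1, this error function coincides with the logistic cross-entropy, because y1 = σ(z1 − z2). The sketch below is an addition to the translated text, with illustrative numbers, and verifies this numerically:

import numpy as np

def softmax(z):
    return np.exp(z) / np.sum(np.exp(z))

z1, z2 = 1.5, -0.5          # two-class logits
t1 = 1.0                    # target class 1, so t2 = 1 - t1 = 0
t2 = 1.0 - t1

# Softmax cross-entropy over both classes
y = softmax(np.array([z1, z2]))
xent_softmax = -(t1 * np.log(y[0]) + t2 * np.log(y[1]))

# Logistic cross-entropy on the logit difference z1 - z2
sigma = 1.0 / (1.0 + np.exp(-(z1 - z2)))
xent_logistic = -(t1 * np.log(sigma) + (1 - t1) * np.log(1 - sigma))

print(np.isclose(xent_softmax, xent_logistic))  # True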

Over a batch of n samples, the cross-entropy error function can be calculated as:

\xi(T, Y) = \sum_{i=1}^{n} \xi(\mathbf{t}_i, \mathbf{y}_i) = -\sum_{i=1}^{n} \sum_{c=1}^{C} t_{ic} \log(y_{ic})

where t_ic is 1 if and only if sample i belongs to class c, and y_ic is the predicted probability that sample i belongs to class c.
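As a concrete sketch (an addition to the text, with made-up numbers), the batch formula can be written directly in NumPy using one-hot targets T and predicted probabilities Y:

import numpy as np

def cross_entropy(T, Y):
    # Batch cross-entropy: -sum over samples i and classes c of t_ic * log(y_ic)
    return -np.sum(T * np.log(Y))

# 3 samples, 2 classes: one-hot targets and softmax outputs
T = np.array([[1, 0],
              [0, 1],
              [1, 0]])
Y = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.6, 0.4]])

print(cross_entropy(T, Y))  # -(log 0.9 + log 0.8 + log 0.6), approximately 0.84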

Derivation of cross-entropy loss function for softmax function

The derivative ∂ξ/∂z_i of the loss function with respect to the input z_i can be solved as follows:

\frac{\partial \xi}{\partial z_i}
= -\sum_{j=1}^{C} \frac{\partial \, t_j \log(y_j)}{\partial z_i}
= -\sum_{j=1}^{C} t_j \frac{1}{y_j} \frac{\partial y_j}{\partial z_i}
= -\frac{t_i}{y_i} y_i (1 - y_i) - \sum_{j \neq i} \frac{t_j}{y_j} (-y_j y_i)
= -t_i + t_i y_i + \sum_{j \neq i} t_j y_i
= -t_i + y_i \sum_{j=1}^{C} t_j
= y_i - t_i

The derivation above handles the two cases i = j and i ≠ j separately, and uses the fact that the targets are one-hot, so \sum_{j} t_j = 1.

The final result is ∂ξ/∂z_i = y_i − t_i for all i ∈ C. This is the same result as the derivative of the cross-entropy loss function for the logistic function, which again shows that the softmax function is a generalization of the logistic function.
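Finally, the result ∂ξ/∂z_i = y_i − t_i can be verified numerically. A minimal sketch (not from the original article) compares y − t with a finite-difference gradient of the cross-entropy with respect to z:

import numpy as np

def softmax(z):
    return np.exp(z) / np.sum(np.exp(z))

def cross_entropy(t, z):
    return -np.sum(t * np.log(softmax(z)))

rng = np.random.default_rng(1)
z = rng.normal(size=5)
t = np.zeros(5)
t[2] = 1.0                                  # one-hot target

analytic = softmax(z) - t                   # the derived gradient y - t

# Central-difference approximation of the gradient
eps = 1e-6
numeric = np.zeros(5)
for i in range(5):
    dz = np.zeros(5)
    dz[i] = eps
    numeric[i] = (cross_entropy(t, z + dz) - cross_entropy(t, z - dz)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))  # True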

The full code can be found here.
