Author: chen_h
WeChat & QQ: 862251340
WeChat public account: coderpai
This tutorial is a translation of the neural network tutorial written by Peter Roelants, who has authorized the translation. The original text can be found on the author's website.
This five-part tutorial covers how to get started with neural networks. You can find the full content at the links below.
- (1) Getting started with neural networks: linear regression
- Logistic classification function
- (2) Getting started with neural networks: logistic regression (classification)
- (3) Getting started with neural networks: hidden layer design
- Softmax classification function
- (4) Getting started with neural networks: vectorization
- (5) Getting started with neural networks: building a multi-layer network
Softmax classification function

This part of the tutorial covers two topics:
- Softmax function
- Cross-entropy loss function
In previous tutorials, we learned how to use the logistic function for binary classification problems. For multi-class classification problems, we can use multinomial logistic regression, also known as the softmax function. Next, let's explain what the softmax function is and how to derive it.
Let's start by importing the packages that the tutorial needs to use.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import colorConverter, ListedColormap
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
Softmax function
In the previous tutorial, we saw that the logistic function can only be used for binary classification, but its multinomial generalization, the softmax function ς, can solve multi-class classification problems. The input of the softmax function is a C-dimensional vector z, and its output is also a C-dimensional vector y, whose values lie between 0 and 1 and sum to 1. The softmax function is in fact a normalized exponential function, defined as:

y_c = ς(z)_c = e^{z_c} / Σ_{d=1}^{C} e^{z_d}    for c = 1 … C

The denominator Σ_{d=1}^{C} e^{z_d} acts as a normalizing term, which ensures that Σ_{c=1}^{C} y_c = 1. Used as the output layer of a neural network, the C softmax values can be represented by C neurons.
For a given input z, the probability that it belongs to class t = c, for c = 1 … C, can be expressed as:

P(t = c | z) = ς(z)_c = e^{z_c} / Σ_{d=1}^{C} e^{z_d}

where P(t = c | z) denotes the probability that the input z belongs to class c. For a binary classification (t = 1, t = 2) with input vector z = [z1, z2], the output probability P(t = 1 | z) is shown in the figure below.
# Define the softmax function
def softmax(z):
    return np.exp(z) / np.sum(np.exp(z))
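A note that is not part of the original tutorial: exponentiating z directly can overflow for large inputs. Since softmax is invariant under shifting all inputs by a constant, a common and numerically equivalent variant subtracts the maximum of z before exponentiating; a minimal sketch:

```python
import numpy as np

def softmax_stable(z):
    """Softmax with the maximum subtracted first, so np.exp never
    receives a large positive argument and cannot overflow."""
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

# Same result as the plain version on moderate inputs,
# but finite on large ones:
print(softmax_stable(np.array([1000.0, 1001.0, 1002.0])))  # ≈ [0.090 0.245 0.665]
```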
# Plot the softmax output for 2 dimensions for both classes
# Plot the output in function of the weights
# Define a vector of input values for which we want to plot the output
nb_of_zs = 200
zs = np.linspace(-10, 10, num=nb_of_zs) # input
zs_1, zs_2 = np.meshgrid(zs, zs) # generate grid
y = np.zeros((nb_of_zs, nb_of_zs, 2)) # initialize output
# Fill the output matrix for each combination of input z's
for i in range(nb_of_zs):
    for j in range(nb_of_zs):
        y[i,j,:] = softmax(np.asarray([zs_1[i,j], zs_2[i,j]]))
# Plot the cost function surfaces for both classes
fig = plt.figure()
# Plot the cost function surface for t=1
ax = fig.add_subplot(projection='3d')
surf = ax.plot_surface(zs_1, zs_2, y[:,:,0], linewidth=0, cmap=cm.coolwarm)
ax.view_init(elev=30, azim=70)
cbar = fig.colorbar(surf)
ax.set_xlabel('$z_1$', fontsize=15)
ax.set_ylabel('$z_2$', fontsize=15)
ax.set_zlabel('$y_1$', fontsize=15)
ax.set_title(r'$P(t=1|\mathbf{z})$')
cbar.ax.set_ylabel(r'$P(t=1|\mathbf{z})$', fontsize=15)
plt.grid()
plt.show()
Derivative of softmax function
In a neural network, to backpropagate through the softmax function, we need to know its derivative. If we define:

Σ_C = Σ_{d=1}^{C} e^{z_d},    so that    y_i = e^{z_i} / Σ_C

then the derivative ∂y_i/∂z_j of the softmax output y with respect to its input z can be derived as:

∂y_i/∂z_j = y_i (1 − y_j)    if i = j
∂y_i/∂z_j = −y_i · y_j       if i ≠ j

Note that when i = j, this derivative of the softmax function is the same as the derivative of the logistic function.
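The softmax derivative can be checked numerically. The sketch below, which is not part of the original tutorial, builds the analytic Jacobian from the entries y_i (δ_ij − y_j), i.e. diag(y) − y yᵀ, and compares it against central finite differences:

```python
import numpy as np

def softmax(z):
    return np.exp(z) / np.sum(np.exp(z))

z = np.array([0.5, -1.0, 2.0])
y = softmax(z)

# Analytic Jacobian: dy_i/dz_j = y_i * (delta_ij - y_j)
jac_analytic = np.diag(y) - np.outer(y, y)

# Central finite-difference Jacobian
eps = 1e-6
jac_numeric = np.zeros((3, 3))
for j in range(3):
    dz = np.zeros(3)
    dz[j] = eps
    jac_numeric[:, j] = (softmax(z + dz) - softmax(z - dz)) / (2 * eps)

print(np.max(np.abs(jac_analytic - jac_numeric)))  # close to 0
```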
Cross-entropy loss function for softmax function
Before deriving the loss function of the softmax function, we start from its likelihood. Given the parameter set θ of the model, we want θ to maximize the probability of correctly predicting the training samples. As in the derivation of the logistic loss function, the likelihood to maximize can be written as:

L(θ | t, z) = P(t, z | θ)

According to the definition of joint probability, we can rewrite this as P(t | z, θ) · P(z | θ). Since we are not interested in the probability of z, the likelihood reduces to L(θ | t, z) = P(t | z, θ). Moreover, P(t | z, θ) can be written as P(t | z), since θ is treated as a fixed constant. Since each t_c depends on the full vector z, and only one element of t will be activated (t is a one-hot vector), we can write:

P(t | z) = Π_{c=1}^{C} P(t_c | z)^{t_c} = Π_{c=1}^{C} ς(z)_c^{t_c} = Π_{c=1}^{C} y_c^{t_c}
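Because t is a one-hot vector, the product Π_c y_c^{t_c} simply selects the predicted probability of the true class. A small illustration, with made-up example values:

```python
import numpy as np

y = np.array([0.1, 0.7, 0.2])  # softmax output
t = np.array([0, 1, 0])        # one-hot target: true class is c = 2

# y_c^{t_c} is 1 for every c with t_c = 0, so the product
# keeps only the probability of the true class.
likelihood = np.prod(y ** t)
print(likelihood)  # 0.7
```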
Just as in the derivation of the logistic loss function, maximizing this likelihood is equivalent to minimizing its negative log-likelihood:

−log L(θ | t, z) = ξ(t, z) = −log Π_{c=1}^{C} y_c^{t_c} = −Σ_{c=1}^{C} t_c · log(y_c)

where ξ is the cross-entropy error function. Note that in a binary classification problem, t2 = 1 − t1 and y2 = 1 − y1, so the cross-entropy above reduces to the logistic cross-entropy:

ξ(t, y) = −t1 · log(y1) − (1 − t1) · log(1 − y1)
Over a batch of n samples, the cross-entropy error function can be computed as:

ξ(T, Y) = Σ_{i=1}^{n} ξ(t_i, y_i) = −Σ_{i=1}^{n} Σ_{c=1}^{C} t_ic · log(y_ic)

where t_ic is 1 if and only if sample i belongs to class c, and y_ic is the predicted probability that sample i belongs to class c.
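The batch formula ξ(T, Y) = −Σ_i Σ_c t_ic · log(y_ic) can be written directly in NumPy; the targets and predictions below are made-up values for illustration only:

```python
import numpy as np

def cross_entropy(Y, T):
    """Batch cross-entropy: -sum over samples i and classes c of T_ic * log(Y_ic)."""
    return -np.sum(T * np.log(Y))

T = np.array([[1, 0, 0],
              [0, 0, 1]])          # one-hot targets for 2 samples
Y = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.3, 0.5]])    # predicted class probabilities

# Only the true-class probabilities contribute:
print(cross_entropy(Y, T))  # -log(0.8) - log(0.5)
```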
Derivation of cross-entropy loss function for softmax function
The derivative ∂ξ/∂z_i of the loss function with respect to the input z_i is obtained by considering the two cases i = j and i ≠ j separately:

∂ξ/∂z_i = −Σ_{j=1}^{C} t_j / y_j · ∂y_j/∂z_i
        = −t_i (1 − y_i) + Σ_{j≠i} t_j · y_i
        = −t_i + y_i Σ_{j=1}^{C} t_j
        = y_i − t_i

using Σ_{j=1}^{C} t_j = 1. The final result is ∂ξ/∂z_i = y_i − t_i for all i ∈ 1 … C. This result is identical to the derivative of the cross-entropy loss function of the logistic function, which again shows that the softmax function is a generalization of the logistic function.
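The result ∂ξ/∂z_i = y_i − t_i can also be verified numerically. The sketch below, not part of the original tutorial, compares it against a finite-difference gradient of the cross-entropy with respect to z:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def cross_entropy(z, t):
    return -np.sum(t * np.log(softmax(z)))

z = np.array([0.3, -0.8, 1.5])
t = np.array([0.0, 0.0, 1.0])   # one-hot target

grad_analytic = softmax(z) - t  # the derived result: y - t

# Central finite-difference gradient
eps = 1e-6
grad_numeric = np.zeros_like(z)
for i in range(len(z)):
    dz = np.zeros_like(z)
    dz[i] = eps
    grad_numeric[i] = (cross_entropy(z + dz, t) - cross_entropy(z - dz, t)) / (2 * eps)

print(np.max(np.abs(grad_analytic - grad_numeric)))  # close to 0
```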