Author: chen_h
WeChat & QQ: 862251340
WeChat public account: coderpai
This tutorial is a translation of the neural network tutorial written by Peter Roelants, who has authorized the translation. This is the original text.
This five-part tutorial covers how to get started with neural networks. You can find the full series at the links below.
- (1) Getting started with neural networks: linear regression
- Logistic classification function
- (2) Getting started with neural networks: logistic regression (classification)
- (3) Getting started with neural networks: hidden layer design
- Softmax classification function
- (4) Getting started with neural networks: vectorization
- (5) Getting started with neural networks: building a multi-layer network
Logistic classification function
This part of the tutorial covers two topics:
* Logistic function
* Cross-entropy loss function
If we use a neural network for classification, then for binary classification problems (t=1 or t=0) we can use the logistic function used in logistic regression. For multi-class problems we use the softmax function used in softmax (multinomial) regression. In this tutorial we first explain the logistic function; a subsequent tutorial will introduce the softmax function.
Let's start by importing the packages that the tutorial needs to use.
from __future__ import print_function
import numpy as np
import matplotlib.pyplot as plt
Logistic function
Suppose our goal is to predict the class t of a sample from an input z. The probability P(t=1|z) that the input z belongs to class t=1 is represented by the output y of the logistic function y = σ(z). σ is defined as:

σ(z) = 1 / (1 + e^(-z))
The probabilities that a sample belongs to class t=1 or class t=0, given z, are therefore:

P(t=1|z) = σ(z) = 1 / (1 + e^(-z))
P(t=0|z) = 1 − σ(z) = e^(-z) / (1 + e^(-z))

Note that z is actually the logarithm of the ratio of P(t=1|z) to P(t=0|z), i.e. the log odds: z = log(P(t=1|z) / P(t=0|z)).
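Before implementing it, here is a quick numeric sanity check (my own sketch, not part of the original tutorial) that z really is this log odds; the formula used inline here is the same one the logistic(z) function below implements.

# Check that z equals log(P(t=1|z) / P(t=0|z))
z_check = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
p1 = 1 / (1 + np.exp(-z_check))   # P(t=1|z) = sigma(z)
p0 = 1 - p1                        # P(t=0|z)
print(np.allclose(np.log(p1 / p0), z_check))  # True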
The logistic function is implemented below as logistic(z), and the code also plots it.
# Define the logistic function
def logistic(z):
return 1 / (1 + np.exp(-z))
# Plot the logistic function
z = np.linspace(-6,6,100)
plt.plot(z, logistic(z), 'b-')
plt.xlabel('$z$', fontsize=15)
plt.ylabel('$\\sigma(z)$', fontsize=15)
plt.title('logistic function')
plt.grid()
plt.show()
Derivative of the logistic function
Because neural networks are generally optimized using gradient descent, we first need the derivative of the output y with respect to the input z, ∂y/∂z, which can be written as:

∂y/∂z = ∂σ(z)/∂z = e^(-z) / (1 + e^(-z))^2 = 1/(1 + e^(-z)) * e^(-z)/(1 + e^(-z))

Because 1 − σ(z) = 1 − 1/(1 + e^(-z)) = e^(-z)/(1 + e^(-z)), we can simplify the formula above to:

∂y/∂z = σ(z)(1 − σ(z)) = y(1 − y)
The function logistic_derivative(z) implements this derivative, and the code below plots it.
# Define the derivative of the logistic function
def logistic_derivative(z):
return logistic(z) * (1 - logistic(z))
# Plot the derivative of the logistic function
z = np.linspace(-6,6,100)
plt.plot(z, logistic_derivative(z), 'r-')
plt.xlabel('$z$', fontsize=15)
plt.ylabel('$\\frac{\\partial \\sigma(z)}{\\partial z}$', fontsize=15)
plt.title('derivative of the logistic function')
plt.grid()
plt.show()
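As an optional check (my own addition, not part of the original code), we can compare logistic_derivative(z) against a central finite-difference approximation of the logistic function:

# Compare the analytic derivative with a numerical approximation
z_check = np.linspace(-6, 6, 100)
eps = 1e-6
numeric = (logistic(z_check + eps) - logistic(z_check - eps)) / (2 * eps)
print(np.allclose(numeric, logistic_derivative(z_check), atol=1e-8))  # True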
Cross-entropy loss function for the logistic function
The output y = σ(z) of the model can be interpreted as a probability: y is the probability that t=1, and 1 − y is the probability that t=0. We write this as P(t=1|z) = σ(z) = y.
In a neural network, for a given set of parameters θ, we can use maximum likelihood estimation to optimize the parameters. The parameters θ transform each input sample x into the input z of the logistic function, i.e. z = θ * x.
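To make this mapping concrete, here is a minimal sketch (my own illustration, using a made-up input matrix X and parameter vector theta, not data from the tutorial):

# Hypothetical example: theta maps each input sample x to z = theta * x, and y = logistic(z)
X = np.array([[1.0, 2.0],
              [0.5, -1.0],
              [-2.0, 0.3]])     # toy input samples, one row per sample
theta = np.array([0.8, -0.4])   # toy parameter vector
z_toy = X.dot(theta)            # z for each sample
y_toy = logistic(z_toy)         # predicted P(t=1|z) for each sample
print(y_toy)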
The maximum likelihood estimate can be written as:

argmax_θ L(θ|t,z)
For given parameters θ, the likelihood L(θ|t,z) can be rewritten as the joint probability of generating t and z, i.e. P(t,z|θ). Since P(A,B) = P(A|B) ∗ P(B), we can factor this joint probability as:

P(t,z|θ) = P(t|z,θ) P(z|θ)
Since we are not interested in the probability of z itself, we can rewrite the likelihood function as:

L(θ|t,z) = P(t|z,θ) = ∏_{i=1..n} P(t_i|z_i,θ)
Because t follows a Bernoulli distribution, and P(t|z) = y is fixed once the parameters θ are given, we can rewrite this probability as:

P(t|z) = ∏_{i=1..n} P(t_i=1|z_i)^{t_i} (1 − P(t_i=1|z_i))^{1−t_i} = ∏_{i=1..n} y_i^{t_i} (1 − y_i)^{1−t_i}
Since the logarithm is a monotonically increasing function, maximizing the log-likelihood gives the same optimum as maximizing the likelihood itself, so we compute the log-likelihood as follows:

log L(θ|t,z) = log ∏_{i=1..n} y_i^{t_i} (1 − y_i)^{1−t_i} = Σ_{i=1..n} [t_i log(y_i) + (1 − t_i) log(1 − y_i)]

Minimizing the negative of this log-likelihood is equivalent to maximizing the likelihood, so the error function can be defined as the following cross-entropy error function:

ξ(t,y) = −log L(θ|t,z) = −Σ_{i=1..n} [t_i log(y_i) + (1 − t_i) log(1 − y_i)]
This function may look complicated, but it's simpler if we break it down.
From the formula above we can see that, for a single sample, the loss ξ(t,y) is simply the negative log probability of the true class, that is:

ξ(t,y) = −log(y)      if t = 1
ξ(t,y) = −log(1 − y)  if t = 0

Since t can only take the value 0 or 1, we can write ξ(t,y) as:

ξ(t,y) = −t log(y) − (1 − t) log(1 − y)
Summed over all n training samples this becomes:

ξ(t,y) = −Σ_{i=1..n} [t_i log(y_i) + (1 − t_i) log(1 − y_i)]
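A small implementation sketch of this loss (my own addition, not part of the original code), with targets t and predicted probabilities y as numpy arrays:

# Cross-entropy loss summed over a batch of targets t and predictions y
def cross_entropy(t, y):
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

# Example: three samples that are mostly classified correctly
t_example = np.array([1.0, 0.0, 1.0])
y_example = np.array([0.9, 0.2, 0.6])
print(cross_entropy(t_example, y_example))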
Another reason to use the cross-entropy function is that for simple logistic regression it is a convex loss function, so the global minimum is easy to find.
Derivative of the cross-entropy loss function for the logistic function
The derivative ∂ξ/∂y of the loss function with respect to the output y is calculated as follows:

∂ξ/∂y = ∂(−t log(y) − (1 − t) log(1 − y))/∂y = −t/y + (1 − t)/(1 − y) = (y − t) / (y(1 − y))
Now the derivative with respect to the input z follows easily from the chain rule:

∂ξ/∂z = ∂ξ/∂y * ∂y/∂z = (y − t)/(y(1 − y)) * y(1 − y) = y − t

At this point, the complete derivation is finished.
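To close, here is a short numeric check (my own sketch, reusing the logistic function defined above and the cross_entropy helper sketched earlier) that the gradient ∂ξ/∂z = y − t matches a finite-difference gradient of the loss with respect to z:

# Check that d(cross-entropy)/dz equals y - t
t_check = np.array([1.0, 0.0, 1.0])
z_check = np.array([0.5, -1.2, 2.0])
y_check = logistic(z_check)
analytic_grad = y_check - t_check   # result of the derivation above

eps = 1e-6
numeric_grad = np.zeros_like(z_check)
for i in range(len(z_check)):
    z_plus, z_minus = z_check.copy(), z_check.copy()
    z_plus[i] += eps
    z_minus[i] -= eps
    numeric_grad[i] = (cross_entropy(t_check, logistic(z_plus)) -
                       cross_entropy(t_check, logistic(z_minus))) / (2 * eps)

print(np.allclose(analytic_grad, numeric_grad, atol=1e-6))  # True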