Logistic classification function

Author: chen_h
WeChat & QQ: 862251340
WeChat public account: coderpai


This tutorial is a translation of the neural network tutorial written by Peter Roelants. The author has authorized the translation. This is the original text.

This five-part tutorial covers how to get started with neural networks. You can find the full content at the link below.

Logistic classification function


This part of the tutorial will introduce two parts:
* Logistic function
* Cross entropy loss function

If we use neural networks for classification, then for binary classification problems (t=1 or t=0) we can use the logistic function, as in logistic regression. For multi-class classification problems we can use the softmax function, as in multinomial (softmax) regression. In this tutorial we first explain the logistic function; a subsequent tutorial will introduce the softmax function.

Let's start by importing the packages that the tutorial needs to use.

from __future__ import print_function

import numpy as np
import matplotlib.pyplot as plt

Logistic function

Suppose our goal is to predict the class t based on the input z. The probability P(t=1|z) that the input belongs to class t=1 is represented by the output y of the logistic function, y=σ(z). σ is defined as:

σ(z) = 1 / (1 + e^−z)

The probabilities that the classification is t=1 or t=0, given the input z, are then:

P(t=1|z) = σ(z) = 1 / (1 + e^−z)
P(t=0|z) = 1 − σ(z) = e^−z / (1 + e^−z)

Note that the input z is actually the logarithm of the ratio of P(t=1|z) to P(t=0|z), i.e. the log-odds:

log( P(t=1|z) / P(t=0|z) ) = log( (1/(1+e^−z)) / (e^−z/(1+e^−z)) ) = log( 1 / e^−z ) = z
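
As a quick numerical sanity check (a small sketch added here, not part of the original text; the value z = 2.0 is arbitrary), we can verify this log-odds relation:

# Check that the log of P(t=1|z) / P(t=0|z) recovers z
z = 2.0
p1 = 1 / (1 + np.exp(-z))   # P(t=1|z), about 0.881
p0 = 1 - p1                 # P(t=0|z), about 0.119
print(np.log(p1 / p0))      # prints approximately 2.0, i.e. equal to z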

The logistic function is implemented in the code below as logistic(z), and the function is then plotted.

# Define the logistic function
def logistic(z):
  return 1 / (1 + np.exp(-z))
# Plot the logistic function
z = np.linspace(-6,6,100)
plt.plot(z, logistic(z), 'b-')
plt.xlabel('$z$', fontsize=15)
plt.ylabel('$\\sigma(z)$', fontsize=15)
plt.title('logistic function')
plt.grid()
plt.show()

[Figure: the logistic function]

Derivative of the logistic function

Because neural networks are generally optimized using gradient descent, we first need the derivative of the output y with respect to the input z. This derivative ∂y/∂z can be expressed as:

∂y/∂z = ∂σ(z)/∂z = ∂( 1/(1+e^−z) )/∂z = e^−z / (1+e^−z)^2 = (1/(1+e^−z)) · (e^−z/(1+e^−z))

Because 1 − σ(z) = 1 − 1/(1+e^−z) = e^−z/(1+e^−z), we can simplify the above formula to:

∂y/∂z = σ(z) · (1 − σ(z)) = y (1 − y)

The function logistic_derivative(z) implements the derivative of the logistic function, which is then plotted.

# Define the derivative of the logistic function
def logistic_derivative(z):
  return logistic(z) * (1 - logistic(z))
# Plot the derivative of the logistic function
z = np.linspace(-6,6,100)
plt.plot(z, logistic_derivative(z), 'r-')
plt.xlabel('$z$', fontsize=15)
plt.ylabel('$\\frac{\\partial \\sigma(z)}{\\partial z}$', fontsize=15)
plt.title('derivative of the logistic function')
plt.grid()
plt.show()

[Figure: the derivative of the logistic function]
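
As a sanity check (a small sketch, not from the original tutorial), we can compare logistic_derivative against a numerical finite-difference approximation of the derivative:

# Compare the analytic derivative with a finite-difference approximation
z = np.linspace(-6, 6, 100)
eps = 1e-5
numeric = (logistic(z + eps) - logistic(z - eps)) / (2 * eps)
print(np.allclose(numeric, logistic_derivative(z)))  # prints True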

Cross-entropy loss function for logistic function

The output of the model y = σ(z) can be interpreted as a probability: y is the probability that t=1, and 1−y is the probability that t=0. We write this as P(t=1|z) = σ(z) = y.

In a neural network with a given set of parameters θ, we can use maximum likelihood estimation to optimize the parameters. The parameters θ transform the input samples x into the input z of the logistic function, i.e. z = θ * x. The maximum likelihood estimate can be written as:

argmax_θ L(θ|t, z)

The likelihood L(θ|t,z) can be rewritten as the joint probability of generating t and z for the given parameters θ, i.e. P(t,z|θ). Since P(A,B) = P(A|B) ∗ P(B), we can rewrite the joint probability as:

P(t, z|θ) = P(t|z, θ) · P(z|θ)

Since we are not interested in the probability of z itself, we can rewrite the likelihood function as:

L(θ|t, z) = P(t|z, θ) = ∏_{i=1}^{n} P(t_i|z_i, θ)

Because t follows a Bernoulli distribution, and P(t=1|z) = y is a fixed value for a given θ, we can rewrite this probability as:

P(t|z) = ∏_{i=1}^{n} P(t_i=1|z_i)^{t_i} · (1 − P(t_i=1|z_i))^{1−t_i} = ∏_{i=1}^{n} y_i^{t_i} · (1 − y_i)^{1−t_i}

Since the logarithm is a monotonically increasing function, we can equivalently optimize the log-likelihood function:

argmax_θ log L(θ|t, z)

The maximum of this function is the same as the maximum of the regular likelihood function. The log-likelihood works out to:

log L(θ|t, z) = log ∏_{i=1}^{n} y_i^{t_i} · (1 − y_i)^{1−t_i} = Σ_{i=1}^{n} [ t_i·log(y_i) + (1 − t_i)·log(1 − y_i) ]

Minimizing the negative of this log-likelihood function is equivalent to maximizing the likelihood function. This gives the cross-entropy error function ξ(t, y):

ξ(t, y) = −log L(θ|t, z) = −Σ_{i=1}^{n} [ t_i·log(y_i) + (1 − t_i)·log(1 − y_i) ]

This function may look complicated, but it becomes simpler if we break it down per sample i:

ξ(t_i, y_i) = −log(y_i)        if t_i = 1
ξ(t_i, y_i) = −log(1 − y_i)    if t_i = 0

From the above formula we can see that the per-sample loss ξ(t, y) is exactly the negative logarithm of the probability of the correct classification, that is:

ξ(t_i, y_i) = −log( P(t_i|z_i) )

Since t can only take the value 0 or 1, we can write ξ(t, y) as:

ξ(t, y) = −t·log(y) − (1 − t)·log(1 − y)

Summed over all n training samples, this gives the full loss function again:

ξ(t, y) = −Σ_{i=1}^{n} [ t_i·log(y_i) + (1 − t_i)·log(1 − y_i) ]
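
To make this concrete, here is a small NumPy sketch (the inputs x, targets t, and the parameter value theta are made up for illustration) that computes z = θ * x, the predictions y, and this cross-entropy loss for a few samples:

# Toy example: compute the cross-entropy loss for a few samples
x = np.array([-2.0, -0.5, 1.0, 3.0])   # made-up 1D inputs
t = np.array([0.0, 0.0, 1.0, 1.0])     # made-up targets
theta = 1.5                            # made-up parameter
z = theta * x                          # z = theta * x
y = logistic(z)                        # predicted P(t=1|z)
loss = -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))
print(y)      # predictions: small for the t=0 samples, large for the t=1 samples
print(loss)   # cross-entropy summed over the 4 samples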

Another reason we use the cross-entropy function is that in simple logistic regression the cross-entropy is a convex loss function, so the global minimum is easy to find.

Derivative of the cross-entropy loss function for the logistic function

The derivative ∂ξ/∂y of the loss function with respect to the output y is calculated as follows:

∂ξ/∂y = ∂( −t·log(y) − (1 − t)·log(1 − y) )/∂y = −t/y + (1 − t)/(1 − y) = (y − t) / ( y (1 − y) )

Now it is easy to compute the derivative of the loss with respect to the input z:

∂ξ/∂z = (∂y/∂z) · (∂ξ/∂y) = y (1 − y) · (y − t) / ( y (1 − y) ) = y − t
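
This clean result, ∂ξ/∂z = y − t, is easy to verify numerically. The sketch below (not part of the original text; the helper loss(z, t) and the values z = 0.7, t = 1 are made up) compares a finite-difference estimate with y − t:

# Numerically check that the gradient of the loss with respect to z is y - t
def loss(z, t):
  y = logistic(z)
  return -(t * np.log(y) + (1 - t) * np.log(1 - y))

z, t, eps = 0.7, 1.0, 1e-5
numeric_grad = (loss(z + eps, t) - loss(z - eps, t)) / (2 * eps)
print(numeric_grad, logistic(z) - t)   # both are about -0.33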

This completes the derivation.

Full code, click here
