[Deep Learning] 2-3 Neural Network - Output Layer Design

A feedforward neural network is a network in which information flows only forward; the single-layer perceptron and multi-layer perceptron introduced earlier both belong to this family. The name "feedforward" comes from this forward flow of information: data starts at the input, flows through the intermediate computations, and finally reaches the output layer.
Let's look at how the output layer is designed.

Machine learning problems can be roughly divided into classification problems and regression problems.
A classification problem asks which category the data belongs to, for example distinguishing whether the person in an image is male or female.
A regression problem predicts a (continuous) value from an input, for example predicting a person's weight from an image of that person.
Classification and regression both belong to supervised learning, so called because this kind of algorithm must be told what to predict, that is, the target (such as the class label) is given for each training example.
The counterpart of supervised learning is unsupervised learning, in which the data carries no category information and no target value is given. In unsupervised learning, the process of dividing a data set into groups of similar objects is called clustering; the process of finding statistical quantities that describe the data is called density estimation. In addition, unsupervised learning can also reduce the dimensionality of the data's features so that we can display the data more intuitively in two- or three-dimensional plots.

The identity function, sigmoid function and softmax function

The identity function is mainly used as the output-layer activation for regression problems. In this kind of problem the final output should be preserved as faithfully as possible, because the result of a regression is a specific numeric value and is itself the final answer, so an unchanged output is exactly what we want. The identity function therefore simply does nothing: it outputs its input as it is.
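
A minimal sketch (the name identity_function is my own choice):

def identity_function(x):
    # output-layer activation for regression: return the input unchanged
    return x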

The sigmoid function is well suited to binary classification because its output is constrained to the interval (0, 1) and can be read as the probability of the positive class.
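
A minimal NumPy sketch of the sigmoid function:

import numpy as np

def sigmoid(x):
    # squashes any real input into the open interval (0, 1)
    return 1 / (1 + np.exp(-x))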

The softmax function takes the value of every output node into account, which makes its description of the probabilities more accurate and makes it suitable for multi-class classification.
The softmax function can be represented by the following formula:

$$y_k = \frac{\exp(a_k)}{\sum_{i=1}^{n} \exp(a_i)}$$

The numerator is the exponential of the k-th input signal, and the denominator is the sum of the exponentials of all n input signals (so all the softmax outputs add up to 1).

Implementing softmax in Python:

import numpy as np

def softmax(a):
    exp_a = np.exp(a)          # exponential of each element
    sum_exp_a = np.sum(exp_a)  # sum of all the exponentials
    y = exp_a / sum_exp_a      # normalize so the outputs sum to 1
    return y

The defect of this naive softmax is overflow: because it exponentiates its inputs, the intermediate values can become enormous (np.exp(1000), for instance, already overflows to inf, and inf/inf yields nan). The improved code is as follows:

def softmax(a):
    c = np.max(a)          # largest value in the input
    exp_a = np.exp(a - c)  # overflow countermeasure: shift by the maximum
    sum_exp_a = np.sum(exp_a)
    y = exp_a / sum_exp_a
    return y

The overflow problem is solved by subtracting the maximum of the input signal from every element before exponentiating. This does not change the result: subtracting the same constant from all inputs multiplies both the numerator and the denominator of softmax by the same factor, which cancels out.
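
A quick check of the overflow countermeasure (the printed values are approximate):

a = np.array([1010, 1000, 990])
# naive version: np.exp(1010) overflows to inf, producing nan
# np.exp(a) / np.sum(np.exp(a))   # -> [nan nan nan] plus overflow warnings
print(softmax(a))  # -> [9.99954600e-01 4.53978686e-05 2.06106005e-09]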

Features of the softmax function
The output of the softmax function is a real number between 0.0 and 1.0, and the output values of the softmax function sum to 1. It is precisely because the outputs sum to 1 that they can be treated probabilistically (statistically).
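
For example (a small illustration using the softmax defined above; the printed values are approximate):

y = softmax(np.array([0.3, 2.9, 4.0]))
print(y)          # -> [0.01821127 0.24519181 0.73659691]
print(np.sum(y))  # -> 1.0
# y[2] is about 0.74, so class 2 can be read as having roughly a 74% probability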

Where softmax fits in the machine learning workflow
The steps of machine learning can be divided into two phases: "learning" (training) and "inference" (using the trained model, for example to classify new data). The softmax function is used during learning and is usually omitted in the inference phase: exp is monotonically increasing, so softmax never changes which output node is largest, and for classification only the position of the largest output matters.

Introducing the softmax function into the output layer, as in the following example:


import numpy as np

def ReLU(x):
    # rectified linear unit: element-wise max(0, x)
    return np.maximum(0, x)

def forward_net(network, x):
    # unpack the weights and biases of a 3-layer network
    W1, b1 = network['W1'], network['b1']
    W2, b2 = network['W2'], network['b2']
    W3, b3 = network['W3'], network['b3']
    x = ReLU(np.dot(x, W1) + b1)  # first hidden layer
    x = ReLU(np.dot(x, W2) + b2)  # second hidden layer
    x = np.dot(x, W3) + b3        # output layer
    return softmax(x)             # turn the outputs into probabilities

The result of running it is:

[0.00196166 0.99803834]
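
As a usage sketch (the layer shapes and random weights here are my own assumptions, so the printed probabilities will differ from the values above):

rng = np.random.default_rng(0)
network = {
    'W1': rng.normal(size=(4, 5)), 'b1': np.zeros(5),
    'W2': rng.normal(size=(5, 3)), 'b2': np.zeros(3),
    'W3': rng.normal(size=(3, 2)), 'b3': np.zeros(2),
}
x = rng.normal(size=4)      # a single 4-dimensional input
y = forward_net(network, x)
print(y, np.sum(y))         # two class probabilities that sum to 1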
