Dive into Deep Learning, Lesson 1: From Getting Started to Multiclass Classification - Autograd

Using autograd for automatic differentiation

In machine learning, we usually use gradient descent to update model parameters in order to minimize a loss function. The gradient of the loss function with respect to the model parameters tells us in which direction to move the parameters to reduce the loss, and we keep updating the model along that descent direction to minimize the loss. Although computing a gradient is conceptually straightforward, doing it by hand is very difficult for complex models, for example a neural network with dozens of layers.
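
For concreteness, here is a minimal sketch of a single gradient-descent update step; the loss L(w) = (w - 3) ** 2, the learning rate lr, and the starting value of w are made-up values used only for illustration.

lr = 0.1            # learning rate (hypothetical value)
w = 0.0             # current parameter value
grad = 2 * (w - 3)  # dL/dw computed by hand for L(w) = (w - 3) ** 2
w = w - lr * grad   # move against the gradient to reduce the loss
w
0.6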

To this end, MXNet provides the autograd package to automate differentiation. While most deep learning frameworks require compiling a computation graph in order to differentiate automatically, mxnet.autograd can differentiate ordinary imperative code: it builds the computation graph on the fly in the backend each time, so gradients can be computed immediately.

Let's introduce this package step by step. First we import autograd.

import mxnet.ndarray as nd
import mxnet.autograd as ag

Attaching a gradient to a variable

Suppose we want to differentiate the function f = 2 * (x ** 2) with respect to x. Let's create the variable x and give it an initial value.

x = nd.array([[1, 2], [3, 4]])
x
[[ 1.  2.]
 [ 3.  4.]]
<NDArray 2x2 @cpu(0)>

Before differentiating, we need a place to store the gradient of x. We can ask the system to allocate the corresponding space by calling NDArray's attach_grad() method.

x.attach_grad()

Next we define f. By default, MXNet does not record and build the computation graph needed for differentiation; we have to use autograd's record() function to explicitly ask MXNet to record the program we want to differentiate.

with ag.record():
    y = x * 2
    z = y * x
z
[[  2.   8.]
 [ 18.  32.]]
<NDArray 2x2 @cpu(0)>

We can then differentiate by calling z.backward(). If z is not a scalar (as is the case here, since z is a 2x2 matrix), then z.backward() is equivalent to nd.sum(z).backward().

z.backward()
x.grad
[[  4.   8.]
 [ 12.  16.]]
<NDArray 2x2 @cpu(0)>

Now let's check whether the computed derivative is correct. Note that y = x * 2 and z = x * y, so z is equivalent to 2 * x * x, and its derivative is dz/dx = 4 * x.

x.grad == 4 * x
[[ 1.  1.]
 [ 1.  1.]]
<NDArray 2x2 @cpu(0)>
x.grad == 3 * x
[[ 0.  0.]
 [ 0.  0.]]
<NDArray 2x2 @cpu(0)>
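
As noted above, calling backward on the non-scalar z is equivalent to differentiating nd.sum(z). Here is a minimal sketch of that check, reusing the same x; the intermediate variable s is introduced only for illustration.

with ag.record():
    y = x * 2
    z = y * x
    s = nd.sum(z)   # sum inside the recorded scope, then differentiate the scalar
s.backward()
x.grad == 4 * x
[[ 1.  1.]
 [ 1.  1.]]
<NDArray 2x2 @cpu(0)>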

Differentiating control flow

One convenience of differentiating imperative programs is that almost any program can be differentiated, even one that contains Python control flow. Consider the program below, which contains a while loop and an if statement; both the number of loop iterations and which branch is taken depend on the value of the input, so different inputs lead to different executions of the program. (For computation-graph frameworks, this corresponds to a dynamic graph, i.e. the structure of the graph changes with the input data.)

def f(a):
    b = a * 2
    # Keep doubling b until its norm reaches 1000.
    while nd.norm(b).asscalar() < 1000:
        b = b * 2
    # Which branch runs depends on the sign of the sum of b.
    if nd.sum(b).asscalar() > 0:
        c = b
    else:
        c = 100 * b
    return c

As before, we use record to record the computation and backward to differentiate.

a = nd.random_normal(shape=3)
a.attach_grad()
with ag.record():
    c = f(a)
c.backward()
a.grad
[ 51200.  51200.  51200.]
<NDArray 3 @cpu(0)>

Note that for a given input a, the output is f(a) = x * a, where the constant x depends on the input a: f only multiplies its input by scalars (repeated doublings, and possibly a factor of 100), so within any single execution x is a fixed number. Therefore df/da = x, and we can easily check the automatically computed derivative:

c/a
[ 51200.  51200.  51200.]
<NDArray 3 @cpu(0)>
a.grad == c/a
[ 1.  1.  1.]
<NDArray 3 @cpu(0)>

Source: www.cnblogs.com/KisInfinite/p/11441836.html