Complexity & learning rate & loss function

The content comes from the following videos; video addresses: here1, here2

The complexity of a neural network is described by two things: the number of network layers and the number of parameters.

Space complexity:

  • Number of layers (count only the layers that perform computation; the input layer is not counted) = number of hidden layers + 1 output layer;

  • Total number of parameters = total w + total b.

Time complexity:

  • Number of multiply-and-add operations.
    e.g. for the fully connected network in the figure (3 input nodes, one hidden layer with 4 nodes, and 2 output nodes; a quick check with Keras follows this item):
    total parameters = 3x4 + 4x1 + 4x2 + 2x1 = 26
    multiply-and-add operations = 3x4 + 4x2 = 20
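
A quick way to confirm the parameter count is to build the same fully connected network with Keras and count its parameters. A minimal sketch (the 3-4-2 shape is inferred from the counts above):

import tensorflow as tf

# A fully connected 3-4-2 network matching the example above
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(4),  # 3x4 weights + 4 biases = 16
    tf.keras.layers.Dense(2),  # 4x2 weights + 2 biases = 10
])
print(model.count_params())  # 16 + 10 = 26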

  • Exponential decay of the learning rate:
    a relatively large learning rate can be used first to obtain a good solution quickly, and then the learning rate is gradually reduced so that the model stays stable in the later stages of training.
    Exponential decay learning rate = initial learning rate * learning-rate decay rate^(current epoch / decay interval) (a minimal sketch follows).
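
A minimal sketch of this schedule inside a training loop (LR_BASE, LR_DECAY, LR_STEP and the toy loss are illustrative values, not taken from the original post):

import tensorflow as tf

LR_BASE = 0.2    # initial learning rate
LR_DECAY = 0.99  # learning-rate decay rate
LR_STEP = 10     # decay once every LR_STEP epochs

w = tf.Variable(tf.constant(5, dtype=tf.float32))
for epoch in range(40):
    lr = LR_BASE * LR_DECAY ** (epoch / LR_STEP)  # exponentially decayed learning rate
    with tf.GradientTape() as tape:
        loss = tf.square(w + 1)                   # toy loss, minimum at w = -1
    w.assign_sub(lr * tape.gradient(loss, w))     # gradient step with the current lr
print(w.numpy())  # w moves toward -1 as training proceeds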

  • The optimization objective of a neural network is to minimize the loss. There are three common choices:
    ① MSE; ② a custom loss; ③ cross entropy.
    ① MSE, the mean squared error, can be written as tf.reduce_mean(tf.square(y_ - y));
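    In formula form (the standard definition, for n samples with true values y and predictions $\hat y$):
    $$MSE(y, \hat y) = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat y_i)^2$$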

import tensorflow as tf

# 20 samples with 2 features each; the true relationship is y = x1 + 2*x2
x = tf.random.normal([20, 2], mean=2, stddev=1, dtype=tf.float32)
y = x[:, 0:1] + 2 * x[:, 1:2]  # true values, shape [20, 1] to match y_hat
w = tf.Variable(tf.random.normal([2, 1], mean=0, stddev=1))

epochs = 5000
lr = 0.002
for epoch in range(epochs):
    with tf.GradientTape() as tape:
        y_hat = tf.matmul(x, w)                      # predictions, shape [20, 1]
        loss = tf.reduce_mean(tf.square(y_hat - y))  # MSE loss
    w_grad = tape.gradient(loss, w)
    w.assign_sub(lr * w_grad)                        # gradient descent update

print(w.numpy().T)  # approaches [[1. 2.]], the true coefficients

② Custom loss.
Using the same data as above, we now define the loss as a piecewise function: when the predicted value is greater than the true value, we treat it as worse for the model and penalize it three times as heavily, namely:

$$f(\hat y, y) = \begin{cases} 3(\hat y - y), & \hat y \geqslant y \\ y - \hat y, & \hat y \leqslant y \end{cases}$$

import tensorflow as tf

# Same data as before: the true relationship is y = x1 + 2*x2
x = tf.random.normal([20, 2], mean=2, stddev=1, dtype=tf.float32)
y = x[:, 0:1] + 2 * x[:, 1:2]  # true values, shape [20, 1] to match y_hat
w = tf.Variable(tf.random.normal([2, 1], mean=0, stddev=1))

epochs = 5000
lr = 0.002
for epoch in range(epochs):
    with tf.GradientTape() as tape:
        y_hat = tf.matmul(x, w)
        # asymmetric loss: over-prediction is penalized three times as heavily
        loss = tf.reduce_mean(tf.where(tf.greater(y_hat, y), 3 * (y_hat - y), y - y_hat))
    w_grad = tape.gradient(loss, w)
    w.assign_sub(lr * w_grad)

print(w.numpy().T)  # [[0.73728406 0.83368826]]

Since over-prediction contributes more to the loss than under-prediction, the model is pushed to predict on the low side, which is why the learned coefficients here come out smaller than the true values.
We now make a simple modification, namely:

loss = tf.reduce_mean(tf.where(tf.greater(y_hat, y), (y_hat - y), 3*(y-y_hat)))

Now, when the predicted value is less than the true value, it is penalized more heavily, so the model is pushed to predict on the high side. The training result in this case: [[1.6747012 1.9530903]]
③ Cross entropy.
The cross-entropy loss function measures the distance between two probability distributions; in TensorFlow it is available as tf.losses.categorical_crossentropy(y_true, y_pred), where the first argument is the true distribution and the second is the prediction.
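In formula form (the standard definition, for a true distribution y and a predicted distribution $\hat y$):
$$H(y, \hat y) = -\sum_i y_i \ln \hat y_i$$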
For example, given the known label (1, 0), suppose two models predict:

y_1 = (0.6, 0.4);
y_2 = (0.8, 0.2);

To measure which answer is closer to the true label, we can compute the cross entropy of each prediction, as in the sketch below.
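
A minimal check with TensorFlow (the expected values follow from -ln 0.6 ≈ 0.51 and -ln 0.8 ≈ 0.22):

import tensorflow as tf

# Cross entropy of each candidate prediction against the true label (1, 0)
y_true = tf.constant([1., 0.])
h1 = tf.losses.categorical_crossentropy(y_true, tf.constant([0.6, 0.4]))
h2 = tf.losses.categorical_crossentropy(y_true, tf.constant([0.8, 0.2]))
print(h1.numpy(), h2.numpy())  # ~0.51 vs ~0.22, so y_2 is closer to the label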

Origin blog.csdn.net/qq_26460841/article/details/113112609