This content is based on the video course; video links: here1 , here2
The complexity of a neural network is described by the number of layers and the number of parameters.

Space complexity:
- Number of layers (count only layers that perform computation) = number of hidden layers + 1 output layer;
- Total parameters = total w + total b;

Time complexity:
- Number of multiply-add operations.

e.g. for a network with 3 inputs, one 4-neuron hidden layer, and 2 outputs:
total parameters = 3×4+4 + 4×2+2 = 26
multiply-add operations = 3×4 + 4×2 = 20
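The counting rule above can be checked in a few lines of Python (the layer widths 3-4-2 are taken from the example):

```python
layers = [3, 4, 2]  # input width, hidden width, output width

# each connection contributes one w; each neuron in a computing layer one b
total_params = sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))
# one multiply-add per weight w
mul_adds = sum(n_in * n_out for n_in, n_out in zip(layers, layers[1:]))
print(total_params, mul_adds)  # 26 20
```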
- The exponential decay learning rate
starts with a relatively large learning rate to reach a good solution quickly, then gradually shrinks the learning rate so the model stays stable in the later stages of training:

exponential decay learning rate = initial learning rate × learning rate decay rate^(current round / decay interval in rounds)
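A minimal sketch of this schedule (the base rate, decay rate, and decay interval are assumed values, not from the course):

```python
LR_BASE = 0.2    # initial learning rate (assumed value)
LR_DECAY = 0.99  # learning-rate decay rate (assumed value)
LR_STEP = 1      # decay once every LR_STEP rounds (assumed value)

for epoch in range(5):
    # learning rate shrinks exponentially with the round number
    lr = LR_BASE * LR_DECAY ** (epoch / LR_STEP)
    print(f"epoch {epoch}: lr = {lr:.6f}")
```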
- Neural network optimization aims to minimize the loss. Three common choices of loss function:
① MSE; ② a custom loss; ③ cross entropy.

① MSE, i.e. the mean squared error, can be expressed as tf.reduce_mean(tf.square(y_hat - y));
import tensorflow as tf

x = tf.random.normal([20, 2], mean=2, stddev=1, dtype=tf.float32)
# ground truth y = x1 + 2*x2, shaped [20, 1] so it lines up with y_hat
# (a flat list of 20 scalars would broadcast [20,1]-[20] into [20,20])
y = tf.constant([[item1 + 2 * item2] for item1, item2 in x.numpy()])
w = tf.Variable(tf.random.normal([2, 1], mean=0, stddev=1))
epochs = 5000
lr = 0.002
for epoch in range(epochs):
    with tf.GradientTape() as tape:
        y_hat = tf.matmul(x, w)
        loss = tf.reduce_mean(tf.square(y_hat - y))
    w_grad = tape.gradient(loss, w)
    w.assign_sub(lr * w_grad)
print(w.numpy().T)  # approaches the true weights [[1. 2.]]
② Custom loss;
Continuing the example above, we define the loss as a piecewise function: when the predicted value is greater than the true value, we consider it worse for our model, namely:

$$f(\hat y, y) = \begin{cases} 3(\hat y - y) & \hat y \geqslant y \\ y - \hat y & \hat y < y \end{cases}$$
import tensorflow as tf

x = tf.random.normal([20, 2], mean=2, stddev=1, dtype=tf.float32)
# ground truth y = x1 + 2*x2, shaped [20, 1] to match y_hat
y = tf.constant([[item1 + 2 * item2] for item1, item2 in x.numpy()])
w = tf.Variable(tf.random.normal([2, 1], mean=0, stddev=1))
epochs = 5000
lr = 0.002
for epoch in range(epochs):
    with tf.GradientTape() as tape:
        y_hat = tf.matmul(x, w)
        # penalize over-prediction 3x as heavily as under-prediction
        loss = tf.reduce_mean(tf.where(tf.greater(y_hat, y), 3 * (y_hat - y), y - y_hat))
    w_grad = tape.gradient(loss, w)
    w.assign_sub(lr * w_grad)
print(w.numpy().T)  # [[0.73728406 0.83368826]]
Because over-prediction contributes more to the loss, training pushes the predictions down: both learned weights come out smaller than the true values (1, 2).
We make a simple modification, namely:
loss = tf.reduce_mean(tf.where(tf.greater(y_hat, y), (y_hat - y), 3*(y-y_hat)))
Now it is under-prediction that is penalized more heavily, so training pushes the predictions up. The training result this time: [[1.6747012 1.9530903]].
③ Cross-entropy loss
tf.losses.categorical_crossentropy(y_true, y_pred)
The cross-entropy loss measures the distance between two probability distributions.
For example, given the known class label (1, 0) and two predictions
y_1 = (0.6, 0.4);
y_2 = (0.8, 0.2);
we can use cross entropy to measure which prediction is closer to the standard answer.
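As a plain-Python sketch of what this computes (cross entropy H(y, ŷ) = −Σ y·ln ŷ; tf.losses.categorical_crossentropy returns the same values for these inputs):

```python
import math

def cross_entropy(y_true, y_pred):
    # H(y_true, y_pred) = -sum(t * ln(p))
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

y_true = (1.0, 0.0)
h1 = cross_entropy(y_true, (0.6, 0.4))
h2 = cross_entropy(y_true, (0.8, 0.2))
print(round(h1, 4), round(h2, 4))  # 0.5108 0.2231 -> y_2 is closer
```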