First of all, understand what this function is for. During backpropagation we can run into a situation where the gradient suddenly becomes very steep (the exploding-gradient problem), which would make the next update step far larger than normal. The purpose of this function is to add a check for that case: if the gradient is too steep, the update step is scaled down accordingly.
tf.clip_by_global_norm(t_list, clip_norm, use_norm=None, name=None)
Clips the values of multiple tensors by the ratio of their combined norm. t_list
is the list of gradient tensors and clip_norm
is the clipping threshold; the function returns the clipped gradient tensors and the global norm of all the tensors.
Each t_list[i]
is updated by the following formula:
t_list[i] * clip_norm / max(global_norm, clip_norm)
where global_norm = sqrt(sum([l2norm(t)**2 for t in t_list]))
global_norm
is the square root of the sum of the squared L2 norms of all gradients; if clip_norm > global_norm
, no clipping is performed.
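The formula can be checked with a small sketch (assuming TensorFlow 2.x eager mode; the gradient values below are made up so the arithmetic is easy to verify by hand):

```python
import numpy as np
import tensorflow as tf

# Two hypothetical gradient tensors.
g1 = tf.constant([3.0, 4.0])   # l2 norm = 5
g2 = tf.constant([12.0])       # l2 norm = 12
# global_norm = sqrt(5**2 + 12**2) = 13
clip_norm = 6.5

clipped, global_norm = tf.clip_by_global_norm([g1, g2], clip_norm)
print(global_norm.numpy())     # 13.0
# Every tensor is scaled by clip_norm / max(global_norm, clip_norm) = 0.5
print(clipped[0].numpy())      # [1.5 2. ]
print(clipped[1].numpy())      # [6.]
```

Note that both tensors are scaled by the same factor, so the relative magnitudes of the gradients are preserved; only the overall step size shrinks.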
However, this function is slower than clip_by_norm()
, because all of the gradient tensors must be ready before the clipping can be performed.