tf.clip_by_global_norm

First, understand what this is for. During backpropagation we can run into a region where the gradient suddenly becomes very steep, which would make the next update step jump far beyond its normal size (the exploding-gradient problem). The point of this function is to add a check for such overly steep gradients: if the gradient is too steep, the update step is scaled down appropriately.


tf.clip_by_global_norm(t_list, clip_norm, use_norm=None, name=None)


Clips the values of multiple tensors by the ratio of their combined (global) norm.
t_list is the list of gradient tensors and clip_norm is the clipping threshold; the function returns the list of clipped gradient tensors together with the global norm of all the tensors.
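As a usage sketch, the clipping step is typically inserted between computing and applying the gradients. The following is a minimal TF1-style example; loss and learning_rate are placeholders assumed to be defined elsewhere, and clip_norm=5.0 is an arbitrary illustrative threshold:

import tensorflow as tf

# Assumed to exist elsewhere in the graph: `loss` and `learning_rate`.
params = tf.trainable_variables()
grads = tf.gradients(loss, params)

# Rescale all gradients together so that their global norm is at most 5.0.
clipped_grads, global_norm = tf.clip_by_global_norm(grads, clip_norm=5.0)

optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train_op = optimizer.apply_gradients(zip(clipped_grads, params))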


Each t_list[i] is updated according to the formula:

t_list[i] * clip_norm / max(global_norm, clip_norm)

where

global_norm = sqrt(sum([l2norm(t)**2 for t in t_list]))

That is, global_norm is the L2 norm of all the gradients taken together (the square root of the sum of their squared norms). If clip_norm > global_norm, no clipping is performed; otherwise every tensor is scaled down by the same factor clip_norm / global_norm.
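As a sanity check on the formula, here is a small runnable sketch (TF1-style session code; the input values are made up for illustration):

import tensorflow as tf

# Two toy "gradients" with l2 norms 5.0 and 12.0, so the
# global norm is sqrt(5**2 + 12**2) = 13.0.
t_list = [tf.constant([3.0, 4.0]), tf.constant([0.0, 12.0])]
clip_norm = 6.5

clipped, global_norm = tf.clip_by_global_norm(t_list, clip_norm)

with tf.Session() as sess:
    gn, c = sess.run([global_norm, clipped])

print(gn)     # 13.0
# Since clip_norm < global_norm, every tensor is scaled by 6.5 / 13 = 0.5:
print(c[0])   # [1.5 2. ]
print(c[1])   # [0. 6.]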
This function is slower than tf.clip_by_norm(), however, because all of the gradients must be computed before the global norm is known and clipping can begin.
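For contrast, here is a short sketch of tf.clip_by_norm(), which rescales a single tensor based on its own norm alone and therefore does not have to wait for the full gradient list:

import tensorflow as tf

g = tf.constant([3.0, 4.0])                    # l2 norm 5.0
g_clipped = tf.clip_by_norm(g, clip_norm=2.5)  # scaled by 2.5 / 5 = 0.5

with tf.Session() as sess:
    print(sess.run(g_clipped))                 # [1.5 2. ]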
