Training Deep Neural Networks

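If we want to ensure that gradient clipping does not change the direction of the gradient vector, we should clip by norm by setting clipnorm instead of clipvalue. Clipping by value caps each component independently, which can change the vector's direction. Clipping by norm instead rescales the whole gradient whenever its $L^2$ norm is greater than the threshold we picked, so its direction is preserved.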
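The difference between the two strategies can be illustrated with a minimal sketch in plain Python. This is not the Keras implementation: the helper names clip_by_value and clip_by_norm and the sample gradient [0.9, 100.0] are assumptions chosen for illustration. It only mimics what the clipvalue and clipnorm optimizer arguments are described as doing.

```python
import math


def clip_by_value(grad, threshold):
    """Clip each component independently to [-threshold, threshold]."""
    return [max(-threshold, min(threshold, g)) for g in grad]


def clip_by_norm(grad, threshold):
    """Rescale the whole vector if its L2 norm exceeds the threshold."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= threshold:
        return list(grad)  # small gradients are left untouched
    scale = threshold / norm
    return [g * scale for g in grad]


# Hypothetical gradient dominated by its second component.
grad = [0.9, 100.0]

by_value = clip_by_value(grad, 1.0)  # [0.9, 1.0]: the two components are now nearly equal
by_norm = clip_by_norm(grad, 1.0)    # roughly [0.009, 1.0]: same direction, L2 norm of 1

print(by_value)
print(by_norm)
```

With clipping by value, the gradient goes from pointing almost entirely along the second axis to pointing roughly along the diagonal. With clipping by norm, the ratio between the components is unchanged and only the length of the vector shrinks.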

