Training Deep Neural Networks

In the Nesterov optimizer, the only difference from vanilla momentum optimization is that the gradient is measured at theta + theta*m rather than at theta, where theta represents the current parameters (weights) and m is the momentum vector.
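
For checking the statement, it may help to see the two update rules side by side. The following is a minimal sketch, not a definitive implementation, of the commonly cited formulation in which the look-ahead point is theta + beta*m. The momentum hyperparameter `beta`, the learning rate `lr`, and the toy cost J(theta) = theta**2 are assumptions introduced here for illustration and do not appear in the question.

```python
def grad(theta):
    # Gradient of the toy cost J(theta) = theta**2 (assumed for illustration).
    return 2.0 * theta


def momentum_step(theta, m, lr=0.1, beta=0.9):
    # Vanilla momentum: the gradient is measured at the current parameters.
    m = beta * m - lr * grad(theta)
    return theta + m, m


def nesterov_step(theta, m, lr=0.1, beta=0.9):
    # Nesterov: the gradient is measured slightly ahead, at theta + beta * m.
    lookahead = theta + beta * m
    m = beta * m - lr * grad(lookahead)
    return theta + m, m


def run(step, theta=1.0, n_steps=50):
    # Apply the given update rule repeatedly, starting with zero momentum.
    m = 0.0
    for _ in range(n_steps):
        theta, m = step(theta, m)
    return theta


if __name__ == "__main__":
    print("momentum:", run(momentum_step))
    print("nesterov:", run(nesterov_step))
```

In this sketch the two functions differ only in the point passed to `grad`, so the first step from zero momentum is identical and the trajectories diverge from the second step onward.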
