Training Models

Is it harder for mini-batch gradient descent than for SGD to escape local minima?
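
One way to explore this question is to compare how noisy the gradient estimates are. The sketch below is only an illustration under stated assumptions: the synthetic 1-D linear data, the evaluation point theta = 0, and the helper name gradient_noise_std are all hypothetical and not from the slides. It measures the spread of the gradient estimate for batch size 1 (SGD) and batch size 32 (mini-batch). A common intuition is that the noisier estimate makes it easier to jump out of a local minimum, at the cost of a more erratic path.

```python
import numpy as np

def gradient_noise_std(batch_size, n_trials=2000, seed=0):
    """Standard deviation of the batch gradient estimate (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Synthetic data (an assumption for illustration): y = 3x + noise
    X = rng.normal(size=1000)
    y = 3.0 * X + rng.normal(scale=0.5, size=1000)
    theta = 0.0  # parameter value at which the gradient is evaluated
    grads = np.empty(n_trials)
    for t in range(n_trials):
        # Draw a random batch (sampling with replacement)
        idx = rng.integers(0, len(X), size=batch_size)
        # Gradient of the batch mean squared error with respect to theta
        grads[t] = np.mean(2.0 * (theta * X[idx] - y[idx]) * X[idx])
    return grads.std()

std_sgd = gradient_noise_std(batch_size=1)    # SGD: one instance per step
std_mini = gradient_noise_std(batch_size=32)  # mini-batch: 32 instances per step
print(f"SGD gradient std:        {std_sgd:.3f}")
print(f"Mini-batch gradient std: {std_mini:.3f}")
```

With independent samples, the spread should shrink roughly with the square root of the batch size, so the mini-batch estimate is noticeably steadier than the SGD one.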