Batch Gradient Descent computes the gradient over the full training set X at every Gradient Descent step. As a result, it is terribly slow on very large training sets. However, Gradient Descent scales well with the number of features. True or False?

