Why is the best mini-batch size usually not 1 and not m, but instead something in between?

  1. Why is the best mini-batch size usually not 1 and not m, but instead something in between?
    •  If the mini-batch size is 1, you end up having to process the entire training set before making any progress. (Incorrect: that describes batch gradient descent, not a mini-batch size of 1.)
    •  If the mini-batch size is m, you end up with stochastic gradient descent, which is usually slower than mini-batch gradient descent. (Incorrect: a mini-batch size of m is batch gradient descent; stochastic gradient descent uses a mini-batch size of 1.)
    •  If the mini-batch size is m, you end up with batch gradient descent, which has to process the whole training set before making progress. (Correct)
    •  If the mini-batch size is 1, you lose the benefits of vectorization across examples in the mini-batch. (Correct; see the sketch below.)

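For intuition, here is a minimal NumPy sketch (not part of the quiz itself; the toy linear-regression setup and batch size of 64 are illustrative assumptions) of mini-batch gradient descent: with a batch size between 1 and m, each update is still a vectorized computation over many examples, yet the parameters are updated many times per pass over the data.

```python
# Hypothetical sketch: mini-batch gradient descent on toy linear regression.
import numpy as np

rng = np.random.default_rng(0)
m, n = 1000, 5                      # m training examples, n features
X = rng.normal(size=(m, n))
true_w = rng.normal(size=n)
y = X @ true_w + 0.1 * rng.normal(size=m)

w = np.zeros(n)
lr = 0.1
batch_size = 64                     # 1 < batch_size < m

for epoch in range(10):
    perm = rng.permutation(m)       # shuffle examples each epoch
    for start in range(0, m, batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]     # one vectorized mini-batch
        grad = Xb.T @ (Xb @ w - yb) / len(idx)   # gradient over the batch
        w -= lr * grad              # update after each mini-batch, not once per epoch

print("learned weights:", np.round(w, 3))
```

With batch_size = 1 the inner matrix products collapse to single-example operations and vectorization is lost; with batch_size = m there is only one update per epoch, so progress requires processing the whole training set first.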
Get all the weekly quiz answers:

Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization Coursera Quiz Answer