- Why is the best mini-batch size usually not 1 and not m, but instead something in between?
- If the mini-batch size is 1, you end up having to process the entire training set before making any progress.
- If the mini-batch size is m, you end up with stochastic gradient descent, which is usually slower than mini-batch gradient descent.
- If the mini-batch size is m, you end up with batch gradient descent, which has to process the whole training set before making progress.
- If the mini-batch size is 1, you lose the benefits of vectorization across examples in the mini-batch.
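
The trade-off behind these options can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's reference code: the helper names `iterate_minibatches` and `train_linear`, the linear least-squares model, the learning rate, and the layout with examples stored as rows are all assumptions made for this example. It counts how many parameter updates one epoch produces for a given mini-batch size.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled (X_batch, y_batch) pairs. Assumes examples are rows."""
    m = X.shape[0]
    order = rng.permutation(m)
    for start in range(0, m, batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

def train_linear(X, y, batch_size, epochs=1, lr=0.05, seed=0):
    """Hypothetical helper: mini-batch gradient descent on a least-squares loss.

    Returns the learned weights and the number of parameter updates made.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    n_updates = 0
    for _ in range(epochs):
        for Xb, yb in iterate_minibatches(X, y, batch_size, rng):
            # One matrix product computes the gradient over the whole
            # mini-batch; with batch_size == 1 this vectorization is lost.
            grad = Xb.T @ (Xb @ w - yb) / len(yb)
            w -= lr * grad
            n_updates += 1
    return w, n_updates

# Synthetic data with m = 1000 examples (illustrative only).
data_rng = np.random.default_rng(42)
X = data_rng.standard_normal((1000, 3))
y = X @ np.array([1.0, -2.0, 0.5])

for batch_size in (1, 64, 1000):
    _, n_updates = train_linear(X, y, batch_size)
    print(batch_size, n_updates)
```

With m = 1000, one epoch gives 1000 updates at batch size 1 (each computed from a single example, so nothing is vectorized across examples), 16 updates at batch size 64, and a single update at batch size 1000, which is batch gradient descent and must process the whole training set before making any progress.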