Suppose batch gradient descent in a deep network is taking excessively long to find a value of the parameters that achieves a small value for the cost function J. Which of the following techniques could help find parameter values that attain a small value for J? (Check all that apply)
- Try mini-batch gradient descent
- Try initializing all the weights to zero
- Try better random initialization for the weights
- Try tuning the learning rate α
- Try using Adam
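Several of the options above can be combined: the sketch below uses a toy least-squares cost (a hypothetical setup, not from the quiz) to illustrate mini-batch gradient descent with Adam updates, a small random weight initialization instead of zeros, and a learning rate α left as a tunable constant.

```python
import numpy as np

# Hypothetical toy problem: minimize J(w) = (1/2m) * ||Xw - y||^2.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_w

def cost(w):
    residual = X @ w - y
    return 0.5 * np.mean(residual ** 2)

# Better random initialization: small random values, NOT all zeros.
w = rng.normal(scale=0.1, size=4)

alpha = 0.05                     # learning rate (worth tuning)
beta1, beta2, eps = 0.9, 0.999, 1e-8
m_t = np.zeros_like(w)           # Adam first-moment estimate
v_t = np.zeros_like(w)           # Adam second-moment estimate
t = 0
batch_size = 32                  # mini-batch size

for epoch in range(200):
    perm = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        grad = Xb.T @ (Xb @ w - yb) / len(idx)   # mini-batch gradient
        t += 1
        m_t = beta1 * m_t + (1 - beta1) * grad
        v_t = beta2 * v_t + (1 - beta2) * grad ** 2
        m_hat = m_t / (1 - beta1 ** t)           # bias-corrected moments
        v_hat = v_t / (1 - beta2 ** t)
        w -= alpha * m_hat / (np.sqrt(v_hat) + eps)

print(cost(w))
```

Initializing all weights to zero is the odd option out: in a deep network it leaves every hidden unit in a layer computing the same function (the symmetry is never broken), so gradient descent cannot learn distinct features no matter how long it runs.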