AIP-210 Practice Q4

A. Faster convergence to the global optimum

B. Divergence or oscillation around the loss landscape

In gradient-based optimization, the learning rate is the step-size multiplier applied to each parameter update. If it is set too high, an update can overshoot the minimum instead of moving toward it. Training then becomes unstable: the loss grows from step to step (divergence), or the parameters bounce from one side of the minimum to the other without settling (oscillation). This is the behavior option B describes.

C. Lower memory usage

D. Better generalization
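
The overshoot described under option B can be reproduced on a toy problem. The following is a minimal sketch, not taken from the exam material: it assumes the one-dimensional loss f(x) = x**2 (gradient 2x), a starting point of 1.0, and a hypothetical helper named gradient_descent. For this loss each update multiplies x by (1 - 2 * lr), so the effect of the learning rate can be read directly from the printed path.

```python
def gradient_descent(lr, x0=1.0, steps=5):
    """Run plain gradient descent on f(x) = x**2 and return every visited x.

    The loss, the starting point, and the step count are illustrative
    assumptions chosen so the behavior is easy to see.
    """
    xs = [x0]
    x = x0
    for _ in range(steps):
        grad = 2.0 * x          # derivative of x**2
        x = x - lr * grad       # standard update: step against the gradient
        xs.append(x)
    return xs


# Compare a small, a borderline, and an excessive learning rate.
for lr in (0.1, 1.0, 1.1):
    path = gradient_descent(lr)
    print(f"lr={lr}: " + ", ".join(f"{x:+.3f}" for x in path))
```

With lr=0.1 the iterate shrinks toward the minimum at 0. With lr=1.0 each step lands on the mirror-image point, so x flips between +1 and -1 and never settles (oscillation). With lr=1.1 the sign still flips, but the magnitude grows by a factor of 1.2 per step (divergence).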

Question 4

Explanation

Why each option is right or wrong