gradient descent on a 2-dimensional convex quadratic cost function with condition number κ = 100
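To make the setup concrete, here is a minimal worked version. It assumes the Hessian is diagonal with eigenvalues μ and L, and the notation below is illustrative rather than taken from the source. With the best fixed step size, plain gradient descent contracts the error by (κ - 1)/(κ + 1) per step, which is why κ = 100 makes it slow:

$$
f(x) = \tfrac12\, x^\top A x, \qquad A = \operatorname{diag}(\mu, L), \qquad \kappa = \frac{L}{\mu} = 100
$$
$$
x_{k+1} = x_k - \alpha \nabla f(x_k), \qquad \alpha = \frac{2}{L + \mu}
\;\Longrightarrow\;
\|x_k - x^\*\| \le \left(\frac{\kappa - 1}{\kappa + 1}\right)^{k} \|x_0 - x^\*\| = \left(\frac{99}{101}\right)^{k} \|x_0 - x^\*\| \approx e^{-k/50}\, \|x_0 - x^\*\|
$$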

- adding momentum to the gradient step speeds up convergence in these ill-conditioned cases while still using only gradients (first-order information), which scales better than Newton-Raphson in high dimensions because no Hessian has to be formed or inverted
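- intuition from mechanics: give the iterate inertia, like a heavy ball rolling in the valley, so that gradient components that flip sign from step to step (the zig-zag across the narrow valley) partially cancel, while components that keep the same sign (along the valley floor) accumulate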
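- heavy-ball method (Polyak, 1964): x_{k+1} = x_k - α∇f(x_k) + β(x_k - x_{k-1}), where the β term carries over part of the previous step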
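A minimal sketch comparing plain gradient descent with the heavy-ball update on a 2-D quadratic with κ = 100. The specific test function f(x) = 0.5·(μ·x1² + L·x2²) with μ = 1 and L = 100, the starting point, and the helper names are hypothetical choices for illustration. The step sizes are the standard tuned values for quadratics: α = 2/(L + μ) for gradient descent, and α = 4/(√L + √μ)², β = ((√κ - 1)/(√κ + 1))² for heavy ball. With these values the per-step contraction improves from roughly (κ - 1)/(κ + 1) ≈ 0.98 to roughly (√κ - 1)/(√κ + 1) ≈ 0.82.

```python
import math

# Hypothetical test problem: f(x) = 0.5 * (MU * x1^2 + L * x2^2), minimum at the origin.
MU, L = 1.0, 100.0  # Hessian eigenvalues; condition number L / MU = 100


def grad(x):
    """Gradient of the diagonal quadratic: (MU * x1, L * x2)."""
    return (MU * x[0], L * x[1])


def norm(x):
    """Euclidean distance from the minimizer (the origin)."""
    return math.hypot(x[0], x[1])


def gradient_descent(x0, steps):
    """Plain gradient descent with the optimal fixed step 2 / (L + MU)."""
    alpha = 2.0 / (L + MU)
    x = x0
    for _ in range(steps):
        g = grad(x)
        x = (x[0] - alpha * g[0], x[1] - alpha * g[1])
    return x


def heavy_ball(x0, steps):
    """Polyak heavy ball: x_{k+1} = x_k - alpha * grad(x_k) + beta * (x_k - x_{k-1})."""
    sq_l, sq_mu = math.sqrt(L), math.sqrt(MU)
    alpha = 4.0 / (sq_l + sq_mu) ** 2
    beta = ((sq_l - sq_mu) / (sq_l + sq_mu)) ** 2
    x_prev, x = x0, x0  # x_{-1} = x_0, i.e. start with zero velocity
    for _ in range(steps):
        g = grad(x)
        x_next = tuple(
            xi - alpha * gi + beta * (xi - pi) for xi, gi, pi in zip(x, g, x_prev)
        )
        x_prev, x = x, x_next
    return x


x0 = (1.0, 1.0)
gd = gradient_descent(x0, 100)
hb = heavy_ball(x0, 100)
print(f"GD distance to minimum after 100 steps:         {norm(gd):.2e}")
print(f"Heavy-ball distance to minimum after 100 steps: {norm(hb):.2e}")
```

On this problem, after 100 steps plain gradient descent is still roughly 0.19 away from the minimum, while the heavy-ball iterate is below 1e-6. Both methods use exactly one gradient evaluation per step; the only extra cost of momentum is storing the previous iterate.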

(Source: simons.berkeley.edu)