Gradient descent

[Figure: Gradient descent in 2D]

Gradient descent is a method for unconstrained mathematical optimization. It is a first-order iterative algorithm for minimizing a differentiable multivariate function.
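One common way to write the iteration is shown below. The notation is chosen for this sketch: x_n is the n-th iterate, and η > 0 is a fixed step size, also called the learning rate. Practical variants often change η from step to step.

```latex
\mathbf{x}_{n+1} = \mathbf{x}_n - \eta \, \nabla f(\mathbf{x}_n), \qquad n = 0, 1, 2, \ldots
```

Only first derivatives of f appear in the update, which is what makes the method first-order. If ∇f(x_n) ≠ 0, a small enough η guarantees f(x_{n+1}) < f(x_n), because −∇f(x_n) is a descent direction.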

The idea is to take repeated steps in the opposite direction of the gradient (or an approximate gradient) of the function at the current point, because this is the direction of steepest descent. Conversely, stepping in the direction of the gradient moves the iterates uphill, toward a local maximum of the function; that procedure is known as gradient ascent. Gradient descent is particularly useful in machine learning for minimizing the cost or loss function.^[1] It should not be confused with local search algorithms, although both are iterative methods for optimization.
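The difference between descent and ascent comes down to the sign of the step. The Python sketch below shows this. It makes several simplifying assumptions: a fixed step size, a fixed number of iterations with no convergence test, and a gradient function supplied by the caller. The helper name `gradient_iterate` and the two example functions are hypothetical, chosen only for this illustration.

```python
from collections.abc import Callable, Sequence

Vector = list[float]


def gradient_iterate(
    grad: Callable[[Vector], Vector],
    x0: Sequence[float],
    step_size: float = 0.1,
    num_steps: int = 100,
    ascent: bool = False,
) -> Vector:
    """Take fixed-size steps against the gradient (descent) or along it (ascent).

    Hypothetical helper for illustration: fixed step size, fixed step count,
    no stopping criterion.
    """
    sign = 1.0 if ascent else -1.0  # -gradient for descent, +gradient for ascent
    x = list(x0)
    for _ in range(num_steps):
        g = grad(x)
        x = [xi + sign * step_size * gi for xi, gi in zip(x, g)]
    return x


# Descent: minimize f(x, y) = (x - 3)^2 + 2(y + 1)^2, minimum at (3, -1).
def grad_f(p: Vector) -> Vector:
    x, y = p
    return [2 * (x - 3), 4 * (y + 1)]


# Ascent: maximize h(x) = -(x - 1)^2, maximum at x = 1.
def grad_h(p: Vector) -> Vector:
    return [-2 * (p[0] - 1)]


x_min = gradient_iterate(grad_f, [0.0, 0.0])
x_max = gradient_iterate(grad_h, [5.0], ascent=True)
print(x_min, x_max)  # approximately [3.0, -1.0] [1.0]
```

With step_size = 0.1, the error in x shrinks by a factor of 0.8 per step and the error in y by a factor of 0.6. Both runs therefore end very close to their targets. A step size that is too large would make the iterates diverge instead. For the y coordinate here, that means anything above 0.5.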

Gradient descent is generally attributed to Augustin-Louis Cauchy, who first suggested it in 1847.^[2] Jacques Hadamard independently proposed a similar method in 1907.^[3]^[4] Its convergence properties for non-linear optimization problems were first studied by Haskell Curry in 1944,^[5] with the method becoming increasingly well-studied and used in the following decades.^[6]^[7]

Stochastic gradient descent is a simple extension of gradient descent that, at each step, estimates the gradient from a randomly chosen subset of the training data. It is the most basic of the algorithms used to train most deep networks today.
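The sketch below applies minibatch stochastic gradient descent to a toy least-squares problem. At each step, the gradient is estimated from a small random batch of examples rather than from the full dataset. Several details are illustrative assumptions: the model (a line y = w·x + b), the function name `sgd_linear_regression`, and the hyperparameters (learning rate, epochs, batch size). This is not how any particular deep-learning framework implements SGD.

```python
import random


def sgd_linear_regression(data, lr=0.05, epochs=200, batch_size=4, seed=0):
    """Fit y ~ w * x + b by minibatch SGD on mean squared error (illustrative)."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    w, b = 0.0, 0.0
    data = list(data)  # copy so shuffling does not reorder the caller's list
    for _ in range(epochs):
        rng.shuffle(data)  # visit examples in a new random order each epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of the mean squared error over this minibatch only
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Noise-free points on the line y = 2x + 1, so the exact fit is w = 2, b = 1.
points = [(k / 10, 2 * (k / 10) + 1) for k in range(-10, 11)]
w, b = sgd_linear_regression(points)
print(w, b)  # approximately 2.0 1.0
```

Because the points lie exactly on a line, every minibatch gradient vanishes at w = 2, b = 1. As a result, the estimates settle there even with a fixed learning rate. With noisy data, a fixed learning rate typically leaves the estimates fluctuating around the optimum, which is one reason decaying learning-rate schedules are common.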

^ Boyd, Stephen; Vandenberghe, Lieven (2004-03-08). Convex Optimization. Cambridge University Press. doi:10.1017/cbo9780511804441. ISBN 978-0-521-83378-3.
^ Lemaréchal, C. (2012). "Cauchy and the Gradient Method" (PDF). Doc Math Extra: 251–254. Archived from the original (PDF) on 2018-12-29. Retrieved 2020-01-26.
^ Hadamard, Jacques (1908). "Mémoire sur le problème d'analyse relatif à l'équilibre des plaques élastiques encastrées". Mémoires présentés par divers savants étrangers à l'Académie des Sciences de l'Institut de France. 33.
^ Courant, R. (1943). "Variational methods for the solution of problems of equilibrium and vibrations". Bulletin of the American Mathematical Society. 49 (1): 1–23. doi:10.1090/S0002-9904-1943-07818-4.
^ Curry, Haskell B. (1944). "The Method of Steepest Descent for Non-linear Minimization Problems". Quart. Appl. Math. 2 (3): 258–261. doi:10.1090/qam/10667.
^ Reference "BP" is cited but not defined.
^ Reference "AK82" is cited but not defined.
