Gradient

In vector calculus, the gradient of a scalar-valued differentiable function $f$ of several variables is the vector field (or vector-valued function) $\nabla f$ whose value at a point $p$ gives the direction and the rate of fastest increase. The gradient transforms like a vector under change of basis of the space of variables of $f$ . If the gradient of a function is non-zero at a point $p$ , the direction of the gradient is the direction in which the function increases most quickly from $p$ , and the magnitude of the gradient is the rate of increase in that direction, the greatest absolute directional derivative.^[1] Further, a point where the gradient is the zero vector is known as a stationary point. The gradient thus plays a fundamental role in optimization theory, where it is used to minimize a function by gradient descent. In coordinate-free terms, the gradient of a function $f(\mathbf {r} )$ may be defined by:

$df=\nabla f\cdot d\mathbf {r}$

where $df$ is the total infinitesimal change in $f$ for an infinitesimal displacement $d\mathbf {r}$ , and is seen to be maximal when $d\mathbf {r}$ is in the direction of the gradient $\nabla f$ . The nabla symbol $\nabla$ , written as an upside-down triangle and pronounced "del", denotes the vector differential operator.
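The claim that $df$ is largest when $d\mathbf {r}$ points along $\nabla f$ can be checked numerically. The following is a minimal sketch, not a definitive implementation: the function `f`, the point `p`, the step length `STEP` and the number of sampled directions are all assumptions chosen only for illustration.

```python
import math

STEP = 1e-6  # length of the small displacement dr (illustrative choice)

def f(x, y):
    # Hypothetical example function, chosen only for illustration.
    return x**2 + 3.0 * y

def grad_f(x, y):
    # Analytic gradient of the example: (df/dx, df/dy) = (2x, 3).
    return (2.0 * x, 3.0)

def change_along(p, angle):
    """Change in f for a displacement of length STEP at the given angle."""
    dx, dy = STEP * math.cos(angle), STEP * math.sin(angle)
    return f(p[0] + dx, p[1] + dy) - f(p[0], p[1])

p = (1.0, 2.0)

# Scan 3600 unit directions and keep the one with the largest change in f.
angles = [2.0 * math.pi * k / 3600 for k in range(3600)]
best_angle = max(angles, key=lambda a: change_along(p, a))

gx, gy = grad_f(*p)
grad_angle = math.atan2(gy, gx) % (2.0 * math.pi)
grad_norm = math.hypot(gx, gy)

# The best sampled direction should lie within one grid spacing of the
# gradient direction, and the rate of change there should be close to
# the magnitude of the gradient.
print(best_angle, grad_angle)
print(change_along(p, best_angle) / STEP, grad_norm)
```

The scan is deliberately brute-force; it only illustrates that no sampled direction yields a larger change than the direction of the gradient.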

When a coordinate system is used in which the basis vectors are not functions of position, the gradient is given by the vector^[a] whose components are the partial derivatives of $f$ at $p$ .^[2] That is, for $f\colon \mathbb {R} ^{n}\to \mathbb {R}$ , its gradient $\nabla f\colon \mathbb {R} ^{n}\to \mathbb {R} ^{n}$ is defined at the point $p=(x_{1},\ldots ,x_{n})$ in n-dimensional space as the vector^[b]

$\nabla f(p)={\begin{bmatrix}{\frac {\partial f}{\partial x_{1}}}(p)\\\vdots \\{\frac {\partial f}{\partial x_{n}}}(p)\end{bmatrix}}.$
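As a rough illustration of this component formula, each partial derivative can be approximated by a central difference. The helper name `numerical_gradient`, the test function `example` and the step size `h` are assumptions made for this sketch and do not come from the cited sources.

```python
import math

def numerical_gradient(f, p, h=1e-6):
    """Approximate the gradient of f at p, one partial derivative per coordinate."""
    grad = []
    for i in range(len(p)):
        forward = list(p)
        backward = list(p)
        forward[i] += h   # nudge only the i-th coordinate up
        backward[i] -= h  # and down
        grad.append((f(forward) - f(backward)) / (2.0 * h))
    return grad

def example(p):
    # Hypothetical test function: f(x, y, z) = x^2 * y + sin(z)
    x, y, z = p
    return x**2 * y + math.sin(z)

p = [1.0, 2.0, 0.0]
approx = numerical_gradient(example, p)

# Analytic partial derivatives at p: (2xy, x^2, cos z)
exact = [2.0 * p[0] * p[1], p[0] ** 2, math.cos(p[2])]

print(approx)
print(exact)
```

The approximation is only meaningful where $f$ is differentiable; a finite difference will return a number even where no gradient exists.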

Note that the above definition of the gradient applies only if $f$ is differentiable at $p$ . There are functions whose partial derivatives exist in every direction at a point but which fail to be differentiable there. Furthermore, the definition as the vector of partial derivatives is valid only when the basis of the coordinate system is orthonormal. For any other basis, the metric tensor at that point needs to be taken into account.
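As a sketch of how the metric tensor enters, assume a basis $\mathbf {e} _{1},\ldots ,\mathbf {e} _{n}$ with coordinates $x_{1},\ldots ,x_{n}$ and metric components $g_{ij}=\mathbf {e} _{i}\cdot \mathbf {e} _{j}$ , and write $g^{ij}$ for the entries of the inverse matrix (this notation is introduced here for illustration). The gradient is then

$\nabla f=\sum _{i,j}g^{ij}\,{\frac {\partial f}{\partial x_{j}}}\,\mathbf {e} _{i}$

which reduces to the vector of partial derivatives when $g_{ij}$ is the identity matrix. In plane polar coordinates $(r,\theta )$ , for example, the coordinate basis has $g_{rr}=1$ , $g_{\theta \theta }=r^{2}$ and $\mathbf {e} _{\theta }=r\,{\hat {\boldsymbol {\theta }}}$ , so that

$\nabla f={\frac {\partial f}{\partial r}}\,\mathbf {e} _{r}+{\frac {1}{r^{2}}}{\frac {\partial f}{\partial \theta }}\,\mathbf {e} _{\theta }={\frac {\partial f}{\partial r}}\,{\hat {\mathbf {r} }}+{\frac {1}{r}}{\frac {\partial f}{\partial \theta }}\,{\hat {\boldsymbol {\theta }}}$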

For example, the function $f(x,y)={\frac {x^{2}y}{x^{2}+y^{2}}}$ for $(x,y)\neq (0,0)$ , with $f(0,0)=0$ , is not differentiable at the origin: it has no well-defined tangent plane there, despite having well-defined partial derivatives in every direction at the origin.^[3] In this example, under rotation of the x-y coordinate system, the above formula for the gradient fails to transform like a vector (the gradient becomes dependent on the choice of basis for the coordinate system) and also fails to point towards the steepest ascent in some orientations. For differentiable functions, where the formula for the gradient holds, it can be shown to always transform as a vector under a change of basis, so that it always points towards the fastest increase.
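The failure in this example can be seen numerically. In the sketch below, the helper `derivative_at_origin` and the step size `h` are assumptions made for illustration. Both partial derivatives at the origin vanish, so the component formula yields the zero vector, yet the derivative along the diagonal direction is not zero.

```python
import math

def f(x, y):
    # The example function, with f(0, 0) defined as 0.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x**2 * y / (x**2 + y**2)

def derivative_at_origin(vx, vy, h=1e-6):
    """One-sided difference quotient of f at the origin along (vx, vy)."""
    return (f(h * vx, h * vy) - f(0.0, 0.0)) / h

# Partial derivatives along the coordinate axes: both are zero,
# because f vanishes identically on each axis.
partial_x = derivative_at_origin(1.0, 0.0)
partial_y = derivative_at_origin(0.0, 1.0)

# Unit vector along the diagonal.
vx = vy = 1.0 / math.sqrt(2.0)

# What the component formula would predict: "gradient" dotted with v.
predicted = partial_x * vx + partial_y * vy

# What the function actually does along the diagonal.
actual = derivative_at_origin(vx, vy)

print(partial_x, partial_y)  # both zero
print(predicted, actual)     # the two values disagree
```

Along a unit direction at angle $t$ the quotient equals $\cos ^{2}t\,\sin t$ , which is not a linear function of the direction vector, so no single vector can reproduce it through a dot product.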

The gradient is dual to the total derivative $df$ : the value of the gradient at a point is a tangent vector – a vector at each point; while the value of the derivative at a point is a cotangent vector – a linear functional on vectors.^[c] They are related in that the dot product of the gradient of $f$ at a point $p$ with another tangent vector $\mathbf {v}$ equals the directional derivative of $f$ at $p$ along $\mathbf {v}$ ; that is, ${\textstyle \nabla f(p)\cdot \mathbf {v} ={\frac {\partial f}{\partial \mathbf {v} }}(p)=df_{p}(\mathbf {v} )}$ . The gradient admits multiple generalizations to more general functions on manifolds; see § Generalizations.
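The identity $\nabla f(p)\cdot \mathbf {v} =df_{p}(\mathbf {v} )$ , and the linearity of $df_{p}$ in $\mathbf {v}$ , can be checked with a short sketch. The smooth function `f`, the point `p`, the vectors `v` and `w` and the coefficients 3 and -2 are assumptions chosen for illustration.

```python
import math

def f(x, y):
    # Hypothetical smooth example: f(x, y) = sin(x) * y^2
    return math.sin(x) * y**2

def grad_f(x, y):
    # Analytic gradient of the example function.
    return (math.cos(x) * y**2, 2.0 * math.sin(x) * y)

def directional_derivative(p, v, h=1e-6):
    """Central-difference estimate of the derivative of f at p along v."""
    ahead = f(p[0] + h * v[0], p[1] + h * v[1])
    behind = f(p[0] - h * v[0], p[1] - h * v[1])
    return (ahead - behind) / (2.0 * h)

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

p = (0.5, 1.5)
v = (2.0, -1.0)  # not necessarily a unit vector
w = (0.3, 0.7)
g = grad_f(*p)

# The gradient dotted with v should match the derivative along v.
lhs = dot(g, v)
rhs = directional_derivative(p, v)

# df_p is a linear functional: df_p(3v - 2w) = 3 df_p(v) - 2 df_p(w).
combo = (3.0 * v[0] - 2.0 * w[0], 3.0 * v[1] - 2.0 * w[1])
combined = directional_derivative(p, combo)
linear = 3.0 * directional_derivative(p, v) - 2.0 * directional_derivative(p, w)

print(lhs, rhs)
print(combined, linear)
```

Because `v` is not normalized, the value is a rate per unit of the parameter along `v` rather than per unit distance.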

^[1]
- Bachman (2007, p. 77)
- Downing (2010, pp. 316–317)
- Kreyszig (1972, p. 309)
- McGraw-Hill (2007, p. 196)
- Moise (1967, p. 684)
- Protter & Morrey (1970, p. 715)
- Swokowski et al. (1994, pp. 1036, 1038–1039)
^[2]
- Bachman (2007, p. 76)
- Beauregard & Fraleigh (1973, p. 84)
- Downing (2010, p. 316)
- Harper (1976, p. 15)
- Kreyszig (1972, p. 307)
- McGraw-Hill (2007, p. 196)
- Moise (1967, p. 683)
- Protter & Morrey (1970, p. 714)
- Swokowski et al. (1994, p. 1038)
^[3] "Non-differentiable functions must have discontinuous partial derivatives - Math Insight". mathinsight.org. Retrieved 2023-10-21.

