Delta rule

In machine learning, the delta rule is a gradient descent learning rule for updating the weights of the inputs to artificial neurons in a single-layer neural network.^[1] It is the form the backpropagation algorithm takes for a single-layer neural network with a mean-square error loss function.
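That derivation can be sketched for a single training example as follows. It assumes the per-example squared error $E=\tfrac{1}{2}(t_{j}-y_{j})^{2}$; the factor $\tfrac{1}{2}$ is a common convention that cancels the 2 produced by differentiation, and averaging $E$ over the training set gives the mean-square error. The notation matches the rest of this article: $h_{j}=\sum _{i}x_{i}w_{ji}$ is the neuron's weighted input, $y_{j}=g(h_{j})$ its output, $t_{j}$ its target and $\alpha$ the learning rate. The chain rule gives

$\frac{\partial E}{\partial w_{ji}}=\frac{\partial E}{\partial y_{j}}\,\frac{\partial y_{j}}{\partial h_{j}}\,\frac{\partial h_{j}}{\partial w_{ji}}=-(t_{j}-y_{j})\,g'(h_{j})\,x_{i},$

and a gradient descent step moves each weight against this gradient:

$\Delta w_{ji}=-\alpha \frac{\partial E}{\partial w_{ji}}=\alpha (t_{j}-y_{j})g'(h_{j})x_{i}.$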

For a neuron $j$ with activation function $g(x)$, the delta rule for neuron $j$'s $i$-th weight $w_{ji}$ is given by

$\Delta w_{ji}=\alpha (t_{j}-y_{j})g'(h_{j})x_{i},$

where

$\alpha$ is a small constant called the learning rate
$g(x)$ is the neuron's activation function
$g'$ is the derivative of $g$
$t_{j}$ is the target output
$h_{j}$ is the weighted sum of the neuron's inputs
$y_{j}$ is the actual output
$x_{i}$ is the $i$-th input.

It holds that ${\textstyle h_{j}=\sum _{i}x_{i}w_{ji}}$ and $y_{j}=g(h_{j})$.
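The rule and the definitions above can be sketched for one neuron and one training example in plain Python. This is a minimal illustration, not a reference implementation: the logistic sigmoid is an assumed choice of $g$ (any differentiable activation would do), the loss $E=\tfrac{1}{2}(t_{j}-y_{j})^{2}$ is an assumption, and the function names (`delta_rule_update`, `squared_error`, `weighted_sum`) and example values are hypothetical. The final loop checks numerically that each $\Delta w_{ji}$ equals $-\alpha$ times a finite-difference estimate of $\partial E/\partial w_{ji}$, i.e. that the update is a gradient descent step.

```python
import math


def sigmoid(h: float) -> float:
    """Logistic activation g(h) = 1 / (1 + exp(-h)); an assumed example choice of g."""
    return 1.0 / (1.0 + math.exp(-h))


def sigmoid_prime(h: float) -> float:
    """Derivative of the logistic function: g'(h) = g(h) * (1 - g(h))."""
    s = sigmoid(h)
    return s * (1.0 - s)


def weighted_sum(w, x):
    """h_j = sum_i x_i * w_ji for a single neuron j."""
    return sum(w_i * x_i for w_i, x_i in zip(w, x))


def delta_rule_update(w, x, t, alpha, g=sigmoid, g_prime=sigmoid_prime):
    """Weight changes dw_ji = alpha * (t_j - y_j) * g'(h_j) * x_i for one example."""
    h = weighted_sum(w, x)  # weighted input h_j
    y = g(h)                # actual output y_j
    return [alpha * (t - y) * g_prime(h) * x_i for x_i in x]


def squared_error(w, x, t, g=sigmoid):
    """Assumed per-example loss E = 0.5 * (t_j - y_j)^2."""
    return 0.5 * (t - g(weighted_sum(w, x))) ** 2


# Hypothetical single example for a neuron with three inputs.
w = [0.1, -0.2, 0.05]
x = [1.0, 0.5, 1.0]
t = 1.0
alpha = 0.5

dw = delta_rule_update(w, x, t, alpha)
w_new = [w_i + d for w_i, d in zip(w, dw)]

# Sanity check: each dw_ji should match -alpha times a central
# finite-difference estimate of dE/dw_ji.
eps = 1e-6
for i in range(len(w)):
    w_plus = list(w)
    w_plus[i] += eps
    w_minus = list(w)
    w_minus[i] -= eps
    grad_i = (squared_error(w_plus, x, t) - squared_error(w_minus, x, t)) / (2 * eps)
    assert abs(dw[i] + alpha * grad_i) < 1e-8
```

Passing `g` and `g_prime` as parameters keeps the update itself independent of the activation, mirroring how the formula only needs $g$ and $g'$.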

The delta rule is commonly stated in simplified form for a neuron with a linear activation function, $g(h)=h$ (so that $g'(h)=1$), as $\Delta w_{ji}=\alpha \left(t_{j}-y_{j}\right)x_{i}.$
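In the linear case the rule is also known as the least-mean-squares (LMS) or Widrow–Hoff rule. The sketch below trains one linear neuron with it, updating the weights after every example. The training data are hypothetical, generated without noise from $t=2x_{1}-3x_{2}$, and the learning rate and epoch count are assumed values chosen small enough for the updates to be stable on this data; `train_linear_neuron` is a hypothetical name.

```python
def train_linear_neuron(samples, n_inputs, alpha=0.05, epochs=200):
    """Train one linear neuron (g(h) = h, g'(h) = 1) with the simplified delta rule.

    samples: list of (x, t) pairs, where x is a list of n_inputs floats and t a float.
    Weights are updated after each example (online updates).
    """
    w = [0.0] * n_inputs
    for _ in range(epochs):
        for x, t in samples:
            y = sum(w_i * x_i for w_i, x_i in zip(w, x))  # linear output: y = h
            err = t - y
            # dw_i = alpha * (t - y) * x_i
            w = [w_i + alpha * err * x_i for w_i, x_i in zip(w, x)]
    return w


# Hypothetical noise-free data generated by t = 2*x1 - 3*x2.
samples = [
    ([x1, x2], 2 * x1 - 3 * x2)
    for x1 in (-1.0, 0.0, 1.0)
    for x2 in (-1.0, 0.5, 1.0)
]
w = train_linear_neuron(samples, n_inputs=2)
print([round(w_i, 3) for w_i in w])  # → [2.0, -3.0]
```

Because the data are exactly linear and the inputs span both dimensions, the weights converge to the generating coefficients; with noisy data they would instead settle near the least-squares fit.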

While the delta rule is similar to the perceptron's update rule, the derivation is different. The perceptron uses the Heaviside step function as its activation function $g(h)$, whose derivative $g'(h)$ does not exist at $h=0$ and is equal to zero everywhere else. Applied directly, the delta rule would therefore produce an update that is either zero or undefined for every input, so it cannot be used to train a perceptron.
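The contrast can be illustrated on a single misclassified example. The sketch compares the delta rule with a step activation against the standard perceptron learning rule $\Delta w_{i}=\alpha (t-y)x_{i}$, which has the same form as the linear delta rule but uses the thresholded output $y$. The convention $g(0)=1$ and returning $0$ for $g'$ at $h=0$ (where the derivative does not exist) are assumptions made so the code runs; the function names and example values are hypothetical.

```python
def heaviside(h: float) -> float:
    """Step activation; the value at h = 0 is an assumed convention."""
    return 1.0 if h >= 0 else 0.0


def heaviside_prime(h: float) -> float:
    """Derivative of the step function: 0 for h != 0.

    It does not exist at h = 0; returning 0 there is an assumption.
    """
    return 0.0


def delta_rule_step(w, x, t, alpha):
    """Delta rule applied directly to a step-activation neuron."""
    h = sum(w_i * x_i for w_i, x_i in zip(w, x))
    y = heaviside(h)
    return [alpha * (t - y) * heaviside_prime(h) * x_i for x_i in x]


def perceptron_step(w, x, t, alpha):
    """Perceptron learning rule: dw_i = alpha * (t - y) * x_i with thresholded y."""
    h = sum(w_i * x_i for w_i, x_i in zip(w, x))
    y = heaviside(h)
    return [alpha * (t - y) * x_i for x_i in x]


# h = -0.5 + 0.2 = -0.3, so y = 0 while the target is 1 (misclassified).
w = [-0.5, 0.2]
x = [1.0, 1.0]
t = 1.0
print(delta_rule_step(w, x, t, 0.1))  # [0.0, 0.0]: no learning signal
print(perceptron_step(w, x, t, 0.1))  # [0.1, 0.1]
```

The delta rule's factor $g'(h)$ wipes out the error signal, whereas the perceptron rule still moves the weights toward classifying the example correctly.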

^ Russell, Ingrid. "The Delta Rule". University of Hartford. Archived from the original on 4 March 2016. Retrieved 5 November 2012.
