Hinge loss

In machine learning, the hinge loss is a loss function used for training classifiers. The hinge loss is used for "maximum-margin" classification, most notably for support vector machines (SVMs).^[1]

For an intended output $t = \pm1$ and a classifier score $y$, the hinge loss of the prediction $y$ is defined as

$$\ell(y) = \max(0,\, 1 - t \cdot y)$$
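The definition translates directly into code. The following is a minimal Python sketch, assuming labels are encoded as the integers -1 and +1 and the score is a plain float; the function name `hinge_loss` is our own choice and not taken from any particular library.

```python
def hinge_loss(t: int, y: float) -> float:
    """Hinge loss max(0, 1 - t*y) for a label t in {-1, +1} and raw score y."""
    if t not in (-1, 1):
        raise ValueError("t must be -1 or +1")
    return max(0.0, 1.0 - t * y)

print(hinge_loss(1, 2.0))   # 0.0: correct sign and t*y >= 1
print(hinge_loss(1, 0.5))   # 0.5: correct sign, but t*y < 1
print(hinge_loss(-1, 0.5))  # 1.5: wrong sign
```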

Note that $y$ should be the "raw" output of the classifier's decision function, not the predicted class label. For instance, in linear SVMs, $y=\mathbf{w}\cdot\mathbf{x}+b$, where $(\mathbf{w},b)$ are the parameters of the hyperplane and $\mathbf{x}$ is the input vector.
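To see why the raw score matters, here is a small Python sketch for a linear classifier. The weights `w`, bias `b`, and input `x` are hypothetical values chosen only for illustration, and `linear_score` is an illustrative helper, not a library function. Passing the thresholded label instead of the raw score hides the fact that the example lies inside the margin.

```python
def linear_score(w, x, b):
    """Raw decision value y = w . x + b for a linear classifier."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def hinge_loss(t, y):
    """Hinge loss max(0, 1 - t*y) for a label t in {-1, +1}."""
    return max(0.0, 1.0 - t * y)

# Hypothetical hyperplane parameters, not taken from any trained model.
w, b = [2.0, -1.0], 0.5
t = 1                          # true label of the example below
x = [0.5, 1.0]

y = linear_score(w, x, b)      # raw score: 2*0.5 - 1*1.0 + 0.5 = 0.5
label = 1 if y >= 0 else -1    # thresholded class label: +1

print(hinge_loss(t, y))        # 0.5: correct side, but inside the margin
print(hinge_loss(t, label))    # 0.0: using the label hides the margin violation
```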

When $t$ and $y$ have the same sign (meaning $y$ predicts the right class) and $|y|\geq 1$, the hinge loss is $\ell(y)=0$. When they have opposite signs, $\ell(y)=1+|y|$, which grows linearly as $y$ moves further onto the wrong side of zero. The loss is likewise positive when $|y|<1$, even if the signs agree: the prediction is correct, but not by a large enough margin.
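The cases above can be tabulated with a short Python sketch, again assuming labels are encoded as ±1. The helper `regime` and its case names are illustrative labels of our own, not standard terminology. Running it for both labels shows that on the wrong side the loss equals $1+|y|$, so for $t=+1$ it rises as $y$ decreases and for $t=-1$ it rises as $y$ increases.

```python
def hinge_loss(t, y):
    """Hinge loss max(0, 1 - t*y) for a label t in {-1, +1}."""
    return max(0.0, 1.0 - t * y)

def regime(t, y):
    """Name the case from the text that the pair (t, y) falls into."""
    m = t * y  # positive when t and y share a sign
    if m >= 1:
        return "correct, |y| >= 1: zero loss"
    if m > 0:
        return "correct, |y| < 1: inside the margin"
    if m == 0:
        return "y = 0: on the decision boundary"
    return "wrong sign: loss = 1 + |y|"

for t in (1, -1):
    for y in (-2.0, -0.5, 0.0, 0.5, 1.0, 2.0):
        print(f"t={t:+d}  y={y:+.1f}  loss={hinge_loss(t, y):.1f}  {regime(t, y)}")
```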

[1] Rosasco, L.; De Vito, E. D.; Caponnetto, A.; Piana, M.; Verri, A. (2004). "Are Loss Functions All the Same?" (PDF). Neural Computation. 16 (5): 1063–1076. CiteSeerX 10.1.1.109.6786. doi:10.1162/089976604773135104. PMID 15070510.
