Multilayer perceptron

In deep learning, a multilayer perceptron (MLP) is a name for a modern feedforward neural network consisting of fully connected neurons with nonlinear activation functions, organized in layers, notable for being able to distinguish data that is not linearly separable.^[1]

Modern neural networks are trained using backpropagation^[2]^[3]^[4]^[5]^[6] and are colloquially referred to as "vanilla" networks.^[7] MLPs grew out of an effort to improve single-layer perceptrons, which could only be applied to linearly separable data. A perceptron traditionally used a Heaviside step function as its nonlinear activation function. However, the backpropagation algorithm requires that modern MLPs use continuous activation functions such as sigmoid or ReLU.^[8]

Multilayer perceptrons form the basis of deep learning,^[9] and are applicable across a vast set of diverse domains.^[10]

^ Cybenko, G. 1989. Approximation by superpositions of a sigmoidal function Mathematics of Control, Signals, and Systems, 2(4), 303–314.
^ Linnainmaa, Seppo (1970). The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors (Masters) (in Finnish). University of Helsinki. pp. 6–7.
^ Kelley, Henry J. (1960). "Gradient theory of optimal flight paths". ARS Journal. 30 (10): 947–954. doi:10.2514/8.5282.
^ Rosenblatt, Frank. x. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan Books, Washington DC, 1961
^ Werbos, Paul (1982). "Applications of advances in nonlinear sensitivity analysis" (PDF). System modeling and optimization. Springer. pp. 762–770. Archived (PDF) from the original on 14 April 2016. Retrieved 2 July 2017.
^ Rumelhart, David E., Geoffrey E. Hinton, and R. J. Williams. "Learning Internal Representations by Error Propagation". David E. Rumelhart, James L. McClelland, and the PDP research group. (editors), Parallel distributed processing: Explorations in the microstructure of cognition, Volume 1: Foundation. MIT Press, 1986.
^ Hastie, Trevor. Tibshirani, Robert. Friedman, Jerome. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York, NY, 2009.
^ "Why is the ReLU function not differentiable at x=0?".
^ Almeida, Luis B (2020) [1996]. "Multilayer perceptrons". In Fiesler, Emile; Beale, Russell (eds.). Handbook of Neural Computation. CRC Press. pp. C1-2. doi:10.1201/9780429142772. ISBN 978-0-429-14277-2.
^ Gardner, Matt W; Dorling, Stephen R (1998). "Artificial neural networks (the multilayer perceptron)—a review of applications in the atmospheric sciences". Atmospheric Environment. 32 (14–15). Elsevier: 2627–2636. Bibcode:1998AtmEn..32.2627G. doi:10.1016/S1352-2310(97)00447-0.

[Cybenko1989-1] Cybenko, G. 1989. Approximation by superpositions of a sigmoidal function Mathematics of Control, Signals, and Systems, 2(4), 303–314.

[lin1970-2] Linnainmaa, Seppo (1970). The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors (Masters) (in Finnish). University of Helsinki. pp. 6–7.

[kelley1960-3] Kelley, Henry J. (1960). "Gradient theory of optimal flight paths". ARS Journal. 30 (10): 947–954. doi:10.2514/8.5282.

[4] Rosenblatt, Frank. x. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan Books, Washington DC, 1961

[werbos1982-5] Werbos, Paul (1982). "Applications of advances in nonlinear sensitivity analysis" (PDF). System modeling and optimization. Springer. pp. 762–770. Archived (PDF) from the original on 14 April 2016. Retrieved 2 July 2017.

[rumelhart1986-6] Rumelhart, David E., Geoffrey E. Hinton, and R. J. Williams. "Learning Internal Representations by Error Propagation". David E. Rumelhart, James L. McClelland, and the PDP research group. (editors), Parallel distributed processing: Explorations in the microstructure of cognition, Volume 1: Foundation. MIT Press, 1986.

[7] Hastie, Trevor. Tibshirani, Robert. Friedman, Jerome. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York, NY, 2009.

[8] "Why is the ReLU function not differentiable at x=0?".

[MLPBook-9] Almeida, Luis B (2020) [1996]. "Multilayer perceptrons". In Fiesler, Emile; Beale, Russell (eds.). Handbook of Neural Computation. CRC Press. pp. C1-2. doi:10.1201/9780429142772. ISBN 978-0-429-14277-2.

[AtmosSciPaper-10] Gardner, Matt W; Dorling, Stephen R (1998). "Artificial neural networks (the multilayer perceptron)—a review of applications in the atmospheric sciences". Atmospheric Environment. 32 (14–15). Elsevier: 2627–2636. Bibcode:1998AtmEn..32.2627G. doi:10.1016/S1352-2310(97)00447-0.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]