In machine learning, the Highway Network was the first working very deep feedforward neural network with hundreds of layers, much deeper than previous neural networks.[1][2][3]
It uses skip connections modulated by learned gating mechanisms to regulate information flow, inspired by long short-term memory (LSTM) recurrent neural networks.[4][5]
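In the notation of the original papers,[1][2] a highway layer mixes a learned transform of its input with the input itself, with the mix controlled by a learned transform gate (the papers also allow an independent carry gate C(x, W_C) in place of 1 − T):

```latex
% Highway layer in the coupled-gate form of Srivastava et al. (2015):
% H is the learned transformation, T is the sigmoid transform gate,
% and \odot denotes elementwise multiplication.
y = H(x, W_H) \odot T(x, W_T) + x \odot \bigl(1 - T(x, W_T)\bigr),
\qquad
T(x, W_T) = \sigma(W_T x + b_T)
```

When T is near 0 the layer copies its input forward unchanged (the "highway"), and when T is near 1 it behaves like an ordinary fully connected layer.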
The advantage of the Highway Network over other deep learning architectures is its ability to alleviate the vanishing gradient problem,[6] which improves its optimization. Gating mechanisms facilitate information flow across the many layers ("information highways").[1][2]
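As a concrete illustration, below is a minimal NumPy sketch of a forward pass through a stack of highway layers. The layer width, depth, weight scale, and the negative initialization of the transform-gate bias b_T (which biases layers toward the carry behavior early in training, as recommended in the original paper) are illustrative choices, not a reference implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, W_H, b_H, W_T, b_T):
    """One highway layer: a gated mix of a transform of x and x itself."""
    H = np.tanh(W_H @ x + b_H)    # candidate transformation H(x)
    T = sigmoid(W_T @ x + b_T)    # transform gate T(x) in (0, 1)
    return T * H + (1.0 - T) * x  # carry gate coupled as 1 - T

# Toy forward pass through a 50-layer stack. Layers must preserve the
# width d so the ungated carry path can pass x through unchanged.
rng = np.random.default_rng(0)
d, depth = 8, 50
x = rng.standard_normal(d)
for _ in range(depth):
    W_H = 0.1 * rng.standard_normal((d, d))
    W_T = 0.1 * rng.standard_normal((d, d))
    # Negative gate bias: gates start mostly closed, so information is
    # carried through nearly unchanged at the start of training.
    x = highway_layer(x, W_H, np.zeros(d), W_T, np.full(d, -2.0))
print(x)
```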
Highway Networks have found use in text sequence labeling and speech recognition tasks.[7][8]
In 2014, the state of the art was training deep neural networks with 20 to 30 layers.[9] Stacking more layers led to a steep reduction in training accuracy,[10] known as the "degradation" problem.[11] In 2015, two techniques made it possible to train much deeper networks: the Highway Network, published in May, and the residual neural network (ResNet),[12] published in December. ResNet behaves like an open-gated Highway Network.
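The correspondence is visible at the level of a single layer. A sketch, using the general highway form with an independent carry gate: fixing both gates fully open removes the gating entirely and leaves the residual layer of ResNet.

```latex
% General highway layer with transform gate T and carry gate C:
y = T(x) \odot H(x) + C(x) \odot x
% Fixing both gates open, T(x) \equiv 1 and C(x) \equiv 1 ("open-gated"),
% reduces this to the residual layer used in ResNet:
y = H(x) + x
```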
References

1. Srivastava, Rupesh Kumar; Greff, Klaus; Schmidhuber, Jürgen (2 May 2015). "Highway Networks". arXiv:1505.00387 [cs.LG].
2. Srivastava, Rupesh Kumar; Greff, Klaus; Schmidhuber, Jürgen (2015). "Training Very Deep Networks". Advances in Neural Information Processing Systems. 28. Curran Associates, Inc.: 2377–2385.
3. Schmidhuber, Jürgen (2021). "The most cited neural networks all build on work done in my labs". AI Blog. IDSIA, Switzerland. Retrieved 2022-04-30.
4. Hochreiter, Sepp; Schmidhuber, Jürgen (1997). "Long short-term memory". Neural Computation. 9 (8): 1735–1780. doi:10.1162/neco.1997.9.8.1735. PMID 9377276. S2CID 1915014.
5. Gers, Felix A.; Schmidhuber, Jürgen; Cummins, Fred (2000). "Learning to Forget: Continual Prediction with LSTM". Neural Computation. 12 (10): 2451–2471. CiteSeerX 10.1.1.55.5709. doi:10.1162/089976600300015015. PMID 11032042. S2CID 11598600.
6. Hochreiter, Sepp (1991). Untersuchungen zu dynamischen neuronalen Netzen [Studies on dynamic neural networks] (PDF) (diploma thesis). Technical University of Munich, Institute of Computer Science. Advisor: J. Schmidhuber.
7. Liu, Liyuan; Shang, Jingbo; Xu, Frank F.; Ren, Xiang; Gui, Huan; Peng, Jian; Han, Jiawei (12 September 2017). "Empower Sequence Labeling with Task-Aware Neural Language Model". arXiv:1709.04109 [cs.CL].
8. Kurata, Gakuto; Ramabhadran, Bhuvana; Saon, George; Sethy, Abhinav (19 September 2017). "Language Modeling with Highway LSTM". arXiv:1709.06436 [cs.CL].
9. Simonyan, Karen; Zisserman, Andrew (10 April 2015). "Very Deep Convolutional Networks for Large-Scale Image Recognition". arXiv:1409.1556 [cs.CV].
10. He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2015). "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification". arXiv:1502.01852 [cs.CV].
11. He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (10 December 2015). "Deep Residual Learning for Image Recognition". arXiv:1512.03385 [cs.CV].
12. He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2016). "Deep Residual Learning for Image Recognition". 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE. pp. 770–778. arXiv:1512.03385. doi:10.1109/CVPR.2016.90. ISBN 978-1-4673-8851-1.