Highway network

In machine learning, the Highway Network was the first feedforward neural network with hundreds of layers that could be trained effectively, far deeper than previous networks.[1][2][3] It uses skip connections modulated by learned gating mechanisms to regulate information flow, a design inspired by long short-term memory (LSTM) recurrent neural networks.[4][5] Its advantage over earlier deep architectures is that it mitigates the vanishing gradient problem,[6] making optimization easier: the gates allow information to flow across many layers with little attenuation, forming "information highways".[1][2]
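At the heart of the architecture is a layer that mixes a learned transform of its input with the input itself, with the mix controlled by a learned gate. The following is a minimal NumPy sketch of the coupled-gate formulation from the original papers, y = H(x)·T(x) + x·(1 − T(x)); the fully connected weights, tanh nonlinearity, and parameter names here are illustrative choices, not prescribed by the architecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, W_H, b_H, W_T, b_T):
    # One fully connected highway layer with the coupled carry gate C = 1 - T.
    H = np.tanh(x @ W_H + b_H)    # candidate transform H(x, W_H)
    T = sigmoid(x @ W_T + b_T)    # transform gate T(x, W_T), elementwise in (0, 1)
    return T * H + (1.0 - T) * x  # y = H·T + x·(1 − T): gated mix of transform and input
```

Because the gated sum is elementwise, the input and output of a highway layer must have the same dimensionality. In the original papers, the gate bias b_T is initialized to a negative value, so each layer starts out close to the identity function and gradients can pass through a deep stack largely unchanged.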

Highway Networks have found use in text sequence labeling and speech recognition tasks.[7][8]

In 2014, the state of the art was training deep neural networks with 20 to 30 layers.[9] Stacking more layers than that led to a steep reduction in training accuracy,[10] known as the "degradation" problem.[11] In 2015, two techniques were developed to train much deeper networks: the Highway Network (published in May) and the residual neural network, or ResNet[12] (December). A ResNet behaves like a Highway Network whose gates are fixed open, as sketched below.
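To make that correspondence concrete, the following sketch assumes the uncoupled reading of the gates: if a highway layer's transform and carry gates are both pinned to 1 instead of being learned, the layer collapses to the residual form y = H(x) + x. The tanh transform is carried over from the illustrative highway_layer above; actual ResNets use convolutional layers with ReLU, so this shows the structural relationship only.

```python
import numpy as np

def residual_layer(x, W_H, b_H):
    # A highway layer with both gates fixed open (T = 1, carry = 1)
    # loses its gating parameters and reduces to the residual form
    # used by ResNet: y = H(x) + x.
    return np.tanh(x @ W_H + b_H) + x
```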

  1. Srivastava, Rupesh Kumar; Greff, Klaus; Schmidhuber, Jürgen (2 May 2015). "Highway Networks". arXiv:1505.00387 [cs.LG].
  2. Srivastava, Rupesh Kumar; Greff, Klaus; Schmidhuber, Jürgen (2015). "Training Very Deep Networks". Advances in Neural Information Processing Systems. 28. Curran Associates, Inc.: 2377–2385.
  3. Schmidhuber, Jürgen (2021). "The most cited neural networks all build on work done in my labs". AI Blog. IDSIA, Switzerland. Retrieved 2022-04-30.
  4. Hochreiter, Sepp; Schmidhuber, Jürgen (1997). "Long short-term memory". Neural Computation. 9 (8): 1735–1780. doi:10.1162/neco.1997.9.8.1735. PMID 9377276. S2CID 1915014.
  5. Gers, Felix A.; Schmidhuber, Jürgen; Cummins, Fred (2000). "Learning to Forget: Continual Prediction with LSTM". Neural Computation. 12 (10): 2451–2471. CiteSeerX 10.1.1.55.5709. doi:10.1162/089976600300015015. PMID 11032042. S2CID 11598600.
  6. Hochreiter, Sepp (1991). Untersuchungen zu dynamischen neuronalen Netzen [Investigations on dynamic neural networks] (PDF) (diploma thesis). Technical University Munich, Institute of Computer Science. Advisor: J. Schmidhuber.
  7. Liu, Liyuan; Shang, Jingbo; Xu, Frank F.; Ren, Xiang; Gui, Huan; Peng, Jian; Han, Jiawei (12 September 2017). "Empower Sequence Labeling with Task-Aware Neural Language Model". arXiv:1709.04109 [cs.CL].
  8. Kurata, Gakuto; Ramabhadran, Bhuvana; Saon, George; Sethy, Abhinav (19 September 2017). "Language Modeling with Highway LSTM". arXiv:1709.06436 [cs.CL].
  9. Simonyan, Karen; Zisserman, Andrew (10 April 2015). "Very Deep Convolutional Networks for Large-Scale Image Recognition". arXiv:1409.1556 [cs.CV].
  10. He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2015). "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification". arXiv:1502.01852 [cs.CV].
  11. He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (10 December 2015). "Deep Residual Learning for Image Recognition". arXiv:1512.03385 [cs.CV].
  12. He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2016). "Deep Residual Learning for Image Recognition". 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE. pp. 770–778. arXiv:1512.03385. doi:10.1109/CVPR.2016.90. ISBN 978-1-4673-8851-1.