Visual spatial attention

Visual spatial attention is a form of visual attention that involves directing attention to a location in space. Like its temporal counterpart, visual temporal attention, spatial attention modules have been widely implemented in video analytics in computer vision to provide enhanced performance and human-interpretable explanations[1][2][3] of deep learning models.
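In deep learning models, a spatial attention module typically computes a weight for each spatial location of a feature map and rescales the map so that informative locations are emphasized. The following is a minimal illustrative sketch in Python with PyTorch, loosely modeled on the common design of pooling channel statistics and convolving them into a single attention map; the class name, kernel size, and pooling scheme are assumptions for illustration, not the specific modules used in the cited works.

```python
# A minimal sketch of a spatial attention module for CNN feature maps.
# Hyper-parameters and names here are illustrative assumptions.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Computes a per-location weight map and rescales the feature map."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        # A single conv maps pooled channel statistics to one attention map.
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width) feature map from a backbone.
        avg_pool = x.mean(dim=1, keepdim=True)           # (B, 1, H, W)
        max_pool, _ = x.max(dim=1, keepdim=True)         # (B, 1, H, W)
        pooled = torch.cat([avg_pool, max_pool], dim=1)  # (B, 2, H, W)
        attn = torch.sigmoid(self.conv(pooled))          # weights in (0, 1)
        return x * attn  # emphasize attended spatial locations

# Usage: reweight an intermediate feature map, e.g. from a video-frame CNN.
features = torch.randn(1, 64, 32, 32)
print(SpatialAttention()(features).shape)  # torch.Size([1, 64, 32, 32])
```

Because the learned attention map assigns an explicit weight to every spatial location, it can be visualized as a heat map over the input, which is one source of the human-interpretable explanations mentioned above.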

Spatial attention allows humans to selectively process visual information through prioritization of an area within the visual field. A region of space within the visual field is selected for attention, and the information within this region then receives further processing. Research shows that when spatial attention is evoked, an observer is typically faster and more accurate at detecting a target that appears at an expected location than at an unexpected one.[4] Attention is guided even more quickly to unexpected locations when these locations are made salient by external visual inputs (such as a sudden flash). According to the V1 Saliency Hypothesis, the human primary visual cortex plays a critical role in such exogenous attentional guidance.[5]

Spatial attention is distinct from other forms of visual attention such as object-based attention and feature-based attention.[6] These other forms of visual attention select an entire object or a specific feature of an object regardless of its location, whereas spatial attention selects a specific region of space, and the objects and features within that region are then processed.

  1. ^ "NIPS 2017". Interpretable ML Symposium. 2017-10-20. Archived from the original on 2019-09-07. Retrieved 2018-09-12.
  2. ^ Zang, Jinliang; Wang, Le; Liu, Ziyi; Zhang, Qilin; Hua, Gang; Zheng, Nanning (2018). "Attention-Based Temporal Weighted Convolutional Neural Network for Action Recognition". IFIP Advances in Information and Communication Technology. Cham: Springer International Publishing. pp. 97–108. arXiv:1803.07179. doi:10.1007/978-3-319-92007-8_9. ISBN 978-3-319-92006-1. ISSN 1868-4238. S2CID 4058889.
  3. ^ Wang, Le; Zang, Jinliang; Zhang, Qilin; Niu, Zhenxing; Hua, Gang; Zheng, Nanning (2018-06-21). "Action Recognition by an Attention-Aware Temporal Weighted Convolutional Neural Network" (PDF). Sensors. 18 (7): 1979. Bibcode:2018Senso..18.1979W. doi:10.3390/s18071979. ISSN 1424-8220. PMC 6069475. PMID 29933555.
  4. ^ Posner, M. I. (1980). "Orienting of attention". Quarterly Journal of Experimental Psychology. 32 (1): 3–25. doi:10.1080/00335558008248231. PMID 7367577.
  5. ^ Li, Z. (2002). "A saliency map in primary visual cortex". Trends in Cognitive Sciences. 6 (1): 9–16. doi:10.1016/S1364-6613(00)01817-9; and Zhaoping, L. (2014). "The V1 hypothesis—creating a bottom-up saliency map for preattentive selection and segmentation". Understanding Vision: Theory, Models, and Data. Oxford University Press.
  6. ^ Tootell, R. B.; Hadjikhani, N.; Hall, E. K.; Marrett, S.; Vanduffel, W.; Vaughan, J. T.; Dale, A. M. (1998). "The retinotopy of visual spatial attention" (PDF). Neuron. 21 (6): 1409–1422. doi:10.1016/S0896-6273(00)80659-5. PMID 9883733. S2CID 6336492.