Visual temporal attention

Figure: Video frames from the Parallel Bars action category in the UCF-101 dataset.[1] (a) The four highest-ranked frames by video temporal attention weight, in which the athlete is performing on the parallel bars; (b) the four lowest-ranked frames, in which the athlete is standing on the ground. All weights are predicted by the ATW CNN algorithm.[2] The highly weighted video frames generally capture the most distinctive movements relevant to the action category.

Visual temporal attention is a special case of visual attention that involves directing attention to specific instants in time. Like its spatial counterpart, visual spatial attention, these attention modules have been widely implemented in video analytics in computer vision to provide enhanced performance and human-interpretable explanations[3] of deep learning models.

Just as the visual spatial attention mechanism allows human and computer vision systems to focus on semantically significant regions in space, visual temporal attention modules enable machine learning algorithms to emphasize the most critical video frames in video analytics tasks, such as human action recognition. In convolutional neural network-based systems, the prioritization introduced by the attention mechanism is typically implemented as a linear weighting layer whose parameters are learned from labeled training data.[3]
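The following is a minimal sketch of such a temporal attention layer, written in PyTorch. It is an illustration of the general technique described above, not the actual ATW CNN implementation; all class and variable names here are hypothetical. Given per-frame feature vectors from a backbone CNN, a learned linear layer scores each frame, a softmax over the time axis turns the scores into attention weights, and the weighted sum yields a single clip-level feature.

import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Scores each frame, normalizes scores with a softmax over time,
    and returns the attention-weighted video-level feature."""

    def __init__(self, feature_dim: int):
        super().__init__()
        # Linear weighting layer; its parameters are learned from labeled data.
        self.score = nn.Linear(feature_dim, 1)

    def forward(self, frame_features: torch.Tensor):
        # frame_features: (batch, num_frames, feature_dim)
        scores = self.score(frame_features).squeeze(-1)  # (batch, num_frames)
        weights = torch.softmax(scores, dim=1)           # temporal attention weights
        # Weighted sum over frames -> one clip-level feature per video.
        video_feature = (weights.unsqueeze(-1) * frame_features).sum(dim=1)
        return video_feature, weights

# Usage: a batch of 2 clips, each with 8 frames of 512-dim CNN features.
attn = TemporalAttention(feature_dim=512)
feats = torch.randn(2, 8, 512)
clip_feat, w = attn(feats)  # w ranks frames by predicted relevance

Sorting the frames of a clip by the learned weights w is what produces rankings like those in the figure above: frames containing the discriminative motion receive high weights, while uninformative frames receive low ones.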

  1. ^ "Center for Research in Computer Vision". University of Central Florida. 2013.
  2. ^ Zang, Jinliang; Wang, Le; Liu, Ziyi; Zhang, Qilin; Hua, Gang; Zheng, Nanning (2018). "Attention-Based Temporal Weighted Convolutional Neural Network for Action Recognition". Artificial Intelligence Applications and Innovations. IFIP Advances in Information and Communication Technology. Springer. pp. 97–108.
  3. ^ a b "Interpretable ML Symposium". NIPS 2017.