Visual temporal attention is a special case of visual attention that involves directing attention to a specific instant in time. Like their spatial counterpart, visual spatial attention, temporal attention modules have been widely implemented in video analytics in computer vision to improve performance and to provide human-interpretable explanations[3] of deep learning models.
Just as a visual spatial attention mechanism allows human and/or computer vision systems to focus on the semantically more important regions in space, visual temporal attention modules enable machine learning algorithms to place more emphasis on critical video frames in video analytics tasks such as human action recognition. In convolutional neural network-based systems, the prioritization introduced by the attention mechanism is commonly implemented as a linear weighting layer whose parameters are determined from labeled training data.[3]
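The frame-weighting idea can be illustrated with a minimal sketch. It assumes that each video frame has already been reduced to a feature vector, that a single linear layer (the hypothetical parameters `w` and `b`) scores each frame, and that the scores are normalized with a softmax over time; the softmax and all names here are illustrative assumptions rather than the formulation of any particular published model. In a trained system `w` and `b` would be learned from labeled data, whereas here they are fixed by hand.

```python
import math

def temporal_attention(frame_features, w, b=0.0):
    """Pool per-frame feature vectors into one clip-level vector.

    frame_features: list of T feature vectors, one per frame (assumed input).
    w, b: parameters of a linear scoring layer (hypothetical, hand-set here).
    Returns (weights, pooled): one attention weight per frame and the
    attention-weighted sum of the frame features.
    """
    # Linear scoring layer: one scalar score per frame.
    scores = [sum(wi * xi for wi, xi in zip(w, frame)) + b
              for frame in frame_features]

    # Softmax over the time axis; subtracting the max keeps exp() stable.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of frame features, so high-weight frames dominate.
    dim = len(frame_features[0])
    pooled = [sum(weights[t] * frame_features[t][d]
                  for t in range(len(frame_features)))
              for d in range(dim)]
    return weights, pooled

# Three toy frames with two features each; the middle frame scores highest.
frames = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
w = [1.0, 1.0]
weights, pooled = temporal_attention(frames, w)
```

The per-frame weights sum to one, so they can be read directly as the relative importance assigned to each frame, which is one way such a layer can support interpretation.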
[1] Center 2013.
[2] Zang, Wang, Liu & Zhang 2018, pp. 97–108.
[3] Interpretable ML Symposium 2017.