17 May 2021 • Kamala Gajurel, Cuncong Zhong, Guanghui Wang
Fine-grained attention is achieved by using the change in motion between video frames (optical flow) in sequential context-based attention, together with a Transformer encoder model.
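To make the idea of motion-guided attention concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes per-frame feature vectors and dense optical flow fields are already available. The helper names (`motion_weights`, `self_attention`, `motion_aware_encode`), the use of mean flow magnitude as the motion score, and the single-head attention without learned projections are all illustrative assumptions standing in for the paper's actual attention module and Transformer encoder.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def motion_weights(flow):
    # flow: (T, H, W, 2) optical flow fields, one per frame (assumed input).
    # Score each frame by its mean flow magnitude, then normalize over time.
    magnitude = np.linalg.norm(flow, axis=-1).mean(axis=(1, 2))
    return softmax(magnitude)

def self_attention(x):
    # Single-head scaled dot-product self-attention with no learned
    # projections; a stand-in for one Transformer encoder attention step.
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ x

def motion_aware_encode(features, flow):
    # features: (T, D) per-frame feature vectors (assumed input).
    w = motion_weights(flow)
    # Multiply by T so that uniform motion leaves the features unchanged.
    weighted = features * w[:, None] * len(w)
    return self_attention(weighted), w

rng = np.random.default_rng(0)
features = rng.normal(size=(8, 16))   # 8 frames, 16-dim features
flow = rng.normal(size=(8, 4, 4, 2))  # toy 4x4 flow fields
encoded, weights = motion_aware_encode(features, flow)
```

Frames with larger motion receive larger weights before the attention step, so they contribute more to every output position. In a real model the weighting and the attention would be learned rather than fixed as they are here.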