no code implementations • Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2023 • Mobeen Ahmad, Geonwoo Park, Dongchan Park, Sanguk Park
To address this, we propose a novel vision-text fusion module that learns the temporal context of the video and question.
Ranked #8 on Video Question Answering on AGQA 2.0 balanced (Average Accuracy metric)
1 code implementation • CVPR 2023 • WonJun Moon, Sangeek Hyun, Sanguk Park, Dongchan Park, Jae-Pil Heo
Having observed that the given query plays an insignificant role in transformer architectures, we begin our encoding module with cross-attention layers that explicitly inject the context of the text query into the video representation.
Ranked #2 on Highlight Detection on TvSum
1 code implementation • 29 Jun 2022 • Hyeonyu Kim, Jongeun Kim, Jeonghun Kang, Sanguk Park, Dongchan Park, Taehwan Kim
This technical report presents the second-place winning model for AQTC, a task newly introduced in the CVPR 2022 LOng-form VidEo Understanding (LOVEU) challenges.