no code implementations • Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2023 • Mobeen Ahmad, Geonwoo Park, Dongchan Park, Sanguk Park
To address this, we propose a novel vision-text fusion module that learns the temporal context of the video and question.
Ranked #8 on Video Question Answering on AGQA 2.0 balanced (Average Accuracy metric)
no code implementations • 28 Oct 2020 • Hochul Hwang, Cheongjae Jang, Geonwoo Park, Junghyun Cho, Ig-Jae Kim
We then generate KIST SynADL, a large-scale synthetic dataset of elders' activities of daily living, from ElderSim and use the data in addition to real datasets to train three state-of-the-art human action recognition models.