1 code implementation • CVPR 2024 • Joonmyung Choi, Sanghyeok Lee, Jaewon Chu, Minhyuk Choi, Hyunwoo J. Kim
To tackle these issues, we propose training-free token merging for lightweight video Transformer (vid-TLDR), which aims to enhance the efficiency of video Transformers by merging the background tokens without additional training.
Ranked #2 on Video Retrieval on SSv2-template retrieval (using extra training data)
1 code implementation • NeurIPS 2023 • Hyeong Kyu Choi, Seunghun Lee, Jaewon Chu, Hyunwoo J. Kim
Multi-hop Knowledge Graph Question Answering (KGQA) is a task that involves retrieving nodes from a knowledge graph (KG) to answer natural language questions.
1 code implementation • ICCV 2023 • Dohwan Ko, Ji Soo Lee, Miso Choi, Jaewon Chu, Jihwan Park, Hyunwoo J. Kim
We hence propose a new benchmark, Open-vocabulary Video Question Answering (OVQA), to measure the generalizability of VideoQA models by considering rare and unseen answers.
Ranked #8 on Visual Question Answering (VQA) on MSRVTT-QA