1 code implementation • 13 Mar 2024 • Feng Cheng, Ziyang Wang, Yi-Lin Sung, Yan-Bo Lin, Mohit Bansal, Gedas Bertasius
Our DAM model outperforms prior state-of-the-art continual learning approaches by 9.1% while exhibiting 1.9% less forgetting on 6 VidQA datasets spanning various domains.
1 code implementation • CVPR 2023 • Yan-Bo Lin, Yi-Lin Sung, Jie Lei, Mohit Bansal, Gedas Bertasius
To do so, we propose a latent audio-visual hybrid (LAVISH) adapter that adapts pretrained ViTs to audio-visual tasks by injecting a small number of trainable parameters into every layer of a frozen ViT.
Ranked #4 on Audio-visual Question Answering on MUSIC-AVQA
1 code implementation • 6 Apr 2022 • Yan-Bo Lin, Jie Lei, Mohit Bansal, Gedas Bertasius
We introduce an audiovisual method for long-range text-to-video retrieval.
1 code implementation • NeurIPS 2021 • Yan-Bo Lin, Hung-Yu Tseng, Hsin-Ying Lee, Yen-Yu Lin, Ming-Hsuan Yang
The audio-visual video parsing task aims to temporally parse a video into audio or visual event categories.
no code implementations • 3 May 2021 • Yan-Bo Lin, Yu-Chiang Frank Wang
Humans perceive rich auditory experiences from the distinct sounds heard by their ears.
no code implementations • 1 Apr 2021 • Yan-Bo Lin, Hung-Yu Tseng, Hsin-Ying Lee, Yen-Yu Lin, Ming-Hsuan Yang
Sound localization aims to find the source of the audio signal in the visual scene.
no code implementations • ICCV 2019 • Yu-Jhe Li, Ci-Siang Lin, Yan-Bo Lin, Yu-Chiang Frank Wang
Person re-identification (re-ID) aims at recognizing the same person from images taken across different cameras.
Ranked #15 on Unsupervised Domain Adaptation on Market to Duke
2 code implementations • 20 Feb 2019 • Yan-Bo Lin, Yu-Jhe Li, Yu-Chiang Frank Wang
Audio-visual event localization requires one to identify the event which is both visible and audible in a video (either at a frame or video level).