Search Results for author: Yan-Bo Lin

Found 10 papers, 6 papers with code

Siamese Vision Transformers are Scalable Audio-visual Learners

1 code implementation • 28 Mar 2024 • Yan-Bo Lin, Gedas Bertasius

Our framework uses a single shared vision transformer backbone to process audio and visual inputs, improving its parameter efficiency, reducing the GPU memory footprint, and allowing us to scale our method to larger datasets and model sizes.

Contrastive Learning • Retrieval
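
The parameter-sharing idea described above can be pictured with a short sketch. The following is a minimal illustration, not the authors' released code: it assumes a PyTorch-style module in which one transformer encoder is shared across modalities while each modality keeps its own patch embedding; all module names, shapes, and sizes are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): one transformer encoder shared between
# modalities, with separate patch embeddings for video frames and audio spectrograms.
import torch
import torch.nn as nn


class SharedAVBackbone(nn.Module):
    def __init__(self, dim=768, depth=12, heads=12):
        super().__init__()
        # Modality-specific patch embeddings: RGB frames vs. 1-channel spectrograms.
        self.visual_patch = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        self.audio_patch = nn.Conv2d(1, dim, kernel_size=16, stride=16)
        # A single transformer encoder processes tokens from both modalities,
        # so its parameters are shared rather than duplicated per modality.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, frames, spectrograms):
        v = self.visual_patch(frames).flatten(2).transpose(1, 2)       # (B, Nv, dim)
        a = self.audio_patch(spectrograms).flatten(2).transpose(1, 2)  # (B, Na, dim)
        # The same encoder weights embed both token sequences.
        return self.encoder(v), self.encoder(a)


# Example: a batch of frames and a batch of spectrograms through the shared encoder.
model = SharedAVBackbone()
v_feat, a_feat = model(torch.randn(2, 3, 224, 224), torch.randn(2, 1, 128, 224))
```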

DAM: Dynamic Adapter Merging for Continual Video QA Learning

1 code implementation • 13 Mar 2024 • Feng Cheng, Ziyang Wang, Yi-Lin Sung, Yan-Bo Lin, Mohit Bansal, Gedas Bertasius

Our DAM model outperforms prior state-of-the-art continual learning approaches by 9.1% while exhibiting 1.9% less forgetting on 6 VidQA datasets spanning various domains.

Continual Learning • Image Classification +2

Vision Transformers are Parameter-Efficient Audio-Visual Learners

1 code implementation • CVPR 2023 • Yan-Bo Lin, Yi-Lin Sung, Jie Lei, Mohit Bansal, Gedas Bertasius

To do so, we propose a latent audio-visual hybrid (LAVISH) adapter that adapts pretrained ViTs to audio-visual tasks by injecting a small number of trainable parameters into every layer of a frozen ViT.

Audio-visual Question Answering
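
The adapter idea can also be sketched. Below is a minimal, hedged illustration of injecting small trainable bottleneck adapters into every block of a frozen ViT, in the spirit of the LAVISH description above; it is not the released LAVISH implementation, and the `Adapter` and `BlockWithAdapter` names, the bottleneck width, and the use of `timm` are assumptions for illustration.

```python
# Minimal sketch (assumptions, not the released LAVISH code): small bottleneck
# adapters added after every block of a frozen ViT; only the adapters are trained.
import torch.nn as nn
import timm  # assumes timm is installed; any frozen pretrained ViT would do


class Adapter(nn.Module):
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x):
        # Residual bottleneck: frozen block output plus a small learned update.
        return x + self.up(self.act(self.down(x)))


class BlockWithAdapter(nn.Module):
    def __init__(self, block, dim=768):
        super().__init__()
        self.block, self.adapter = block, Adapter(dim)

    def forward(self, x):
        return self.adapter(self.block(x))


vit = timm.create_model("vit_base_patch16_224", pretrained=True)
for p in vit.parameters():
    p.requires_grad = False  # the pretrained backbone stays frozen

# Wrap each transformer block so a trainable adapter follows its frozen computation.
vit.blocks = nn.Sequential(*[BlockWithAdapter(b) for b in vit.blocks])
# Only the adapter parameters (a small fraction of the model) receive gradients.
trainable = [p for p in vit.parameters() if p.requires_grad]
```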

Dual-modality seq2seq network for audio-visual event localization

2 code implementations • 20 Feb 2019 • Yan-Bo Lin, Yu-Jhe Li, Yu-Chiang Frank Wang

Audio-visual event localization requires one to identify the event which is both visible and audible in a video (either at a frame or video level).

Audio-Visual Event Localization
