1 code implementation • 28 Mar 2024 • Yan-Bo Lin, Gedas Bertasius
Our framework uses a single shared vision transformer backbone to process audio and visual inputs, improving its parameter efficiency, reducing the GPU memory footprint, and allowing us to scale our method to larger datasets and model sizes.
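The idea of a single backbone serving both modalities can be sketched as follows. This is a minimal illustration, not the paper's code: a shared weight matrix stands in for the shared ViT, and the modality-specific patch embeddings (`audio_embed`, `video_embed`) are assumed names. Only the small embedding layers differ per modality; the bulk of the parameters is shared.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 768

# Modality-specific patch embeddings map each input into a common token space.
audio_embed = rng.standard_normal((128, dim)) * 0.02   # spectrogram patch -> token
video_embed = rng.standard_normal((256, dim)) * 0.02   # image patch -> token

# One shared projection standing in for the full shared ViT backbone.
shared_w = rng.standard_normal((dim, dim)) * 0.02

def backbone(tokens):
    # The same weights process tokens from either modality.
    return np.maximum(tokens @ shared_w, 0.0)

audio_patches = rng.standard_normal((20, 128))         # 20 spectrogram patches
video_patches = rng.standard_normal((50, 256))         # 50 image patches

audio_tokens = backbone(audio_patches @ audio_embed)
video_tokens = backbone(video_patches @ video_embed)
print(audio_tokens.shape, video_tokens.shape)
```

Because both token streams pass through the same `backbone`, scaling the model grows one set of weights instead of two, which is the source of the memory savings described above.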
1 code implementation • 13 Mar 2024 • Feng Cheng, Ziyang Wang, Yi-Lin Sung, Yan-Bo Lin, Mohit Bansal, Gedas Bertasius
Our DAM model outperforms prior state-of-the-art continual learning approaches by 9.1% while exhibiting 1.9% less forgetting on 6 VidQA datasets spanning various domains.
1 code implementation • CVPR 2023 • Yan-Bo Lin, Yi-Lin Sung, Jie Lei, Mohit Bansal, Gedas Bertasius
To do so, we propose a latent audio-visual hybrid (LAVISH) adapter that adapts pretrained ViTs to audio-visual tasks by injecting a small number of trainable parameters into every layer of a frozen ViT.
Ranked #4 on Audio-visual Question Answering on MUSIC-AVQA
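The adapter mechanism described above, injecting a small number of trainable parameters into each frozen layer, can be sketched with a residual bottleneck adapter. This is an illustrative sketch in the general adapter style, not the exact LAVISH design; the class name, sizes, and zero-initialized up-projection are assumptions.

```python
import numpy as np

class BottleneckAdapter:
    """Small trainable module added residually to a frozen transformer layer:
    down-project, nonlinearity, up-project. Illustrative, not the paper's code."""
    def __init__(self, dim, bottleneck, rng):
        self.down = rng.standard_normal((dim, bottleneck)) * 0.02
        self.up = np.zeros((bottleneck, dim))   # zero-init: adapter starts as identity

    def __call__(self, x):
        h = np.maximum(x @ self.down, 0.0)      # ReLU bottleneck
        return x + h @ self.up                  # residual connection

rng = np.random.default_rng(0)
dim, bottleneck = 768, 32

# Rough size of one frozen ViT layer's weights vs. the adapter's trainable weights.
frozen_layer_params = 4 * dim * dim
adapter = BottleneckAdapter(dim, bottleneck, rng)
adapter_params = adapter.down.size + adapter.up.size

tokens = rng.standard_normal((10, dim))          # 10 audio/visual tokens
out = adapter(tokens)
print(out.shape, adapter_params / frozen_layer_params)
```

With a bottleneck of 32, the adapter adds roughly 2% of one layer's parameters, which is why only these small modules need gradients while the ViT itself stays frozen.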
1 code implementation • 6 Apr 2022 • Yan-Bo Lin, Jie Lei, Mohit Bansal, Gedas Bertasius
We introduce an audiovisual method for long-range text-to-video retrieval.
1 code implementation • NeurIPS 2021 • Yan-Bo Lin, Hung-Yu Tseng, Hsin-Ying Lee, Yen-Yu Lin, Ming-Hsuan Yang
The audio-visual video parsing task aims to temporally parse a video into audio or visual event categories.
no code implementations • 3 May 2021 • Yan-Bo Lin, Yu-Chiang Frank Wang
Humans perceive a rich auditory experience through the distinct sounds heard by each ear.
no code implementations • 1 Apr 2021 • Yan-Bo Lin, Hung-Yu Tseng, Hsin-Ying Lee, Yen-Yu Lin, Ming-Hsuan Yang
Sound localization aims to find the source of the audio signal in the visual scene.
no code implementations • ICCV 2019 • Yu-Jhe Li, Ci-Siang Lin, Yan-Bo Lin, Yu-Chiang Frank Wang
Person re-identification (re-ID) aims at recognizing the same person from images taken across different cameras.
Ranked #16 on Unsupervised Domain Adaptation on Market to Duke
2 code implementations • 20 Feb 2019 • Yan-Bo Lin, Yu-Jhe Li, Yu-Chiang Frank Wang
Audio-visual event localization requires one to identify the event which is both visible and audible in a video (either at a frame or video level).