1 code implementation • 9 Feb 2024 • Zequn Yang, Yake Wei, Ce Liang, Di Hu
Moreover, our analysis reveals how a widespread issue, that models have different preferences for modalities, limits multi-modal robustness by influencing these essential components, and can make attacks on a specific modality highly effective.
1 code implementation • 12 Sep 2023 • Yake Wei, Ruoxuan Feng, Zihe Wang, Di Hu
A primary topic in multimodal learning is jointly incorporating heterogeneous information from different modalities.
no code implementations • 20 Aug 2022 • Yake Wei, Di Hu, Yapeng Tian, Xuelong Li
A comprehensive survey that systematically organizes and analyzes studies in the audio-visual field is therefore needed.
1 code implementation • CVPR 2022 • Xiaokang Peng, Yake Wei, Andong Deng, Dong Wang, Di Hu
Multimodal learning helps models comprehensively understand the world by integrating different senses.
1 code implementation • CVPR 2022 • Guangyao Li, Yake Wei, Yapeng Tian, Chenliang Xu, Ji-Rong Wen, Di Hu
In this paper, we focus on the Audio-Visual Question Answering (AVQA) task, which aims to answer questions regarding different visual objects, sounds, and their associations in videos.
Ranked #5 on Audio-visual Question Answering on MUSIC-AVQA
1 code implementation • 22 Dec 2021 • Di Hu, Yake Wei, Rui Qian, Weiyao Lin, Ruihua Song, Ji-Rong Wen
To address this problem, we propose a two-stage, step-by-step learning framework that localizes and recognizes sounding objects in complex audio-visual scenarios using only the correspondence between audio and vision.