Search Results for author: Yake Wei

Found 6 papers, 5 papers with code

Quantifying and Enhancing Multi-modal Robustness with Modality Preference

1 code implementation • 9 Feb 2024 • Zequn Yang, Yake Wei, Ce Liang, Di Hu

Moreover, our analysis reveals how a widespread issue, namely that models prefer certain modalities over others, limits multi-modal robustness by influencing its essential components, and can make attacks on a specific modality highly effective.

Enhancing Multimodal Cooperation via Fine-grained Modality Valuation

1 code implementation • 12 Sep 2023 • Yake Wei, Ruoxuan Feng, Zihe Wang, Di Hu

One primary topic of multimodal learning is to jointly incorporate heterogeneous information from different modalities.

Learning in Audio-visual Context: A Review, Analysis, and New Perspective

no code implementations • 20 Aug 2022 • Yake Wei, Di Hu, Yapeng Tian, Xuelong Li

A comprehensive survey that systematically organizes and analyzes studies of the audio-visual field is therefore needed.

Tasks: audio-visual learning, Scene Understanding

Balanced Multimodal Learning via On-the-fly Gradient Modulation

1 code implementation • CVPR 2022 • Xiaokang Peng, Yake Wei, Andong Deng, Dong Wang, Di Hu

Multimodal learning helps to comprehensively understand the world, by integrating different senses.

Learning to Answer Questions in Dynamic Audio-Visual Scenarios

1 code implementation • CVPR 2022 • Guangyao Li, Yake Wei, Yapeng Tian, Chenliang Xu, Ji-Rong Wen, Di Hu

In this paper, we focus on the Audio-Visual Question Answering (AVQA) task, which aims to answer questions regarding different visual objects, sounds, and their associations in videos.

Tasks: audio-visual learning, Audio-visual Question Answering, +4

Class-aware Sounding Objects Localization via Audiovisual Correspondence

1 code implementation • 22 Dec 2021 • Di Hu, Yake Wei, Rui Qian, Weiyao Lin, Ruihua Song, Ji-Rong Wen

To address this problem, we propose a two-stage step-by-step learning framework to localize and recognize sounding objects in complex audiovisual scenarios using only the correspondence between audio and vision.

Tasks: Object, object-detection, +3
