Search Results for author: Yake Wei

Found 8 papers, 6 papers with code

MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance

1 code implementation 28 May 2024 Yake Wei, Di Hu

In this paper, we identify the previously ignored gradient conflict between multimodal and unimodal learning objectives, which can mislead the optimization of the unimodal encoders.

Multimodal Fusion on Low-quality Data: A Comprehensive Survey

no code implementations 27 Apr 2024 Qingyang Zhang, Yake Wei, Zongbo Han, Huazhu Fu, Xi Peng, Cheng Deng, Qinghua Hu, Cai Xu, Jie Wen, Di Hu, Changqing Zhang

Multimodal fusion focuses on integrating information from multiple modalities with the goal of more accurate prediction, which has achieved remarkable progress in a wide range of scenarios, including autonomous driving and medical diagnosis.

Tasks: Autonomous Driving, Medical Diagnosis

Quantifying and Enhancing Multi-modal Robustness with Modality Preference

1 code implementation 9 Feb 2024 Zequn Yang, Yake Wei, Ce Liang, Di Hu

Moreover, our analysis reveals how a widespread issue, namely that the model prefers some modalities over others, limits multi-modal robustness by influencing its essential components and can make attacks targeting a specific modality highly effective.

Enhancing multimodal cooperation via sample-level modality valuation

1 code implementation CVPR 2024 Yake Wei, Ruoxuan Feng, Zihe Wang, Di Hu

One primary topic of multimodal learning is to jointly incorporate heterogeneous information from different modalities.

Learning in Audio-visual Context: A Review, Analysis, and New Perspective

no code implementations 20 Aug 2022 Yake Wei, Di Hu, Yapeng Tian, Xuelong Li

A comprehensive survey that systematically organizes and analyzes studies in the audio-visual field is therefore needed.

Tasks: Audio-visual Learning, Scene Understanding

Balanced Multimodal Learning via On-the-fly Gradient Modulation

1 code implementation CVPR 2022 Xiaokang Peng, Yake Wei, Andong Deng, Dong Wang, Di Hu

Multimodal learning helps to comprehensively understand the world by integrating different senses.

Learning to Answer Questions in Dynamic Audio-Visual Scenarios

1 code implementation CVPR 2022 Guangyao Li, Yake Wei, Yapeng Tian, Chenliang Xu, Ji-Rong Wen, Di Hu

In this paper, we focus on the Audio-Visual Question Answering (AVQA) task, which aims to answer questions regarding different visual objects, sounds, and their associations in videos.

Tasks: Audio-visual Learning, Audio-visual Question Answering, +4

Class-aware Sounding Objects Localization via Audiovisual Correspondence

1 code implementation 22 Dec 2021 Di Hu, Yake Wei, Rui Qian, Weiyao Lin, Ruihua Song, Ji-Rong Wen

To address this problem, we propose a two-stage step-by-step learning framework to localize and recognize sounding objects in complex audiovisual scenarios using only the correspondence between audio and vision.

Tasks: Object Detection, +3
