Audio-Visual Learning

16 papers with code • 0 benchmarks • 3 datasets


Most implemented papers

Adversarial-Metric Learning for Audio-Visual Cross-Modal Matching

my-yy/AML_Copy IEEE Transactions on Multimedia 2021

AML aims to generate a modality-independent representation for each person in each modality via adversarial learning, while simultaneously learning a robust similarity measure for cross-modal matching via metric learning.
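The metric-learning half of this idea is commonly realized as a triplet margin loss over cross-modal embeddings. A minimal numpy sketch, with toy embeddings and names that are illustrative rather than taken from the AML paper:

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Metric-learning objective: pull same-identity cross-modal pairs
    together and push different-identity pairs apart by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)  # e.g. face-to-voice, same person
    d_neg = np.linalg.norm(anchor - negative)  # e.g. face-to-voice, different person
    return max(0.0, d_pos - d_neg + margin)

# Toy 4-d embeddings: a face anchor, a matching voice, a non-matching voice.
face      = np.array([1.0, 0.0, 0.0, 0.0])
voice_pos = np.array([0.9, 0.1, 0.0, 0.0])
voice_neg = np.array([0.0, 1.0, 0.0, 0.0])

loss = triplet_margin_loss(face, voice_pos, voice_neg)  # 0.0: margin satisfied
```

In a full system the embeddings come from learned audio and visual encoders, and the adversarial branch additionally trains a discriminator that cannot tell which modality an embedding came from.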

Can audio-visual integration strengthen robustness under multimodal attacks?

YapengTian/AV-Robustness-CVPR21 CVPR 2021

In this paper, we propose a systematic study of machines' multisensory perception under attacks.

Distilling Audio-Visual Knowledge by Compositional Contrastive Learning

yanbeic/CCL CVPR 2021

Having access to multi-modal cues (e.g., vision and audio) enables some cognitive tasks to be learned faster than from a single modality.
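Contrastive audio-visual objectives of this kind are typically instances of a symmetric InfoNCE loss, where matched (video, audio) pairs in a batch are positives and all other pairings are negatives. A minimal numpy sketch under that assumption (not the paper's exact compositional formulation):

```python
import numpy as np

def cross_entropy(logits, labels):
    # Row-wise log-softmax, then negative log-likelihood of the true column.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def info_nce(video_emb, audio_emb, temperature=0.1):
    """Symmetric cross-modal InfoNCE: matched pairs sit on the diagonal
    of the cosine-similarity matrix; off-diagonal pairs are negatives."""
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    logits = v @ a.T / temperature
    labels = np.arange(len(v))
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))

rng = np.random.default_rng(0)
video = rng.normal(size=(4, 8))
aligned_loss = info_nce(video, video)          # perfectly matched pairs: low loss
shuffled_loss = info_nce(video, video[::-1])   # mismatched pairs: high loss
```

Distillation methods such as CCL then use losses like this to transfer a teacher's cross-modal alignment into a student network.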

Cascaded Multilingual Audio-Visual Learning from Videos

roudimit/AVLnet 8 Nov 2021

In this paper, we explore self-supervised audio-visual models that learn from instructional videos.

Learning to Answer Questions in Dynamic Audio-Visual Scenarios

GeWu-Lab/MUSIC-AVQA CVPR 2022

In this paper, we focus on the Audio-Visual Question Answering (AVQA) task, which aims to answer questions regarding different visual objects, sounds, and their associations in videos.

Modality-Aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection

JustinYuu/MACIL_SD 12 Jul 2022

In this paper, we analyze the modality asynchrony and undifferentiated instances phenomena of the multiple instance learning (MIL) procedure, and further investigate its negative impact on weakly-supervised audio-visual learning.
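In the MIL setting referenced here, a video is a "bag" of segment-level "instances" with only a video-level label. A common pooling choice, shown below as an illustrative numpy sketch (the top-k rule and numbers are assumptions, not the paper's exact method), scores a bag by the mean of its highest-scoring instances, so a few anomalous segments suffice to flag the whole video:

```python
import numpy as np

def bag_score_topk(instance_scores, k=3):
    """MIL pooling: summarize a bag of per-segment scores by the mean of
    its top-k instances."""
    top_k = np.sort(instance_scores)[-k:]
    return float(top_k.mean())

# Toy per-segment violence scores for two videos.
normal_video  = np.array([0.10, 0.20, 0.10, 0.15, 0.05, 0.10])
violent_video = np.array([0.10, 0.90, 0.85, 0.10, 0.95, 0.20])

bag_score_topk(violent_video)  # 0.9: top-3 segments dominate
bag_score_topk(normal_video)   # 0.15
```

The modality-asynchrony problem the paper analyzes arises because the audio and visual instance scores inside the same bag need not peak at the same segments.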

UAVM: Towards Unifying Audio and Visual Models

YuanGongND/uavm 29 Jul 2022

Conventional audio-visual models have independent audio and video branches.
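The unifying idea is that after small modality-specific adapters, both modalities pass through shared, modality-agnostic layers. A schematic numpy sketch of that weight-sharing pattern (dimensions and the single shared layer are illustrative assumptions, not UAVM's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Modality-specific input projections into a shared embedding space.
W_audio  = rng.normal(size=(128, 64))  # audio features are 128-d (assumed)
W_video  = rng.normal(size=(512, 64))  # video features are 512-d (assumed)
W_shared = rng.normal(size=(64, 10))   # one shared classifier head

def forward(x, W_in):
    h = np.tanh(x @ W_in)  # modality-specific adapter
    return h @ W_shared    # shared, modality-agnostic layer

audio_logits = forward(rng.normal(size=(1, 128)), W_audio)
video_logits = forward(rng.normal(size=(1, 512)), W_video)
# Both modalities produce logits of the same shape from the same shared weights.
```

Sharing `W_shared` is what distinguishes a unified model from the conventional design with fully independent audio and video branches.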

Revisiting Pre-training in Audio-Visual Learning

gewu-lab/revisiting-pre-training-in-audio-visual-learning 7 Feb 2023

Specifically, we explore the effects of pre-trained models on two audio-visual learning scenarios: cross-modal initialization and multi-modal joint learning.

Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation

cyh-0/CAVP 6 Apr 2023

We show empirical results that demonstrate the effectiveness of our benchmark.

AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation

guyyariv/AudioToken Interspeech 2023

In this paper, we propose a novel method that utilizes latent diffusion models trained for text-to-image generation to generate images conditioned on audio recordings.