Audio to Text Retrieval
5 papers with code • 4 benchmarks • 4 datasets
Latest papers
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
In this work, we explore a scalable way to build a general representation model toward unlimited modalities.
Contrastive Audio-Language Learning for Music
In this work, we explore cross-modal learning in an attempt to bridge audio and language in the music domain.
Audio Retrieval with Natural Language Queries: A Benchmark Study
We introduce the SoundDescs benchmark, which consists of paired audio and natural language descriptions for a diverse collection of sounds that complement those found in AudioCaps and Clotho.
OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation
In this paper, we propose an Omni-perception Pre-Trainer (OPT) for cross-modal understanding and generation, by jointly modeling visual, text and audio resources.
Audio Retrieval with Natural Language Queries
We consider the task of retrieving audio using free-form natural language queries.
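Audio-to-text retrieval is commonly scored by ranking candidate captions for each audio clip and reporting Recall@K. The sketch below is a minimal illustration, not the method of any paper listed above. It assumes the audio and caption embeddings have already been produced by some model that maps both into a shared space (for example, through contrastive audio-language training) and that caption `i` is the ground-truth match for clip `i`. Real benchmarks such as Clotho and AudioCaps pair each clip with several captions, which this simplification ignores. The names `cosine`, `rank_texts` and `recall_at_k` and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_texts(audio_emb: Sequence[float],
               text_embs: List[Sequence[float]]) -> List[int]:
    """Caption indices sorted from most to least similar to one audio clip."""
    scores = [cosine(audio_emb, t) for t in text_embs]
    return sorted(range(len(text_embs)), key=lambda i: scores[i], reverse=True)


def recall_at_k(audio_embs: List[Sequence[float]],
                text_embs: List[Sequence[float]], k: int) -> float:
    """Fraction of audio queries whose paired caption (same index,
    an assumption of this sketch) appears among the top-k results."""
    hits = sum(1 for i, a in enumerate(audio_embs)
               if i in rank_texts(a, text_embs)[:k])
    return hits / len(audio_embs)


# Hypothetical 2-D toy embeddings; real models use hundreds of dimensions.
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.2]]
texts = [[1.0, 0.1], [0.0, 1.0], [0.2, 1.0]]

print(rank_texts(audio[2], texts))     # [0, 2, 1]: clip 2's caption ranks second
print(recall_at_k(audio, texts, k=1))  # 2 of 3 queries hit at K=1
print(recall_at_k(audio, texts, k=2))  # 1.0
```

Scoring every caption for every clip is quadratic in collection size, which is fine for benchmark-scale evaluation; large deployments usually switch to an approximate nearest-neighbour index over the caption embeddings.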