Audio to Text Retrieval

5 papers with code • 4 benchmarks • 4 datasets

ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

modelscope/modelscope 18 May 2023

In this work, we explore a scalable way for building a general representation model toward unlimited modalities.

6,005 stars

Contrastive Audio-Language Learning for Music

ilaria-manco/muscall 25 Aug 2022

In this work, we explore cross-modal learning in an attempt to bridge audio and language in the music domain.

97 stars

Audio Retrieval with Natural Language Queries: A Benchmark Study

akoepke/audio-retrieval-benchmark 17 Dec 2021

Additionally, we introduce the SoundDescs benchmark, which consists of paired audio and natural language descriptions for a diverse collection of sounds that are complementary to those found in AudioCaps and Clotho.

36 stars

OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation

mindspore-ai/models 1 Jul 2021

In this paper, we propose an Omni-perception Pre-Trainer (OPT) for cross-modal understanding and generation, by jointly modeling visual, text and audio resources.

334 stars

Audio Retrieval with Natural Language Queries

oncescuandreea/audio-retrieval 5 May 2021

We consider the task of retrieving audio using free-form natural language queries.

28 stars