Audio to Text Retrieval

5 papers with code • 4 benchmarks • 4 datasets

Benchmarks

These leaderboards track progress in Audio to Text Retrieval.

Dataset	Best Model
Clotho	ONE-PEACE
AudioCaps	ONE-PEACE
SoundDescs	MMT
Localized Narratives	OPT

Latest papers

ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

modelscope/modelscope • 18 May 2023

In this work, we explore a scalable way for building a general representation model toward unlimited modalities.

6,005 stars

Contrastive Audio-Language Learning for Music

ilaria-manco/muscall • 25 Aug 2022

In this work, we explore cross-modal learning in an attempt to bridge audio and language in the music domain.

Audio Retrieval with Natural Language Queries: A Benchmark Study

akoepke/audio-retrieval-benchmark • 17 Dec 2021

We introduce the SoundDescs benchmark, which consists of paired audio and natural language descriptions for a diverse collection of sounds that are complementary to those found in AudioCaps and Clotho.

OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation

mindspore-ai/models • 1 Jul 2021

In this paper, we propose an Omni-perception Pre-Trainer (OPT) for cross-modal understanding and generation, by jointly modeling visual, text and audio resources.

334 stars

Audio Retrieval with Natural Language Queries

oncescuandreea/audio-retrieval • 5 May 2021

We consider the task of retrieving audio using free-form natural language queries.
