Audio captioning

15 papers with code • 2 benchmarks • 2 datasets

Clotho: An Audio Captioning Dataset

richermans/AudioCaption 21 Oct 2019

Audio captioning is the novel task of general audio content description using free text.

CL4AC: A Contrastive Loss for Audio Captioning

liuxubo717/cl4ac 21 Jul 2021

Automated audio captioning (AAC) is a cross-modal translation task that aims to use natural language to describe the content of an audio clip.

Audio Caption in a Car Setting with a Sentence-Level Loss

richermans/AudioCaption 31 May 2019

Captioning has attracted much attention in image and video understanding, whereas comparatively little work has examined audio captioning.

Temporal Sub-sampling of Audio Feature Sequences for Automated Audio Captioning

DK-Nguyen/audio-captioning-sub-sampling 6 Jul 2020

In this work we present an approach that explicitly takes advantage of the difference in length between the input audio sequence and the much shorter output caption, by applying temporal sub-sampling to the audio input sequence.
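
As a rough illustration of what temporal sub-sampling of a feature sequence means, the sketch below keeps every `factor`-th feature vector along the time axis. The function name `subsample_frames`, the factor of 4, and the toy input are assumptions made for this example; the paper may apply sub-sampling at a different point in the model or in a different way.

```python
from typing import List, Sequence


def subsample_frames(frames: Sequence[Sequence[float]], factor: int) -> List[List[float]]:
    """Keep every `factor`-th feature vector along the time axis.

    Hypothetical helper for illustration only, not the paper's code.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    # Slicing with a step drops the intermediate time steps.
    return [list(frame) for frame in frames[::factor]]


# Toy input: 12 time steps with 3 features each (made-up values).
frames = [[float(t), float(t) * 0.5, 1.0] for t in range(12)]

# Keep time steps 0, 4 and 8, so the sequence shrinks from 12 to 3 frames.
shorter = subsample_frames(frames, factor=4)
print(len(frames), "->", len(shorter))
```

The shorter sequence brings the number of audio time steps closer to the number of words in a caption, which is the length mismatch the excerpt refers to.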

Multi-task Regularization Based on Infrequent Classes for Audio Captioning

emrcak/dcase-2020-baseline 9 Jul 2020

Audio captioning is a multi-modal task, focusing on using natural language for describing the contents of general audio.

WaveTransformer: A Novel Architecture for Audio Captioning Based on Learning Temporal and Time-Frequency Information

haantran96/wavetransformer 21 Oct 2020

Automated audio captioning (AAC) is a novel task, where a method takes an audio sample as input and outputs a textual description (i.e., a caption) of its contents.

MusCaps: Generating Captions for Music Audio

ilaria-manco/muscaps 24 Apr 2021

Content-based music information retrieval has seen rapid progress with the adoption of deep learning.


wsntxxn/AudioCaption DCASE Challenge 2021

This report proposes an audio captioning system for Task 6 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 challenge.

Continual Learning for Automated Audio Captioning Using The Learning Without Forgetting Approach

JanBerg1/AAC-LwF 16 Jul 2021

In our scenario, a pre-optimized AAC method is applied to unseen general audio signals and can update its parameters to adapt to the new information, given a new reference caption.

Audio Captioning Transformer

XinhaoMei/ACT 21 Jul 2021

In this paper, we propose an Audio Captioning Transformer (ACT), a full Transformer network based on an encoder-decoder architecture that is entirely convolution-free.