no code implementations • 20 Sep 2022 • Huang Xie, Samuel Lipping, Tuomas Virtanen
Language-based audio retrieval is a task in which natural language textual captions are used as queries to retrieve audio signals from a dataset.
1 code implementation • 13 Jun 2022 • Huang Xie, Samuel Lipping, Tuomas Virtanen
Language-based audio retrieval is a task in which natural language textual captions are used as queries to retrieve audio signals from a dataset.
no code implementations • 20 Apr 2022 • Samuel Lipping, Parthasaarathy Sudarsanam, Konstantinos Drossos, Tuomas Virtanen
Audio question answering (AQA) is a multimodal translation task in which a system analyzes an audio signal and a natural language question to generate a desirable natural language answer.
7 code implementations • 21 Oct 2019 • Konstantinos Drossos, Samuel Lipping, Tuomas Virtanen
Audio captioning is the novel task of general audio content description using free text.
1 code implementation • 22 Jul 2019 • Samuel Lipping, Konstantinos Drossos, Tuomas Virtanen
In this paper we present a three-step framework for crowdsourcing an audio captioning dataset, based on concepts and practices followed in the creation of widely used image captioning and machine translation datasets.
Sound • Audio and Speech Processing