no code implementations • 20 Sep 2023 • Luoyi Sun, Xuenan Xu, Mengyue Wu, Weidi Xie
To tackle these challenges, we present an innovative and automatic audio caption generation pipeline based on a series of public tools or APIs, and construct a large-scale, high-quality, audio-language dataset, named as Auto-ACD, comprising over 1.9M audio-text pairs.
no code implementations • 16 Jun 2023 • Hanxue Zhang, Zeyu Xie, Xuenan Xu, Mengyue Wu, Kai Yu
Automated audio captioning (AAC) is an important cross-modality translation task, aiming at generating descriptions for audio clips.
no code implementations • 25 Mar 2022 • Siyu Lou, Xuenan Xu, Mengyue Wu, Kai Yu
Using pre-trained audio features and a descriptor-based aggregation method, we build our contextual audio-text retrieval system.
1 code implementation • 10 Oct 2021 • Zelin Zhou, Zhiling Zhang, Xuenan Xu, Zeyu Xie, Mengyue Wu, Kenny Q. Zhu
Current metrics are found to correlate poorly with human annotations on these datasets.
1 code implementation • DCASE Challenge 2021 • Xuenan Xu, Zeyu Xie, Mengyue Wu, Kai Yu
This report proposes an audio captioning system for the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 challenge Task 6.
Ranked #2 on Audio captioning on Clotho (using extra training data)
1 code implementation • 31 May 2019 • Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu
Captioning has attracted much attention in image and video understanding while a small amount of work examines audio captioning.