1 code implementation • 8 Apr 2022 • Rong Ye, Chengqi Zhao, Tom Ko, Chutong Meng, Tao Wang, Mingxuan Wang, Jun Cao
The training set is translated by a strong machine translation system, and the test set is translated by human translators.
1 code implementation • 31 Aug 2023 • Zhichao Huang, Chutong Meng, Tom Ko
To improve the performance of discrete speech tokens, we present RepCodec, a novel speech representation codec for semantic speech tokenization.
3 code implementations • 30 Mar 2023 • Xinhao Mei, Chutong Meng, Haohe Liu, Qiuqiang Kong, Tom Ko, Chengqi Zhao, Mark D. Plumbley, Yuexian Zou, Wenwu Wang
To address this data scarcity issue, we introduce WavCaps, the first large-scale weakly-labelled audio captioning dataset, comprising approximately 400k audio clips with paired captions.
Ranked #1 on Zero-Shot Environment Sound Classification on ESC-50 (using extra training data)