1 code implementation • 19 Jun 2024 • Jizhong Liu, Gang Li, Junbo Zhang, Heinrich Dinkel, Yongqing Wang, Zhiyong Yan, Yujun Wang, Bin Wang
Automated audio captioning (AAC) is an audio-to-text task that describes audio content in natural language.
Ranked #2 on Audio captioning on Clotho (using extra training data)
1 code implementation • 11 Jun 2024 • Zhiyong Yan, Heinrich Dinkel, Yongqing Wang, Jizhong Liu, Junbo Zhang, Yujun Wang, Bin Wang
The predominant focus of existing research on English descriptions poses a limitation on the applicability of such models, given the abundance of non-English content in real-world data.
1 code implementation • 19 Jan 2021 • Heinrich Dinkel, Mengyue Wu, Kai Yu
Our model outperforms other approaches on the DCASE2018 and URBAN-SED datasets without requiring prior duration knowledge.
Data Augmentation • Sound Event Detection • Sound • Audio and Speech Processing
1 code implementation • ECCV 2020 • Rui Qian, Di Hu, Heinrich Dinkel, Mengyue Wu, Ning Xu, Weiyao Lin
How to visually localize multiple sound sources in unconstrained videos is a formidable problem, especially in the absence of pairwise sound-object annotations.
1 code implementation • 27 Mar 2020 • Heinrich Dinkel, Yefei Chen, Mengyue Wu, Kai Yu
We proposed two GPVAD models, one full (GPV-F), trained on 527 AudioSet sound events, and one binary (GPV-B), only distinguishing speech and noise.
Sound • Audio and Speech Processing
1 code implementation • 31 May 2019 • Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu
Captioning has attracted much attention in image and video understanding, while only a small amount of work has examined audio captioning.
1 code implementation • 8 Apr 2019 • Heinrich Dinkel, Kai Yu
Task 4 of the DCASE2018 challenge demonstrated that substantially more research is needed for a real-world application of sound event detection.
Sound • Audio and Speech Processing
1 code implementation • 8 Apr 2019 • Heinrich Dinkel, Mengyue Wu, Kai Yu
Previous text-based depression detection is commonly based on large-scale user-generated data.
1 code implementation • 25 Feb 2019 • Mengyue Wu, Heinrich Dinkel, Kai Yu
A baseline encoder-decoder model is provided for both English and Mandarin.