no code implementations • 21 Sep 2023 • Jieyi Huang, Chunhao Zhang, YuFei Wang, Mengyue Wu, Kenny Zhu
How hosts' language influences their pets' vocalization is an interesting yet underexplored problem.
no code implementations • 21 Sep 2023 • YuFei Wang, Chunhao Zhang, Jieyi Huang, Mengyue Wu, Kenny Zhu
This study presents a data-driven investigation into the semantics of dog vocalizations via correlating different sound types with consistent semantics.
no code implementations • 20 Sep 2023 • Luoyi Sun, Xuenan Xu, Mengyue Wu, Weidi Xie
To tackle these challenges, we present an innovative and automatic audio caption generation pipeline based on a series of public tools or APIs, and construct a large-scale, high-quality audio-language dataset, named Auto-ACD, comprising over 1.9M audio-text pairs.
no code implementations • 16 Jun 2023 • Hanxue Zhang, Zeyu Xie, Xuenan Xu, Mengyue Wu, Kai Yu
Automated audio captioning (AAC) is an important cross-modality translation task, aiming at generating descriptions for audio clips.
no code implementations • 23 May 2023 • Siyuan Chen, Mengyue Wu, Kenny Q. Zhu, Kunyao Lan, Zhiling Zhang, Lyuchun Cui
Empowering chatbots in the field of mental health is receiving an increasing amount of attention, yet developing and evaluating chatbots in psychiatric outpatient scenarios remains underexplored.
1 code implementation • 4 May 2023 • Zhiling Zhang, Mengyue Wu, Kenny Q. Zhu
Controlling chatbot utterance generation with multiple attributes such as personalities, emotions and dialogue acts is a practically useful but under-studied problem.
no code implementations • 10 Sep 2022 • Zhi Chen, Yuncong Liu, Lu Chen, Su Zhu, Mengyue Wu, Kai Yu
The second phase is to fine-tune the pretrained model on the TOD data.
no code implementations • 25 May 2022 • Zhi Chen, Jijia Bao, Lu Chen, Yuncong Liu, Da Ma, Bei Chen, Mengyue Wu, Su Zhu, Xin Dong, Fujiang Ge, Qingliang Miao, Jian-Guang Lou, Kai Yu
In this work, we aim to build a unified dialogue foundation model (DFM) which can be used to solve massive diverse dialogue tasks.
no code implementations • 24 May 2022 • Binwei Yao, Chao Shi, Likai Zou, Lingfeng Dai, Mengyue Wu, Lu Chen, Zhen Wang, Kai Yu
In a depression-diagnosis-directed clinical session, doctors initiate a conversation with ample emotional support that guides the patients to expose their symptoms based on clinical diagnosis criteria.
no code implementations • 23 May 2022 • Zhiling Zhang, Siyuan Chen, Mengyue Wu, Kenny Q. Zhu
Mental disease detection (MDD) from social media has suffered from poor generalizability and interpretability due to a lack of symptom modeling.
1 code implementation • 19 May 2022 • Zhiling Zhang, Siyuan Chen, Mengyue Wu, Kenny Q. Zhu
Depression is a prominent health challenge to the world, and early risk detection (ERD) of depression from online posts can be a promising technique for combating the threat.
no code implementations • 29 Apr 2022 • Wen Wu, Mengyue Wu, Kai Yu
Automatic depression detection has attracted an increasing amount of attention but remains a challenging task.
no code implementations • 25 Mar 2022 • Siyu Lou, Xuenan Xu, Mengyue Wu, Kai Yu
Using pre-trained audio features and a descriptor-based aggregation method, we build our contextual audio-text retrieval system.
2 code implementations • 10 Oct 2021 • Zelin Zhou, Zhiling Zhang, Xuenan Xu, Zeyu Xie, Mengyue Wu, Kenny Q. Zhu
Current metrics are found to correlate poorly with human annotations on these datasets.
1 code implementation • DCASE Challenge 2021 • Xuenan Xu, Zeyu Xie, Mengyue Wu, Kai Yu
This report proposes an audio captioning system for the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 challenge Task 6.
Ranked #2 on Audio captioning on Clotho (using extra training data)
no code implementations • Findings (ACL) 2021 • Zhi Chen, Lu Chen, Hanqi Li, Ruisheng Cao, Da Ma, Mengyue Wu, Kai Yu
A dual learning approach is also proposed for the utterance rewrite model to address the data sparsity problem.
1 code implementation • 19 Jan 2021 • Heinrich Dinkel, Mengyue Wu, Kai Yu
Our model outperforms other approaches on the DCASE2018 and URBAN-SED datasets without requiring prior duration knowledge.
Data Augmentation • Sound Event Detection • Sound • Audio and Speech Processing
1 code implementation • ECCV 2020 • Rui Qian, Di Hu, Heinrich Dinkel, Mengyue Wu, Ning Xu, Weiyao Lin
How to visually localize multiple sound sources in unconstrained videos is a formidable problem, especially when pairwise sound-object annotations are lacking.
no code implementations • 29 Jun 2020 • Die Zhang, Huilin Zhou, Hao Zhang, Xiaoyi Bao, Da Huo, Ruizhao Chen, Xu Cheng, Mengyue Wu, Quanshi Zhang
This paper proposes a method to disentangle and quantify interactions among words that are encoded inside a DNN for natural language processing.
1 code implementation • 27 Mar 2020 • Heinrich Dinkel, Yefei Chen, Mengyue Wu, Kai Yu
We proposed two GPVAD models, one full (GPV-F), trained on 527 Audioset sound events, and one binary (GPV-B), only distinguishing speech and noise.
Sound • Audio and Speech Processing
1 code implementation • 31 May 2019 • Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu
Captioning has attracted much attention in image and video understanding, while only a small amount of work examines audio captioning.
1 code implementation • 8 Apr 2019 • Heinrich Dinkel, Mengyue Wu, Kai Yu
Previous text-based depression detection is commonly based on large user-generated data.
1 code implementation • 25 Feb 2019 • Mengyue Wu, Heinrich Dinkel, Kai Yu
A baseline encoder-decoder model is provided for both English and Mandarin.