1 code implementation • 29 Dec 2024 • Xiujie Song, Xiaoyi Pang, Haifeng Tang, Mengyue Wu, Kenny Q. Zhu
Additionally, semantically rich images can benefit the development of vision models, as images with limited semantics are becoming less challenging for them.
no code implementations • 24 Dec 2024 • Yaoyun Zhang, Xuenan Xu, Mengyue Wu
To tackle these challenges, we propose Smooth-Foley, a V2A generative model taking semantic guidance from the textual label across the generation to enhance both semantic and temporal alignment in audio.
no code implementations • 5 Nov 2024 • Fei Yang, Xuenan Xu, Mengyue Wu, Kai Yu
This system uses prompt tuning to adjust only a small fraction of the parameters in order to detect different diseases from the speech of potential patients.
no code implementations • 21 Oct 2024 • Xun Jiang, Feng Li, Han Zhao, Jiaying Wang, Jun Shao, Shihao Xu, Shu Zhang, Weiling Chen, Xavier Tang, Yize Chen, Mengyue Wu, Weizhi Ma, Mengdi Wang, Tianqiao Chen
We outline the structure of LTM and the systems needed for effective data retention and representation.
no code implementations • 29 Sep 2024 • Siyuan Chen, Cong Ming, Zhiling Zhang, Yanyi Chen, Kenny Q. Zhu, Mengyue Wu
In the realm of mental health support chatbots, it is vital to show empathy and encourage self-exploration to provide tailored solutions.
no code implementations • 20 Sep 2024 • Kunyao Lan, Bingrui Jin, Zichen Zhu, Siyuan Chen, Shu Zhang, Kenny Q. Zhu, Mengyue Wu
Mental health issues, particularly depressive disorders, present significant challenges in contemporary society, necessitating the development of effective automated diagnostic methods.
no code implementations • 5 Jun 2024 • YuFei Wang, Mengyue Wu
Emotion semantic inconsistency is a ubiquitous challenge in multi-modal sentiment analysis (MSA).
no code implementations • 30 Apr 2024 • Haohe Liu, Xuenan Xu, Yi Yuan, Mengyue Wu, Wenwu Wang, Mark D. Plumbley
Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modelling techniques to audio data.
no code implementations • 7 Apr 2024 • Kunyao Lan, Cong Ming, Binwei Yao, Lu Chen, Mengyue Wu
Nevertheless, the blend of task-oriented and chit-chat in diagnosis-related dialogues necessitates professional expertise and empathy.
no code implementations • 28 Feb 2024 • Xiujie Song, Mengyue Wu, Kenny Q. Zhu, Chunhao Zhang, Yanyi Chen
Large Vision-Language Models (LVLMs), despite their recent success, have rarely been tested comprehensively for their cognitive abilities.
no code implementations • 25 Feb 2024 • Xingyuan Li, Sinong Wang, Zeyu Xie, Mengyue Wu, Kenny Q. Zhu
This paper explores potential communication patterns within dog vocalizations and moves beyond the barriers of traditional linguistic analysis, which relies heavily on human a priori knowledge applied to limited datasets to find sound units in dog vocalization.
1 code implementation • 15 Nov 2023 • Haoan Jin, Siyuan Chen, Dilawaier Dilixiati, Yewei Jiang, Mengyue Wu, Kenny Q. Zhu
This comprehensive framework is designed to thoroughly assess the unique challenges and intricacies of mental health-related tasks, making PsyEval a highly specialized and valuable tool for evaluating LLM performance in this domain.
no code implementations • 21 Sep 2023 • Jieyi Huang, Chunhao Zhang, YuFei Wang, Mengyue Wu, Kenny Zhu
How hosts' language influences their pets' vocalization is an interesting yet underexplored problem.
no code implementations • 21 Sep 2023 • YuFei Wang, Chunhao Zhang, Jieyi Huang, Mengyue Wu, Kenny Zhu
This study presents a data-driven investigation into the semantics of dog vocalizations via correlating different sound types with consistent semantics.
no code implementations • 20 Sep 2023 • Luoyi Sun, Xuenan Xu, Mengyue Wu, Weidi Xie
Recently, the AI community has made significant strides in developing powerful foundation models, driven by large-scale multimodal datasets.
no code implementations • 16 Jun 2023 • Hanxue Zhang, Zeyu Xie, Xuenan Xu, Mengyue Wu, Kai Yu
Automated audio captioning (AAC) is an important cross-modality translation task, aiming at generating descriptions for audio clips.
no code implementations • 23 May 2023 • Siyuan Chen, Mengyue Wu, Kenny Q. Zhu, Kunyao Lan, Zhiling Zhang, Lyuchun Cui
Empowering chatbots in the field of mental health is receiving an increasing amount of attention, yet the development and evaluation of chatbots in psychiatric outpatient scenarios remains underexplored.
1 code implementation • 4 May 2023 • Zhiling Zhang, Mengyue Wu, Kenny Q. Zhu
Controlling chatbot utterance generation with multiple attributes such as personalities, emotions and dialogue acts is a practically useful but under-studied problem.
no code implementations • 10 Sep 2022 • Zhi Chen, Yuncong Liu, Lu Chen, Su Zhu, Mengyue Wu, Kai Yu
The second phase is to fine-tune the pretrained model on the TOD data.
no code implementations • 25 May 2022 • Zhi Chen, Jijia Bao, Lu Chen, Yuncong Liu, Da Ma, Bei Chen, Mengyue Wu, Su Zhu, Xin Dong, Fujiang Ge, Qingliang Miao, Jian-Guang Lou, Kai Yu
In this work, we aim to build a unified dialogue foundation model (DFM) which can be used to solve a large number of diverse dialogue tasks.
no code implementations • 24 May 2022 • Binwei Yao, Chao Shi, Likai Zou, Lingfeng Dai, Mengyue Wu, Lu Chen, Zhen Wang, Kai Yu
In a depression-diagnosis-directed clinical session, doctors initiate a conversation with ample emotional support that guides the patients to expose their symptoms based on clinical diagnosis criteria.
no code implementations • 23 May 2022 • Zhiling Zhang, Siyuan Chen, Mengyue Wu, Kenny Q. Zhu
Mental disease detection (MDD) from social media has suffered from poor generalizability and interpretability due to a lack of symptom modeling.
1 code implementation • 19 May 2022 • Zhiling Zhang, Siyuan Chen, Mengyue Wu, Kenny Q. Zhu
Depression is a prominent health challenge to the world, and early risk detection (ERD) of depression from online posts can be a promising technique for combating the threat.
no code implementations • 29 Apr 2022 • Wen Wu, Mengyue Wu, Kai Yu
Automatic depression detection has attracted an increasing amount of attention but remains a challenging task.
no code implementations • 25 Mar 2022 • Siyu Lou, Xuenan Xu, Mengyue Wu, Kai Yu
Using pre-trained audio features and a descriptor-based aggregation method, we build our contextual audio-text retrieval system.
1 code implementation • 10 Oct 2021 • Zelin Zhou, Zhiling Zhang, Xuenan Xu, Zeyu Xie, Mengyue Wu, Kenny Q. Zhu
Current metrics are found to correlate poorly with human annotations on these datasets.
1 code implementation • DCASE Challenge 2021 • Xuenan Xu, Zeyu Xie, Mengyue Wu, Kai Yu
This report proposes an audio captioning system for Task 6 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 challenge.
Ranked #5 on Audio captioning on Clotho (using extra training data)
no code implementations • Findings (ACL) 2021 • Zhi Chen, Lu Chen, Hanqi Li, Ruisheng Cao, Da Ma, Mengyue Wu, Kai Yu
A dual learning approach is also proposed for the utterance rewrite model to address the data sparsity problem.
1 code implementation • 19 Jan 2021 • Heinrich Dinkel, Mengyue Wu, Kai Yu
Our model outperforms other approaches on the DCASE2018 and URBAN-SED datasets without requiring prior duration knowledge.
Data Augmentation, Sound Event Detection, Sound, Audio and Speech Processing
1 code implementation • ECCV 2020 • Rui Qian, Di Hu, Heinrich Dinkel, Mengyue Wu, Ning Xu, Weiyao Lin
How to visually localize multiple sound sources in unconstrained videos is a formidable problem, especially when pairwise sound-object annotations are lacking.
no code implementations • 29 Jun 2020 • Die Zhang, Huilin Zhou, Hao Zhang, Xiaoyi Bao, Da Huo, Ruizhao Chen, Xu Cheng, Mengyue Wu, Quanshi Zhang
This paper proposes a method to disentangle and quantify interactions among words that are encoded inside a DNN for natural language processing.
1 code implementation • 27 Mar 2020 • Heinrich Dinkel, Yefei Chen, Mengyue Wu, Kai Yu
We proposed two GPVAD models, one full (GPV-F), trained on 527 Audioset sound events, and one binary (GPV-B), only distinguishing speech and noise.
Sound, Audio and Speech Processing
1 code implementation • 31 May 2019 • Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu
Captioning has attracted much attention in image and video understanding, while only a small amount of work examines audio captioning.
1 code implementation • 8 Apr 2019 • Heinrich Dinkel, Mengyue Wu, Kai Yu
Previous text-based depression detection is commonly based on large amounts of user-generated data.
1 code implementation • 25 Feb 2019 • Mengyue Wu, Heinrich Dinkel, Kai Yu
A baseline encoder-decoder model is provided for both English and Mandarin.