Search Results for author: Mengyue Wu

Found 35 papers, 12 papers with code

Is Your Image a Good Storyteller?

1 code implementation • 29 Dec 2024 • Xiujie Song, Xiaoyi Pang, Haifeng Tang, Mengyue Wu, Kenny Q. Zhu

Additionally, semantically rich images can benefit the development of vision models, as images with limited semantics are becoming less challenging for them.

Smooth-Foley: Creating Continuous Sound for Video-to-Audio Generation Under Semantic Guidance

no code implementations • 24 Dec 2024 • Yaoyun Zhang, Xuenan Xu, Mengyue Wu

To tackle these challenges, we propose Smooth-Foley, a V2A generative model taking semantic guidance from the textual label across the generation to enhance both semantic and temporal alignment in audio.

Audio Generation, Video Alignment

Unified Pathological Speech Analysis with Prompt Tuning

no code implementations • 5 Nov 2024 • Fei Yang, Xuenan Xu, Mengyue Wu, Kai Yu

This system uses prompt tuning to adjust only a small part of the parameters to detect different diseases from the speech of possible patients.

Language Modeling, Language Modelling

Long Term Memory: The Foundation of AI Self-Evolution

no code implementations • 21 Oct 2024 • Xun Jiang, Feng Li, Han Zhao, Jiaying Wang, Jun Shao, Shihao Xu, Shu Zhang, Weiling Chen, Xavier Tang, Yize Chen, Mengyue Wu, Weizhi Ma, Mengdi Wang, Tianqiao Chen

We outline the structure of LTM and the systems needed for effective data retention and representation.

Mixed Chain-of-Psychotherapies for Emotional Support Chatbot

no code implementations • 29 Sep 2024 • Siyuan Chen, Cong Ming, Zhiling Zhang, Yanyi Chen, Kenny Q. Zhu, Mengyue Wu

In the realm of mental health support chatbots, it is vital to show empathy and encourage self-exploration to provide tailored solutions.

Chatbot

Depression Diagnosis Dialogue Simulation: Self-improving Psychiatrist with Tertiary Memory

no code implementations • 20 Sep 2024 • Kunyao Lan, Bingrui Jin, Zichen Zhu, Siyuan Chen, Shu Zhang, Kenny Q. Zhu, Mengyue Wu

Mental health issues, particularly depressive disorders, present significant challenges in contemporary society, necessitating the development of effective automated diagnostic methods.

Evaluation of data inconsistency for multi-modal sentiment analysis

no code implementations • 5 Jun 2024 • YuFei Wang, Mengyue Wu

Emotion semantic inconsistency is a ubiquitous challenge in multi-modal sentiment analysis (MSA).

Emotion Recognition, Sentiment Analysis

SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound

no code implementations • 30 Apr 2024 • Haohe Liu, Xuenan Xu, Yi Yuan, Mengyue Wu, Wenwu Wang, Mark D. Plumbley

Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modelling techniques to audio data.

Decoder, Language Modelling

Towards Reliable and Empathetic Depression-Diagnosis-Oriented Chats

no code implementations • 7 Apr 2024 • Kunyao Lan, Cong Ming, Binwei Yao, Lu Chen, Mengyue Wu

Nevertheless, the blend of task-oriented and chit-chat in diagnosis-related dialogues necessitates professional expertise and empathy.

A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision-Language Models

no code implementations • 28 Feb 2024 • Xiujie Song, Mengyue Wu, Kenny Q. Zhu, Chunhao Zhang, Yanyi Chen

Large Vision-Language Models (LVLMs), despite their recent success, have rarely been comprehensively tested for their cognitive abilities.

Question Answering, Visual Question Answering

Phonetic and Lexical Discovery of a Canine Language using HuBERT

no code implementations • 25 Feb 2024 • Xingyuan Li, Sinong Wang, Zeyu Xie, Mengyue Wu, Kenny Q. Zhu

This paper explores potential communication patterns within dog vocalizations and moves beyond the barriers of traditional linguistic analysis, which relies heavily on human a priori knowledge and limited datasets to find sound units in dog vocalization.

PsyEval: A Suite of Mental Health Related Tasks for Evaluating Large Language Models

1 code implementation • 15 Nov 2023 • Haoan Jin, Siyuan Chen, Dilawaier Dilixiati, Yewei Jiang, Mengyue Wu, Kenny Q. Zhu

This comprehensive framework is designed to thoroughly assess the unique challenges and intricacies of mental health-related tasks, making PsyEval a highly specialized and valuable tool for evaluating LLM performance in this domain.

Language Modelling, Large Language Model +1

Does My Dog "Speak" Like Me? The Acoustic Correlation between Pet Dogs and Their Human Owners

no code implementations • 21 Sep 2023 • Jieyi Huang, Chunhao Zhang, YuFei Wang, Mengyue Wu, Kenny Zhu

How hosts' language influences their pets' vocalization is an interesting yet underexplored problem.

Towards Lexical Analysis of Dog Vocalizations via Online Videos

no code implementations • 21 Sep 2023 • YuFei Wang, Chunhao Zhang, Jieyi Huang, Mengyue Wu, Kenny Zhu

This study presents a data-driven investigation into the semantics of dog vocalizations by correlating different sound types with consistent semantics.

Lexical Analysis

Auto-ACD: A Large-scale Dataset for Audio-Language Representation Learning

no code implementations • 20 Sep 2023 • Luoyi Sun, Xuenan Xu, Mengyue Wu, Weidi Xie

Recently, the AI community has made significant strides in developing powerful foundation models, driven by large-scale multimodal datasets.

Audio captioning, Caption Generation +6

Improving Audio Caption Fluency with Automatic Error Correction

no code implementations • 16 Jun 2023 • Hanxue Zhang, Zeyu Xie, Xuenan Xu, Mengyue Wu, Kai Yu

Automated audio captioning (AAC) is an important cross-modality translation task, aiming at generating descriptions for audio clips.

Audio captioning, Sentence

LLM-empowered Chatbots for Psychiatrist and Patient Simulation: Application and Evaluation

no code implementations • 23 May 2023 • Siyuan Chen, Mengyue Wu, Kenny Q. Zhu, Kunyao Lan, Zhiling Zhang, Lyuchun Cui

Empowering chatbots in the field of mental health is receiving an increasing amount of attention, yet the development and evaluation of chatbots in psychiatric outpatient scenarios remains underexplored.

Chatbot

Semantic Space Grounded Weighted Decoding for Multi-Attribute Controllable Dialogue Generation

1 code implementation • 4 May 2023 • Zhiling Zhang, Mengyue Wu, Kenny Q. Zhu

Controlling chatbot utterance generation with multiple attributes such as personalities, emotions and dialogue acts is a practically useful but under-studied problem.

Attribute, Dialogue Generation

D4: a Chinese Dialogue Dataset for Depression-Diagnosis-Oriented Chat

no code implementations • 24 May 2022 • Binwei Yao, Chao Shi, Likai Zou, Lingfeng Dai, Mengyue Wu, Lu Chen, Zhen Wang, Kai Yu

In a depression-diagnosis-directed clinical session, doctors initiate a conversation with ample emotional support that guides the patients to expose their symptoms based on clinical diagnosis criteria.

Response Generation

Symptom Identification for Interpretable Detection of Multiple Mental Disorders

no code implementations • 23 May 2022 • Zhiling Zhang, Siyuan Chen, Mengyue Wu, Kenny Q. Zhu

Mental disease detection (MDD) from social media has suffered from poor generalizability and interpretability, due to lack of symptom modeling.

Diversity

Psychiatric Scale Guided Risky Post Screening for Early Detection of Depression

1 code implementation • 19 May 2022 • Zhiling Zhang, Siyuan Chen, Mengyue Wu, Kenny Q. Zhu

Depression is a prominent health challenge to the world, and early risk detection (ERD) of depression from online posts can be a promising technique for combating the threat.

Depression Detection

Climate and Weather: Inspecting Depression Detection via Emotion Recognition

no code implementations • 29 Apr 2022 • Wen Wu, Mengyue Wu, Kai Yu

Automatic depression detection has attracted an increasing amount of attention but remains a challenging task.

Depression Detection, Emotion Recognition

Audio-text Retrieval in Context

no code implementations • 25 Mar 2022 • Siyu Lou, Xuenan Xu, Mengyue Wu, Kai Yu

Using pre-trained audio features and a descriptor-based aggregation method, we build our contextual audio-text retrieval system.

AudioCaps, Text Retrieval

THE SJTU SYSTEM FOR DCASE2021 CHALLENGE TASK 6: AUDIO CAPTIONING BASED ON ENCODER PRE-TRAINING AND REINFORCEMENT LEARNING

1 code implementation • DCASE Challenge 2021 • Xuenan Xu, Zeyu Xie, Mengyue Wu, Kai Yu

This report proposes an audio captioning system for Task 6 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 challenge.

Ranked #5 on Audio captioning on Clotho (using extra training data)

Audio captioning, Audio Tagging +3

Towards duration robust weakly supervised sound event detection

1 code implementation • 19 Jan 2021 • Heinrich Dinkel, Mengyue Wu, Kai Yu

Our model outperforms other approaches on the DCASE2018 and URBAN-SED datasets without requiring prior duration knowledge.

Data Augmentation, Sound Event Detection, Sound, Audio and Speech Processing

Multiple Sound Sources Localization from Coarse to Fine

1 code implementation • ECCV 2020 • Rui Qian, Di Hu, Heinrich Dinkel, Mengyue Wu, Ning Xu, Weiyao Lin

How to visually localize multiple sound sources in unconstrained videos is a formidable problem, especially when pairwise sound-object annotations are lacking.

Building Interpretable Interaction Trees for Deep NLP Models

no code implementations • 29 Jun 2020 • Die Zhang, Huilin Zhou, Hao Zhang, Xiaoyi Bao, Da Huo, Ruizhao Chen, Xu Cheng, Mengyue Wu, Quanshi Zhang

This paper proposes a method to disentangle and quantify interactions among words that are encoded inside a DNN for natural language processing.

Sentence

Voice activity detection in the wild via weakly supervised sound event detection

1 code implementation • 27 Mar 2020 • Heinrich Dinkel, Yefei Chen, Mengyue Wu, Kai Yu

We proposed two GPVAD models, one full (GPV-F), trained on 527 Audioset sound events, and one binary (GPV-B), only distinguishing speech and noise.

Sound, Audio and Speech Processing

Audio Caption in a Car Setting with a Sentence-Level Loss

1 code implementation • 31 May 2019 • Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu

Captioning has attracted much attention in image and video understanding, while only a small amount of work examines audio captioning.

Audio captioning, Decoder +6

Text-based depression detection on sparse data

1 code implementation • 8 Apr 2019 • Heinrich Dinkel, Mengyue Wu, Kai Yu

Previous text-based depression detection is commonly based on large user-generated data.

Depression Detection, Sentence +1

Audio Caption: Listen and Tell

1 code implementation • 25 Feb 2019 • Mengyue Wu, Heinrich Dinkel, Kai Yu

A baseline encoder-decoder model is provided for both English and Mandarin.

Decoder, General Classification
