Search Results for author: Lingwei Meng

Found 24 papers, 7 papers with code

FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching

no code implementations • 16 Feb 2025 • Hui Wang, Shujie Liu, Lingwei Meng, Jinyu Li, Yifan Yang, Shiwan Zhao, Haiyang Sun, Yanqing Liu, Haoqin Sun, Jiaming Zhou, Yan Lu, Yong Qin

To advance continuous-valued token modeling and temporal-coherence enforcement, we propose FELLE, an autoregressive model that integrates language modeling with token-wise flow matching.

Language Modeling Language Modelling +1

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

1 code implementation • 16 Dec 2024 • Liang Chen, Zekun Wang, Shuhuai Ren, Lei LI, Haozhe Zhao, Yunshui Li, Zefan Cai, Hongcheng Guo, Lei Zhang, Yizhe Xiong, Yichi Zhang, Ruoyu Wu, Qingxiu Dong, Ge Zhang, Jian Yang, Lingwei Meng, Shujie Hu, Yulong Chen, Junyang Lin, Shuai Bai, Andreas Vlachos, Xu Tan, Minjia Zhang, Wen Xiao, Aaron Yee, Tianyu Liu, Baobao Chang

As Large Language Models (LLMs) have advanced to unify understanding and generation tasks within the textual modality, recent research has shown that tasks from different modalities can also be effectively encapsulated within the Next Token Prediction (NTP) framework, transforming the multimodal information into tokens and predicting the next one given the context.

Language Modeling Language Modelling +2

Towards Within-Class Variation in Alzheimer's Disease Detection from Spontaneous Speech

no code implementations • 22 Sep 2024 • Jiawen Kang, Dongrui Han, Lingwei Meng, Jingyan Zhou, Jinchao Li, Xixin Wu, Helen Meng

Unlike conventional classification tasks, we identify within-class variation as a critical challenge in AD detection: individuals with AD exhibit a spectrum of cognitive impairments.

Alzheimer's Disease Detection Binary Classification +1

Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC

1 code implementation • 19 Sep 2024 • Jiawen Kang, Lingwei Meng, Mingyu Cui, Yuejiao Wang, Xixin Wu, Xunying Liu, Helen Meng

SACTC is a tailored CTC variant for multi-talker scenarios; it explicitly models speaker disentanglement by constraining the encoder to represent different speakers' tokens at specific time frames.

Disentanglement speech-recognition +1

Large Language Model-based FMRI Encoding of Language Functions for Subjects with Neurocognitive Disorder

no code implementations • 15 Jul 2024 • Yuejiao Wang, Xianmin Gong, Lingwei Meng, Xixin Wu, Helen Meng

This study highlights the potential of fMRI encoding models and brain scores for detecting early functional changes in NCD patients.

Language Modeling Language Modelling +1

Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System

1 code implementation • 13 Jul 2024 • Lingwei Meng, Jiawen Kang, Yuejiao Wang, Zengrui Jin, Xixin Wu, Xunying Liu, Helen Meng

In this study, we propose a pioneering approach to empower Whisper, which is a speech foundation model, to tackle joint multi-talker and target-talker speech recognition tasks.

Decoder speech-recognition +1

VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment

no code implementations • 12 Jun 2024 • Bing Han, Long Zhou, Shujie Liu, Sanyuan Chen, Lingwei Meng, Yanming Qian, Yanqing Liu, Sheng Zhao, Jinyu Li, Furu Wei

With the help of discrete neural audio codecs, large language models (LLMs) have increasingly been recognized as a promising methodology for zero-shot Text-to-Speech (TTS) synthesis.

Quantization Speech Synthesis +2

WavLLM: Towards Robust and Adaptive Speech Large Language Model

1 code implementation • 31 Mar 2024 • Shujie Hu, Long Zhou, Shujie Liu, Sanyuan Chen, Lingwei Meng, Hongkun Hao, Jing Pan, Xunying Liu, Jinyu Li, Sunit Sivasankaran, Linquan Liu, Furu Wei

In this work, we introduce WavLLM, a robust and adaptive speech large language model with dual encoders, and a prompt-aware LoRA weight adapter, optimized by a two-stage curriculum learning approach.

Language Modeling Language Modelling +1

UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization

no code implementations • 26 Jan 2024 • Yuejiao Wang, Xixin Wu, Disong Wang, Lingwei Meng, Helen Meng

Dysarthric speech reconstruction (DSR) systems aim to automatically convert dysarthric speech into normal-sounding speech.

Decoder Domain Adaptation +3

Cross-Speaker Encoding Network for Multi-Talker Speech Recognition

1 code implementation • 8 Jan 2024 • Jiawen Kang, Lingwei Meng, Mingyu Cui, Haohan Guo, Xixin Wu, Xunying Liu, Helen Meng

To the best of our knowledge, this work represents an early effort to integrate SIMO and SISO for multi-talker speech recognition.

Decoder speech-recognition +1

Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator

no code implementations • 25 May 2023 • Lingwei Meng, Jiawen Kang, Mingyu Cui, Haibin Wu, Xixin Wu, Helen Meng

Extending on this, we incorporate a diarization branch into the Sidecar, allowing for unified modeling of both ASR and diarization with a negligible overhead of only 768 parameters.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

The defender's perspective on automatic speaker verification: An overview

no code implementations • 22 May 2023 • Haibin Wu, Jiawen Kang, Lingwei Meng, Helen Meng, Hung-Yi Lee

Automatic speaker verification (ASV) plays a critical role in security-sensitive environments.

Speaker Verification

A Sidecar Separator Can Convert a Single-Talker Speech Recognition System to a Multi-Talker One

1 code implementation • 20 Feb 2023 • Lingwei Meng, Jiawen Kang, Mingyu Cui, Yuejiao Wang, Xixin Wu, Helen Meng

Although automatic speech recognition (ASR) can perform well in common non-overlapping environments, sustaining performance in multi-talker overlapping speech recognition remains challenging.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Tackling Spoofing-Aware Speaker Verification with Multi-Model Fusion

no code implementations • 18 Jun 2022 • Haibin Wu, Jiawen Kang, Lingwei Meng, Yang Zhang, Xixin Wu, Zhiyong Wu, Hung-Yi Lee, Helen Meng

However, previous works show that state-of-the-art ASV models are seriously vulnerable to voice spoofing attacks, and the recently proposed high-performance spoofing countermeasure (CM) models focus solely on standalone anti-spoofing tasks and ignore the subsequent speaker verification process.

Open-Ended Question Answering Speaker Verification

Spoofing-Aware Speaker Verification by Multi-Level Fusion

no code implementations • 29 Mar 2022 • Haibin Wu, Lingwei Meng, Jiawen Kang, Jinchao Li, Xu Li, Xixin Wu, Hung-Yi Lee, Helen Meng

In the second-level fusion, the CM score and the ASV scores taken directly from the ASV systems are concatenated and fed into a prediction block for the final decision.

Speaker Verification

The CUHK-TENCENT speaker diarization system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge

no code implementations • 4 Feb 2022 • Naijun Zheng, Na Li, Xixin Wu, Lingwei Meng, Jiawen Kang, Haibin Wu, Chao Weng, Dan Su, Helen Meng

This paper describes our speaker diarization system submitted to the Multi-channel Multi-party Meeting Transcription (M2MeT) challenge, where Mandarin meeting data were recorded in multi-channel format for diarization and automatic speech recognition (ASR) tasks.

Action Detection Activity Detection +6

PM2.5-GNN: A Domain Knowledge Enhanced Graph Neural Network For PM2.5 Forecasting

2 code implementations • ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems 2020 • Shuo Wang, Yan-ran Li, Jiang Zhang, Qingye Meng, Lingwei Meng, Fei Gao

When predicting PM2.5 concentrations, it is necessary to consider complex information sources since the concentrations are influenced by various factors within a long period.

Graph Neural Network
