no code implementations • 16 Feb 2025 • Hui Wang, Shujie Liu, Lingwei Meng, Jinyu Li, Yifan Yang, Shiwan Zhao, Haiyang Sun, Yanqing Liu, Haoqin Sun, Jiaming Zhou, Yan Lu, Yong Qin
To advance continuous-valued token modeling and temporal-coherence enforcement, we propose FELLE, an autoregressive model that integrates language modeling with token-wise flow matching.
no code implementations • 20 Dec 2024 • Yifan Yang, Ziyang Ma, Shujie Liu, Jinyu Li, Hui Wang, Lingwei Meng, Haiyang Sun, Yuzhe Liang, Ruiyang Xu, Yuxuan Hu, Yan Lu, Rui Zhao, Xie Chen
This paper introduces Interleaved Speech-Text Language Model (IST-LM) for streaming zero-shot Text-to-Speech (TTS).
1 code implementation • 16 Dec 2024 • Liang Chen, Zekun Wang, Shuhuai Ren, Lei LI, Haozhe Zhao, Yunshui Li, Zefan Cai, Hongcheng Guo, Lei Zhang, Yizhe Xiong, Yichi Zhang, Ruoyu Wu, Qingxiu Dong, Ge Zhang, Jian Yang, Lingwei Meng, Shujie Hu, Yulong Chen, Junyang Lin, Shuai Bai, Andreas Vlachos, Xu Tan, Minjia Zhang, Wen Xiao, Aaron Yee, Tianyu Liu, Baobao Chang
As Large Language Models (LLMs) have advanced to unify understanding and generation tasks within the textual modality, recent research has shown that tasks from different modalities can also be effectively encapsulated within the NTP framework, transforming the multimodal information into tokens and predict the next one given the context.
no code implementations • 27 Oct 2024 • Zongyi Li, Shujie Hu, Shujie Liu, Long Zhou, Jeongsoo Choi, Lingwei Meng, Xun Guo, Jinyu Li, Hefei Ling, Furu Wei
Text-to-video models have recently undergone rapid and substantial advancements.
no code implementations • 22 Sep 2024 • Jiawen Kang, Dongrui Han, Lingwei Meng, Jingyan Zhou, Jinchao Li, Xixin Wu, Helen Meng
Unlike conventional classification tasks, we identify within-class variation as a critical challenge in AD detection: individuals with AD exhibit a spectrum of cognitive impairments.
1 code implementation • 19 Sep 2024 • Jiawen Kang, Lingwei Meng, Mingyu Cui, Yuejiao Wang, Xixin Wu, Xunying Liu, Helen Meng
SACTC is a tailored CTC variant for multi-talker scenarios, it explicitly models speaker disentanglement by constraining the encoder to represent different speakers' tokens at specific time frames.
no code implementations • 13 Sep 2024 • Lingwei Meng, Shujie Hu, Jiawen Kang, Zhaoqing Li, Yuejiao Wang, Wenxuan Wu, Xixin Wu, Xunying Liu, Helen Meng
Recent advancements in large language models (LLMs) have revolutionized various domains, bringing significant progress and new opportunities.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+4
no code implementations • 1 Sep 2024 • Zengrui Jin, Yifan Yang, Mohan Shi, Wei Kang, Xiaoyu Yang, Zengwei Yao, Fangjun Kuang, Liyong Guo, Lingwei Meng, Long Lin, Yong Xu, Shi-Xiong Zhang, Daniel Povey
This paper presents a large-scale far-field overlapping speech dataset, crafted to advance research in speech separation, recognition, and speaker diarization.
no code implementations • 15 Jul 2024 • Yuejiao Wang, Xianmin Gong, Lingwei Meng, Xixin Wu, Helen Meng
This study highlights the potential of fMRI encoding models and brain scores for detecting early functional changes in NCD patients.
1 code implementation • 13 Jul 2024 • Lingwei Meng, Jiawen Kang, Yuejiao Wang, Zengrui Jin, Xixin Wu, Xunying Liu, Helen Meng
In this study, we propose a pioneering approach to empower Whisper, which is a speech foundation model, to tackle joint multi-talker and target-talker speech recognition tasks.
no code implementations • 11 Jul 2024 • Lingwei Meng, Long Zhou, Shujie Liu, Sanyuan Chen, Bing Han, Shujie Hu, Yanqing Liu, Jinyu Li, Sheng Zhao, Xixin Wu, Helen Meng, Furu Wei
We present MELLE, a novel continuous-valued tokens based language modeling approach for text to speech synthesis (TTS).
no code implementations • 12 Jun 2024 • Bing Han, Long Zhou, Shujie Liu, Sanyuan Chen, Lingwei Meng, Yanming Qian, Yanqing Liu, Sheng Zhao, Jinyu Li, Furu Wei
With the help of discrete neural audio codecs, large language models (LLM) have increasingly been recognized as a promising methodology for zero-shot Text-to-Speech (TTS) synthesis.
1 code implementation • 31 Mar 2024 • Shujie Hu, Long Zhou, Shujie Liu, Sanyuan Chen, Lingwei Meng, Hongkun Hao, Jing Pan, Xunying Liu, Jinyu Li, Sunit Sivasankaran, Linquan Liu, Furu Wei
In this work, we introduce WavLLM, a robust and adaptive speech large language model with dual encoders, and a prompt-aware LoRA weight adapter, optimized by a two-stage curriculum learning approach.
no code implementations • 26 Jan 2024 • Yuejiao Wang, Xixin Wu, Disong Wang, Lingwei Meng, Helen Meng
Dysarthric speech reconstruction (DSR) systems aim to automatically convert dysarthric speech into normal-sounding speech.
1 code implementation • 8 Jan 2024 • Jiawen Kang, Lingwei Meng, Mingyu Cui, Haohan Guo, Xixin Wu, Xunying Liu, Helen Meng
To the best of our knowledge, this work represents an early effort to integrate SIMO and SISO for multi-talker speech recognition.
no code implementations • 25 May 2023 • Lingwei Meng, Jiawen Kang, Mingyu Cui, Haibin Wu, Xixin Wu, Helen Meng
Extending on this, we incorporate a diarization branch into the Sidecar, allowing for unified modeling of both ASR and diarization with a negligible overhead of only 768 parameters.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+1
no code implementations • 22 May 2023 • Haibin Wu, Jiawen Kang, Lingwei Meng, Helen Meng, Hung-Yi Lee
Automatic speaker verification (ASV) plays a critical role in security-sensitive environments.
1 code implementation • 20 Feb 2023 • Lingwei Meng, Jiawen Kang, Mingyu Cui, Yuejiao Wang, Xixin Wu, Helen Meng
Although automatic speech recognition (ASR) can perform well in common non-overlapping environments, sustaining performance in multi-talker overlapping speech recognition remains challenging.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+1
no code implementations • 29 Oct 2022 • Lingwei Meng, Di Dong, Xin Chen, Mengjie Fang, Rongpin Wang, Jing Li, Zaiyi Liu, Jie Tian
We comprehensively compared 2D and 3D radiomic features' representation and discrimination capacity regarding GC, via three tasks.
no code implementations • 28 Jun 2022 • Yi Wang, Tianzi Wang, Zi Ye, Lingwei Meng, Shoukang Hu, Xixin Wu, Xunying Liu, Helen Meng
Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating preventive care and delay progression.
no code implementations • 18 Jun 2022 • Haibin Wu, Jiawen Kang, Lingwei Meng, Yang Zhang, Xixin Wu, Zhiyong Wu, Hung-Yi Lee, Helen Meng
However, previous works show that state-of-the-art ASV models are seriously vulnerable to voice spoofing attacks, and the recently proposed high-performance spoofing countermeasure (CM) models only focus solely on the standalone anti-spoofing tasks, and ignore the subsequent speaker verification process.
no code implementations • 29 Mar 2022 • Haibin Wu, Lingwei Meng, Jiawen Kang, Jinchao Li, Xu Li, Xixin Wu, Hung-Yi Lee, Helen Meng
In the second-level fusion, the CM score and ASV scores directly from ASV systems will be concatenated into a prediction block for the final decision.
no code implementations • 4 Feb 2022 • Naijun Zheng, Na Li, Xixin Wu, Lingwei Meng, Jiawen Kang, Haibin Wu, Chao Weng, Dan Su, Helen Meng
This paper describes our speaker diarization system submitted to the Multi-channel Multi-party Meeting Transcription (M2MeT) challenge, where Mandarin meeting data were recorded in multi-channel format for diarization and automatic speech recognition (ASR) tasks.
2 code implementations • ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems 2020 • Shuo Wang, Yan-ran Li, Jiang Zhang, Qingye Meng, Lingwei Meng, Fei Gao
When predicting PM2. 5 concentrations, it is necessary to consider complex information sources since the concentrations are influenced by various factors within a long period.