Search Results for author: Dongmei Wang

Found 10 papers, 1 paper with code

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

no code implementations • 10 Apr 2024 • Leying Zhang, Yao Qian, Long Zhou, Shujie Liu, Dongmei Wang, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Lei He, Sheng Zhao, Michael Zeng

CoVoMix first converts dialogue text into multiple streams of discrete tokens, with each stream representing the semantic information of an individual talker.

Dialogue Generation
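
As a loose illustration of the idea above, here is a minimal, hypothetical sketch (not the authors' code) of a model that encodes dialogue text once and emits one discrete semantic-token stream per talker; all module names and sizes are assumptions.

import torch
import torch.nn as nn

class MultiStreamTokenizer(nn.Module):
    """Sketch: shared text encoder, one token-stream head per talker."""
    def __init__(self, vocab_size=256, token_vocab=1024, d_model=256, n_talkers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2,
        )
        # One output head per talker over the semantic-token vocabulary.
        self.heads = nn.ModuleList(
            [nn.Linear(d_model, token_vocab) for _ in range(n_talkers)]
        )

    def forward(self, text_ids):                # (batch, text_len)
        h = self.encoder(self.embed(text_ids))  # (batch, text_len, d_model)
        # Each head yields logits for one talker's discrete token stream.
        return [head(h) for head in self.heads]

streams = MultiStreamTokenizer()(torch.randint(0, 256, (1, 50)))
print([s.shape for s in streams])  # two streams of (1, 50, 1024) logits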

Adapting Multi-Lingual ASR Models for Handling Multiple Talkers

no code implementations • 30 May 2023 • Chenda Li, Yao Qian, Zhuo Chen, Naoyuki Kanda, Dongmei Wang, Takuya Yoshioka, Yanmin Qian, Michael Zeng

State-of-the-art large-scale universal speech models (USMs) show decent automatic speech recognition (ASR) performance across multiple domains and languages.

Automatic Speech Recognition (ASR) +1

Target Sound Extraction with Variable Cross-modality Clues

1 code implementation • 15 Mar 2023 • Chenda Li, Yao Qian, Zhuo Chen, Dongmei Wang, Takuya Yoshioka, Shujie Liu, Yanmin Qian, Michael Zeng

Automatic target sound extraction (TSE) is a machine learning approach that mimics the human auditory ability to attend to a sound source of interest within a mixture of sources.

AudioCaps Target Sound Extraction
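
To make the clue-conditioning idea concrete, here is a hypothetical sketch (not the paper's architecture) in which an embedding of the cross-modality clue modulates a mask estimator via feature-wise scaling and shifting (FiLM); all names and sizes are assumptions.

import torch
import torch.nn as nn

class ClueConditionedTSE(nn.Module):
    """Sketch: a clue embedding steers a mask estimator toward the target."""
    def __init__(self, n_freq=257, d_clue=128, d_hidden=256):
        super().__init__()
        self.mix_proj = nn.Linear(n_freq, d_hidden)
        self.film = nn.Linear(d_clue, 2 * d_hidden)  # per-clue scale and shift
        self.rnn = nn.GRU(d_hidden, d_hidden, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(d_hidden, n_freq), nn.Sigmoid())

    def forward(self, mix_mag, clue):                # (B, T, F), (B, d_clue)
        h = self.mix_proj(mix_mag)                   # (B, T, d_hidden)
        scale, shift = self.film(clue).chunk(2, dim=-1)
        h = h * scale.unsqueeze(1) + shift.unsqueeze(1)  # FiLM conditioning
        h, _ = self.rnn(h)
        return mix_mag * self.mask(h)                # masked target magnitude

out = ClueConditionedTSE()(torch.rand(1, 100, 257), torch.rand(1, 128))
print(out.shape)  # torch.Size([1, 100, 257])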

Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation

no code implementations • 7 Apr 2022 • Xiaofei Wang, Dongmei Wang, Naoyuki Kanda, Sefik Emre Eskimez, Takuya Yoshioka

In this paper, we propose a three-stage training scheme for the CSS model that can leverage both supervised data and additional large-scale unsupervised real-world conversational data.

Speech Separation

PickNet: Real-Time Channel Selection for Ad Hoc Microphone Arrays

no code implementations • 24 Jan 2022 • Takuya Yoshioka, Xiaofei Wang, Dongmei Wang

Since PickNet utilizes only limited acoustic context at each time frame, the system using the proposed model works in real time and is robust to changes in acoustic conditions.

Speech Recognition
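
Here is a minimal sketch of frame-wise channel selection in this spirit (assumed, not the released PickNet): each channel is scored from a short local feature window only, which is what keeps the selection real-time-friendly.

import torch
import torch.nn as nn

class ChannelPicker(nn.Module):
    """Sketch: score each channel per frame from limited acoustic context."""
    def __init__(self, n_feat=40, d_hidden=64, context=5):
        super().__init__()
        # A short 1-D conv over time gives each frame limited local context.
        self.conv = nn.Conv1d(n_feat, d_hidden, kernel_size=context, padding=context // 2)
        self.score = nn.Linear(d_hidden, 1)

    def forward(self, feats):                      # (B, C, T, F) per-channel features
        B, C, T, F = feats.shape
        h = self.conv(feats.reshape(B * C, T, F).transpose(1, 2))
        h = h.transpose(1, 2).reshape(B, C, T, -1)
        scores = self.score(h).squeeze(-1)         # (B, C, T) channel score per frame
        return scores.argmax(dim=1)                # best channel index per frame

picks = ChannelPicker()(torch.rand(2, 4, 100, 40))
print(picks.shape)  # torch.Size([2, 100])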

All-neural beamformer for continuous speech separation

no code implementations • 13 Oct 2021 • Zhuohuang Zhang, Takuya Yoshioka, Naoyuki Kanda, Zhuo Chen, Xiaofei Wang, Dongmei Wang, Sefik Emre Eskimez

Recently, the all-deep-learning MVDR (ADL-MVDR) model was proposed for neural beamforming and demonstrated superior performance in a target speech extraction task using pre-segmented input.

Automatic Speech Recognition (ASR) +2
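
For reference, the classical MVDR solution that an all-neural beamformer replaces with learned layers can be written in a few lines of numpy. This is the standard Souden-style formulation, not the paper's neural model: per frequency, w = (Phi_n^{-1} Phi_s / trace(Phi_n^{-1} Phi_s)) u, where Phi_s and Phi_n are speech and noise spatial covariance matrices and u selects a reference microphone.

import numpy as np

def mvdr_weights(phi_s, phi_n, ref_mic=0):
    """phi_s, phi_n: (F, C, C) complex spatial covariance matrices per frequency."""
    F, C, _ = phi_s.shape
    u = np.zeros(C)
    u[ref_mic] = 1.0                                  # reference-mic selector
    w = np.empty((F, C), dtype=complex)
    for f in range(F):
        num = np.linalg.solve(phi_n[f], phi_s[f])     # Phi_n^{-1} Phi_s
        w[f] = (num / np.trace(num)) @ u              # normalize, pick reference
    return w  # apply as y[f, t] = w[f].conj() @ x[:, f, t]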

VarArray: Array-Geometry-Agnostic Continuous Speech Separation

no code implementations • 12 Oct 2021 • Takuya Yoshioka, Xiaofei Wang, Dongmei Wang, Min Tang, Zirun Zhu, Zhuo Chen, Naoyuki Kanda

Continuous speech separation using a microphone array was shown to be promising in dealing with the speech overlap problem in natural conversation transcription.

Speech Separation

Continuous Speech Separation with Ad Hoc Microphone Arrays

no code implementations • 3 Mar 2021 • Dongmei Wang, Takuya Yoshioka, Zhuo Chen, Xiaofei Wang, Tianyan Zhou, Zhong Meng

Prior studies show that, with a spatial-temporal interleaving structure, neural networks can efficiently utilize the multi-channel signals of an ad hoc array.

Speech Recognition +1
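
A hypothetical sketch of such a spatial-temporal interleaving block (not the paper's exact network): a cross-channel layer and a temporal layer alternate, so no fixed microphone geometry is baked in; all sizes are assumptions.

import torch
import torch.nn as nn

class InterleavedBlock(nn.Module):
    """Sketch: alternate attention across channels with a layer along time."""
    def __init__(self, d_model=64, nhead=4):
        super().__init__()
        self.spatial = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.temporal = nn.GRU(d_model, d_model, batch_first=True)

    def forward(self, x):                          # (B, C, T, D)
        B, C, T, D = x.shape
        # Spatial pass: attend across channels at each time frame.
        s = x.permute(0, 2, 1, 3).reshape(B * T, C, D)
        s, _ = self.spatial(s, s, s)
        x = s.reshape(B, T, C, D).permute(0, 2, 1, 3)
        # Temporal pass: a recurrent layer along time for each channel.
        t = x.reshape(B * C, T, D)
        t, _ = self.temporal(t)
        return t.reshape(B, C, T, D)

y = InterleavedBlock()(torch.rand(1, 3, 50, 64))
print(y.shape)  # torch.Size([1, 3, 50, 64])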

Neural Speech Separation Using Spatially Distributed Microphones

no code implementations • 28 Apr 2020 • Dongmei Wang, Zhuo Chen, Takuya Yoshioka

The inter-channel processing layers apply a self-attention mechanism along the channel dimension to exploit the information obtained with a varying number of microphones.

Speech Recognition +1
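
A minimal demonstration (assumed, not the paper's code) of why self-attention along the channel axis copes with a varying number of microphones: attention is defined for any sequence length, so the same weights apply whether two or six channels are present.

import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)

for n_mics in (2, 6):
    # One time frame: each "token" is one channel's feature vector.
    channels = torch.rand(1, n_mics, 64)           # (batch, channels, features)
    out, _ = attn(channels, channels, channels)    # same module, any channel count
    print(n_mics, out.shape)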
