Search Results for author: Dongmei Wang

Found 10 papers, 1 paper with code

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

no code implementations • 10 Apr 2024 • Leying Zhang, Yao Qian, Long Zhou, Shujie Liu, Dongmei Wang, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Lei He, Sheng Zhao, Michael Zeng

CoVoMix first converts dialogue text into multiple streams of discrete tokens, with each stream representing the semantic information of an individual talker.

Dialogue Generation
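
As a loose illustration of the idea above, here is a minimal, hypothetical sketch (not the authors' code) of a model that encodes dialogue text once and emits one discrete semantic-token stream per talker; all module names and sizes are assumptions.

import torch
import torch.nn as nn

class MultiStreamTokenizer(nn.Module):
    """Sketch: shared text encoder, one token-stream head per talker."""
    def __init__(self, vocab_size=256, token_vocab=1024, d_model=256, n_talkers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2,
        )
        # One output head per talker over the semantic-token vocabulary.
        self.heads = nn.ModuleList(
            [nn.Linear(d_model, token_vocab) for _ in range(n_talkers)]
        )

    def forward(self, text_ids):                # (batch, text_len)
        h = self.encoder(self.embed(text_ids))  # (batch, text_len, d_model)
        # Each head yields logits for one talker's discrete token stream.
        return [head(h) for head in self.heads]

streams = MultiStreamTokenizer()(torch.randint(0, 256, (1, 50)))
print([s.shape for s in streams])  # two streams of (1, 50, 1024) logits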

Adapting Multi-Lingual ASR Models for Handling Multiple Talkers

no code implementations • 30 May 2023 • Chenda Li, Yao Qian, Zhuo Chen, Naoyuki Kanda, Dongmei Wang, Takuya Yoshioka, Yanmin Qian, Michael Zeng

State-of-the-art large-scale universal speech models (USMs) show decent automatic speech recognition (ASR) performance across multiple domains and languages.

Automatic Speech Recognition (ASR) +1

Target Sound Extraction with Variable Cross-modality Clues

1 code implementation • 15 Mar 2023 • Chenda Li, Yao Qian, Zhuo Chen, Dongmei Wang, Takuya Yoshioka, Shujie Liu, Yanmin Qian, Michael Zeng

Automatic target sound extraction (TSE) is a machine learning approach that mimics the human auditory ability to attend to a sound source of interest within a mixture of sources.

AudioCaps Target Sound Extraction
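
To make the clue-conditioning idea concrete, here is a hypothetical sketch (not the paper's architecture) in which an embedding of the cross-modality clue modulates a mask estimator via feature-wise scaling and shifting (FiLM); all names and sizes are assumptions.

import torch
import torch.nn as nn

class ClueConditionedTSE(nn.Module):
    """Sketch: a clue embedding steers a mask estimator toward the target."""
    def __init__(self, n_freq=257, d_clue=128, d_hidden=256):
        super().__init__()
        self.mix_proj = nn.Linear(n_freq, d_hidden)
        self.film = nn.Linear(d_clue, 2 * d_hidden)  # per-clue scale and shift
        self.rnn = nn.GRU(d_hidden, d_hidden, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(d_hidden, n_freq), nn.Sigmoid())

    def forward(self, mix_mag, clue):                # (B, T, F), (B, d_clue)
        h = self.mix_proj(mix_mag)                   # (B, T, d_hidden)
        scale, shift = self.film(clue).chunk(2, dim=-1)
        h = h * scale.unsqueeze(1) + shift.unsqueeze(1)  # FiLM conditioning
        h, _ = self.rnn(h)
        return mix_mag * self.mask(h)                # masked target magnitude

out = ClueConditionedTSE()(torch.rand(1, 100, 257), torch.rand(1, 128))
print(out.shape)  # torch.Size([1, 100, 257])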

Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation

no code implementations • 7 Apr 2022 • Xiaofei Wang, Dongmei Wang, Naoyuki Kanda, Sefik Emre Eskimez, Takuya Yoshioka

In this paper, we propose a three-stage training scheme for the CSS model that can leverage both supervised data and additional large-scale unsupervised real-world conversational data.

Speech Separation

PickNet: Real-Time Channel Selection for Ad Hoc Microphone Arrays

no code implementations • 24 Jan 2022 • Takuya Yoshioka, Xiaofei Wang, Dongmei Wang

Since PickNet utilizes only limited acoustic context at each time frame, the system using the proposed model works in real time and is robust to changes in acoustic conditions.

Speech Recognition
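
Here is a minimal sketch of frame-wise channel selection in this spirit (assumed, not the released PickNet): each channel is scored from a short local feature window only, which is what keeps the selection real-time-friendly.

import torch
import torch.nn as nn

class ChannelPicker(nn.Module):
    """Sketch: score each channel per frame from limited acoustic context."""
    def __init__(self, n_feat=40, d_hidden=64, context=5):
        super().__init__()
        # A short 1-D conv over time gives each frame limited local context.
        self.conv = nn.Conv1d(n_feat, d_hidden, kernel_size=context, padding=context // 2)
        self.score = nn.Linear(d_hidden, 1)

    def forward(self, feats):                      # (B, C, T, F) per-channel features
        B, C, T, F = feats.shape
        h = self.conv(feats.reshape(B * C, T, F).transpose(1, 2))
        h = h.transpose(1, 2).reshape(B, C, T, -1)
        scores = self.score(h).squeeze(-1)         # (B, C, T) channel score per frame
        return scores.argmax(dim=1)                # best channel index per frame

picks = ChannelPicker()(torch.rand(2, 4, 100, 40))
print(picks.shape)  # torch.Size([2, 100])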

All-neural beamformer for continuous speech separation

no code implementations • 13 Oct 2021 • Zhuohuang Zhang, Takuya Yoshioka, Naoyuki Kanda, Zhuo Chen, Xiaofei Wang, Dongmei Wang, Sefik Emre Eskimez

Recently, the all-deep-learning MVDR (ADL-MVDR) model was proposed for neural beamforming and demonstrated superior performance in a target speech extraction task using pre-segmented input.

Automatic Speech Recognition (ASR) +2
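
For reference, the classical MVDR solution that an all-neural beamformer replaces with learned layers can be written in a few lines of numpy. This is the standard Souden-style formulation, not the paper's neural model: per frequency, w = (Phi_n^{-1} Phi_s / trace(Phi_n^{-1} Phi_s)) u, where Phi_s and Phi_n are speech and noise spatial covariance matrices and u selects a reference microphone.

import numpy as np

def mvdr_weights(phi_s, phi_n, ref_mic=0):
    """phi_s, phi_n: (F, C, C) complex spatial covariance matrices per frequency."""
    F, C, _ = phi_s.shape
    u = np.zeros(C)
    u[ref_mic] = 1.0                                  # reference-mic selector
    w = np.empty((F, C), dtype=complex)
    for f in range(F):
        num = np.linalg.solve(phi_n[f], phi_s[f])     # Phi_n^{-1} Phi_s
        w[f] = (num / np.trace(num)) @ u              # normalize, pick reference
    return w  # apply as y[f, t] = w[f].conj() @ x[:, f, t]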

VarArray: Array-Geometry-Agnostic Continuous Speech Separation

no code implementations • 12 Oct 2021 • Takuya Yoshioka, Xiaofei Wang, Dongmei Wang, Min Tang, Zirun Zhu, Zhuo Chen, Naoyuki Kanda

Continuous speech separation using a microphone array was shown to be promising in dealing with the speech overlap problem in natural conversation transcription.

Speech Separation

Continuous Speech Separation with Ad Hoc Microphone Arrays

no code implementations • 3 Mar 2021 • Dongmei Wang, Takuya Yoshioka, Zhuo Chen, Xiaofei Wang, Tianyan Zhou, Zhong Meng

Prior studies show that, with a spatial-temporal interleaving structure, neural networks can efficiently utilize the multi-channel signals of an ad hoc array.

Speech Recognition +1
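
A hypothetical sketch of such a spatial-temporal interleaving block (not the paper's exact network): a cross-channel layer and a temporal layer alternate, so no fixed microphone geometry is baked in; all sizes are assumptions.

import torch
import torch.nn as nn

class InterleavedBlock(nn.Module):
    """Sketch: alternate attention across channels with a layer along time."""
    def __init__(self, d_model=64, nhead=4):
        super().__init__()
        self.spatial = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.temporal = nn.GRU(d_model, d_model, batch_first=True)

    def forward(self, x):                          # (B, C, T, D)
        B, C, T, D = x.shape
        # Spatial pass: attend across channels at each time frame.
        s = x.permute(0, 2, 1, 3).reshape(B * T, C, D)
        s, _ = self.spatial(s, s, s)
        x = s.reshape(B, T, C, D).permute(0, 2, 1, 3)
        # Temporal pass: a recurrent layer along time for each channel.
        t = x.reshape(B * C, T, D)
        t, _ = self.temporal(t)
        return t.reshape(B, C, T, D)

y = InterleavedBlock()(torch.rand(1, 3, 50, 64))
print(y.shape)  # torch.Size([1, 3, 50, 64])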

Neural Speech Separation Using Spatially Distributed Microphones

no code implementations • 28 Apr 2020 • Dongmei Wang, Zhuo Chen, Takuya Yoshioka

The inter-channel processing layers apply a self-attention mechanism along the channel dimension to exploit the information obtained with a varying number of microphones.

Speech Recognition +1
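
A minimal demonstration (assumed, not the paper's code) of why self-attention along the channel axis copes with a varying number of microphones: attention is defined for any sequence length, so the same weights apply whether two or six channels are present.

import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)

for n_mics in (2, 6):
    # One time frame: each "token" is one channel's feature vector.
    channels = torch.rand(1, n_mics, 64)           # (batch, channels, features)
    out, _ = attn(channels, channels, channels)    # same module, any channel count
    print(n_mics, out.shape)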
