Noise-robust Speech Separation with Fast Generative Correction

1 code implementation • 11 Jun 2024 • Helin Wang, Jesus Villalba, Laureano Moro-Velazquez, Jiarui Hai, Thomas Thebaud, Najim Dehak

Speech separation, the task of isolating multiple speech sources from a mixed audio signal, remains challenging in noisy environments.

Speech Separation

Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback

no code implementations • 2 Jun 2024 • Chen Chen, Yuchen Hu, Wen Wu, Helin Wang, Eng Siong Chng, Chao Zhang

In recent years, text-to-speech (TTS) technology has witnessed impressive advancements, particularly with large-scale training datasets, showcasing human-level speech quality and impressive zero-shot capabilities on unseen speakers.

Speech Synthesis, Text-To-Speech Synthesis

Asynchronous and Segmented Bidirectional Encoding for NMT

no code implementations • 19 Feb 2024 • Jingpu Yang, Zehua Han, Mengyu Xiang, Helin Wang, Yuxiao Huang, Miao Fang

With the rapid advancement of Neural Machine Translation (NMT), enhancing translation efficiency and quality has become a focal point of research.

Machine Translation, NMT, +2

Efficient Reinforcement Learning via Decoupling Exploration and Utilization

1 code implementation • 26 Dec 2023 • Jingpu Yang, Helin Wang, Qirui Zhao, Zhecheng Shi, Zirui Song, Miao Fang

To address this, we have introduced an additional optimistic Actor to enhance the model's exploration ability, while employing a more constrained pessimistic Actor for performance evaluation.

Autonomous Vehicles, reinforcement-learning, +1

Improving fairness for spoken language understanding in atypical speech with Text-to-Speech

1 code implementation • 16 Nov 2023 • Helin Wang, Venkatesh Ravichandran, Milind Rao, Becky Lammers, Myra Sydnor, Nicholas Maragakis, Ankur A. Butala, Jayne Zhang, Lora Clawson, Victoria Chovaz, Laureano Moro-Velazquez

Spoken language understanding (SLU) systems often exhibit suboptimal performance in processing atypical speech, typically caused by neurological conditions and motor impairments.

Data Augmentation, Fairness, +2

DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction

1 code implementation • 6 Oct 2023 • Jiarui Hai, Helin Wang, Dongchao Yang, Karan Thakkar, Najim Dehak, Mounya Elhilali

Common target sound extraction (TSE) approaches have primarily relied on discriminative methods to separate the target sound while minimizing interference from unwanted sources, with varying success in separating the target from the background.

Target Sound Extraction

DuTa-VC: A Duration-aware Typical-to-atypical Voice Conversion Approach with Diffusion Probabilistic Model

1 code implementation • 18 Jun 2023 • Helin Wang, Thomas Thebaud, Jesus Villalba, Myra Sydnor, Becky Lammers, Najim Dehak, Laureano Moro-Velazquez

We present a novel typical-to-atypical voice conversion approach (DuTa-VC), which (i) can be trained with nonparallel data, (ii) first introduces a diffusion probabilistic model, (iii) preserves the target speaker identity, and (iv) is aware of the phoneme duration of the target speaker.

Data Augmentation, Decoder, +3

Diffsound: Discrete Diffusion Model for Text-to-sound Generation

1 code implementation • 20 Jul 2022 • Dongchao Yang, Jianwei Yu, Helin Wang, Wen Wang, Chao Weng, Yuexian Zou, Dong Yu

In this study, we investigate generating sound conditioned on a text prompt and propose a novel text-to-sound generation framework that consists of a text encoder, a Vector Quantized Variational Autoencoder (VQ-VAE), a decoder, and a vocoder.

Audio Generation, Decoder

Calibrate and Refine! A Novel and Agile Framework for ASR-error Robust Intent Detection

no code implementations • 23 May 2022 • Peilin Zhou, Dading Chong, Helin Wang, Qingcheng Zeng

The past ten years have witnessed the rapid development of text-based intent detection, whose benchmark performances have already been taken to a remarkable level by deep learning techniques.

Automatic Speech Recognition (ASR), +2

Audio-Oriented Multimodal Machine Comprehension: Task, Dataset and Model

no code implementations • 4 Jul 2021 • Zhiqi Huang, Fenglin Liu, Xian Wu, Shen Ge, Helin Wang, Wei Fan, Yuexian Zou

As a result, the proposed approach can handle various tasks including: Audio-Oriented Multimodal Machine Comprehension, Machine Reading Comprehension and Machine Listening Comprehension, in a single model, making fair comparisons possible between our model and the existing unimodal MC models.

Knowledge Distillation, Machine Reading Comprehension

Layer Reduction: Accelerating Conformer-Based Self-Supervised Model via Layer Consistency

no code implementations • 8 Apr 2021 • Jinchuan Tian, Rongzhi Gu, Helin Wang, Yuexian Zou

Transformer-based self-supervised models are trained as feature extractors and have empowered many downstream speech tasks to achieve state-of-the-art performance.

Speech Recognition

SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification

no code implementations • 31 Mar 2021 • Helin Wang, Yuexian Zou, Wenwu Wang

In this paper, we present SpecAugment++, a novel data augmentation method for deep neural network-based acoustic scene classification (ASC).

Acoustic Scene Classification, Data Augmentation, +2

TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation

no code implementations • 31 Mar 2021 • Helin Wang, Bo Wu, LianWu Chen, Meng Yu, Jianwei Yu, Yong Xu, Shi-Xiong Zhang, Chao Weng, Dan Su, Dong Yu

In this paper, we explore an effective way to leverage contextual information to improve speech dereverberation performance in real-world reverberant environments.

Room Impulse Response (RIR), Speech Dereverberation

Environmental Sound Classification with Parallel Temporal-spectral Attention

no code implementations • 14 Dec 2019 • Helin Wang, Yuexian Zou, Dading Chong, Wenwu Wang

Convolutional neural networks (CNNs) are among the best-performing neural network architectures for environmental sound classification (ESC).

Acoustic Scene Classification, Environmental Sound Classification, +3

