Search Results for author: Max W. Y. Lam

Found 10 papers, 4 papers with code

FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis

2 code implementations21 Apr 2022 Rongjie Huang, Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu, Yi Ren, Zhou Zhao

Also, FastDiff achieves a sampling speed 58x faster than real time on a V100 GPU, making diffusion models practically applicable to speech synthesis deployment for the first time.

Ranked #7 on Text-To-Speech Synthesis on LJSpeech (using extra training data)

Denoising Speech Synthesis +2
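For context on the throughput claim above, a quick real-time-factor calculation (the 10-second duration is an arbitrary example, not a figure from the paper):

```python
# "58x faster than real time" in deployment terms: the real-time factor
# (RTF) is synthesis_time / audio_duration, so 58x corresponds to RTF = 1/58.
audio_seconds = 10.0                      # arbitrary example duration
rtf = 1.0 / 58.0
synthesis_seconds = audio_seconds * rtf
print(f"RTF = {rtf:.4f}; {audio_seconds:.0f} s of audio in ~{synthesis_seconds:.2f} s")
# -> RTF = 0.0172; 10 s of audio in ~0.17 s
```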

BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis

1 code implementation ICLR 2022 Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu

We propose a new bilateral denoising diffusion model (BDDM) that parameterizes both the forward and reverse processes with a schedule network and a score network, which can be trained with a novel bilateral modeling objective.

Image Generation Speech Synthesis
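The abstract above names two learned components. Below is a minimal sketch of that split, assuming a score network that predicts the noise in a corrupted sample and a schedule network that predicts the next noise scale; all module names, sizes, and the exact parameterization are illustrative assumptions, not the paper's code:

```python
import torch
import torch.nn as nn

class ScoreNetwork(nn.Module):
    """Predicts the noise component of a noisy sample x_t (illustrative)."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x_t, alpha_t):
        t_feat = alpha_t.expand(x_t.shape[0], 1)          # broadcast noise level
        return self.net(torch.cat([x_t, t_feat], dim=-1))

class ScheduleNetwork(nn.Module):
    """Predicts the next noise scale beta from the current one (illustrative)."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, x_t, beta_t):
        b = beta_t.expand(x_t.shape[0], 1)
        return self.net(torch.cat([x_t, b], dim=-1)) * beta_t  # next beta <= current beta

score_net, sched_net = ScoreNetwork(), ScheduleNetwork()
x_t = torch.randn(8, 128)
eps_hat = score_net(x_t, torch.tensor(0.9))    # reverse (denoising) direction
beta_next = sched_net(x_t, torch.tensor(0.02)) # learned schedule for the next step
```

The appeal of the second network is that the noise schedule, usually fixed by hand, becomes a learned quantity, which is what permits sampling in far fewer steps.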

SynCLR: A Synthesis Framework for Contrastive Learning of out-of-domain Speech Representations

no code implementations29 Sep 2021 Rongjie Huang, Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu, Zhou Zhao, Yi Ren

Learning speech representations that generalize to unseen samples from different domains is a challenge of ever-increasing importance.

Contrastive Learning Data Augmentation +4

Bilateral Denoising Diffusion Models

no code implementations26 Aug 2021 Max W. Y. Lam, Jun Wang, Rongjie Huang, Dan Su, Dong Yu

In this paper, we propose novel bilateral denoising diffusion models (BDDMs), which take significantly fewer steps to generate high-quality samples.

Denoising Scheduling

Raw Waveform Encoder with Multi-Scale Globally Attentive Locally Recurrent Networks for End-to-End Speech Recognition

no code implementations8 Jun 2021 Max W. Y. Lam, Jun Wang, Chao Weng, Dan Su, Dong Yu

End-to-end speech recognition generally uses hand-engineered acoustic features as input and excludes the feature extraction module from its joint optimization.

Speech Recognition
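The implied alternative is to make the front-end learnable: apply convolutions directly to the raw waveform so that feature extraction joins the joint optimization. A minimal sketch of such a learnable front-end, assuming a single 1-D convolutional layer; the window, hop, and dimensions are illustrative, not the paper's multi-scale GALR configuration:

```python
import torch
import torch.nn as nn

class RawWaveformEncoder(nn.Module):
    """Learnable front-end: 1-D convolution over raw audio instead of
    hand-engineered filterbank features (sizes are illustrative)."""
    def __init__(self, out_dim=256, win=400, hop=160):  # ~25 ms / 10 ms at 16 kHz
        super().__init__()
        self.conv = nn.Conv1d(1, out_dim, kernel_size=win, stride=hop)
        self.act = nn.ReLU()

    def forward(self, wav):            # wav: (batch, samples)
        x = wav.unsqueeze(1)           # (batch, 1, samples)
        return self.act(self.conv(x))  # (batch, out_dim, frames)

enc = RawWaveformEncoder()
feats = enc(torch.randn(2, 16000))     # 1 s of 16 kHz audio -> (2, 256, 98)
# Gradients flow into the encoder, so the "feature extractor" is trained
# jointly with the recognizer rather than fixed in advance.
```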

Tune-In: Training Under Negative Environments with Interference for Attention Networks Simulating Cocktail Party Effect

no code implementations2 Mar 2021 Jun Wang, Max W. Y. Lam, Dan Su, Dong Yu

We study the cocktail party problem and propose a novel attention network called Tune-In, short for training under negative environments with interference.

Speaker Verification Speech Separation

Sandglasset: A Light Multi-Granularity Self-attentive Network For Time-Domain Speech Separation

2 code implementations1 Mar 2021 Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu

One of the leading single-channel speech separation (SS) models is based on a TasNet with a dual-path segmentation technique, where the size of each segment remains unchanged throughout all layers.

Computational Efficiency Speech Separation
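For reference, the fixed-size dual-path segmentation the excerpt describes looks roughly like the sketch below; the segment length is an illustrative value, and Sandglasset's contribution is precisely to vary this granularity across blocks:

```python
import torch

def segment(x, seg_len=64):
    """Split a (batch, time, feat) sequence into non-overlapping segments:
    (batch, n_segments, seg_len, feat). Dual-path models then alternate
    intra-segment (local) and inter-segment (global) processing."""
    b, t, f = x.shape
    pad = (-t) % seg_len                               # pad time to a multiple
    x = torch.nn.functional.pad(x, (0, 0, 0, pad))     # of the segment length
    return x.view(b, -1, seg_len, f)

x = torch.randn(2, 1000, 128)
segs = segment(x)            # -> (2, 16, 64, 128)
# In a plain dual-path TasNet, seg_len stays the same in every layer;
# Sandglasset instead changes the granularity from layer to layer.
```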

Contrastive Separative Coding for Self-supervised Representation Learning

no code implementations1 Mar 2021 Jun Wang, Max W. Y. Lam, Dan Su, Dong Yu

To extract robust deep representations from long sequential modeling of speech data, we propose a self-supervised learning approach, namely Contrastive Separative Coding (CSC).

Representation Learning Self-Supervised Learning +1
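As background for the contrastive objective the abstract names, here is the generic InfoNCE-style loss that such approaches build on; this is the standard formulation, not CSC's exact separative variant:

```python
import torch
import torch.nn.functional as F

def info_nce(anchors, positives, temperature=0.1):
    """Generic InfoNCE contrastive loss: each anchor should score highest
    against its own positive, with the other samples in the batch acting
    as negatives."""
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.t() / temperature     # (batch, batch) similarity matrix
    targets = torch.arange(a.shape[0])   # the positive sits on the diagonal
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(16, 256), torch.randn(16, 256))
```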

Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks

2 code implementations13 Jan 2021 Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu

Recent research on time-domain audio separation networks (TasNets) has brought great success to speech separation.

Speech Separation

Mixup-Breakdown: A Consistency Training Method for Improving Generalization of Speech Separation Models

no code implementations28 Oct 2019 Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu

Deep-learning-based speech separation models suffer from poor generalization: even state-of-the-art models can fail abruptly when evaluated under mismatched conditions.

Speech Separation
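A hedged sketch of the consistency-training idea suggested by the title: a teacher model separates unlabeled mixtures, its estimates are interpolated ("mixed up") into new pseudo-mixtures, and the student is penalized for inconsistency with the correspondingly mixed teacher outputs. The placeholder separator, the interpolation scheme, and the loss below are illustrative assumptions, not the paper's exact recipe:

```python
import torch
import torch.nn as nn

class Separator(nn.Module):
    """Placeholder separator: maps a mixture to two source estimates."""
    def __init__(self, dim=512):
        super().__init__()
        self.net = nn.Linear(dim, 2 * dim)

    def forward(self, mix):                        # mix: (batch, dim)
        return self.net(mix).chunk(2, dim=-1)      # two source estimates

student, teacher = Separator(), Separator()
teacher.load_state_dict(student.state_dict())      # teacher is an EMA copy in practice

mix = torch.randn(8, 512)                          # unlabeled mixtures
with torch.no_grad():
    t1, t2 = teacher(mix)                          # teacher's separated estimates

lam = torch.rand(8, 1)                             # mixup coefficients
new_mix = lam * t1 + (1 - lam) * t2                # interpolated pseudo-mixture
s1, s2 = student(new_mix)

# Consistency: the student's outputs on the perturbed mixture should match
# the (scaled) teacher estimates that composed it.
loss = ((s1 - lam * t1) ** 2 + (s2 - (1 - lam) * t2) ** 2).mean()
loss.backward()
```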
