Search Results for author: William Chan

Found 37 papers, 13 papers with code

Video Diffusion Models

no code implementations 7 Apr 2022 Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, David J. Fleet

Generating temporally coherent high fidelity video is an important milestone in generative modeling research.

Video Generation Video Prediction

Learning Fast Samplers for Diffusion Models by Differentiating Through Sample Quality

no code implementations 11 Feb 2022 Daniel Watson, William Chan, Jonathan Ho, Mohammad Norouzi

We introduce Differentiable Diffusion Sampler Search (DDSS): a method that optimizes fast samplers for any pre-trained diffusion model by differentiating through sample quality scores.

Image Generation Unconditional Image Generation

Palette: Image-to-Image Diffusion Models

2 code implementations 10 Nov 2021 Chitwan Saharia, William Chan, Huiwen Chang, Chris A. Lee, Jonathan Ho, Tim Salimans, David J. Fleet, Mohammad Norouzi

We expect this standardized evaluation protocol to play a role in advancing image-to-image translation research.

Colorization Denoising +5

Optimizing Few-Step Diffusion Samplers by Gradient Descent

no code implementations ICLR 2022 Daniel Watson, William Chan, Jonathan Ho, Mohammad Norouzi

We propose Generalized Gaussian Diffusion Processes (GGDP), a family of non-Markovian samplers for diffusion models, and we show how to improve the generated samples of pre-trained DDPMs by optimizing the degrees of freedom of the GGDP sampler family with respect to a perceptual loss.

Denoising Image Generation +1

WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

2 code implementations 17 Jun 2021 Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, Najim Dehak, William Chan

The model takes an input phoneme sequence, and through an iterative refinement process, generates an audio waveform.

Speech Synthesis Text-To-Speech Synthesis

Learning to Efficiently Sample from Diffusion Probabilistic Models

no code implementations 7 Jun 2021 Daniel Watson, Jonathan Ho, Mohammad Norouzi, William Chan

Key advantages of DDPMs include ease of training, in contrast to generative adversarial networks, and speed of generation, in contrast to autoregressive models.

Denoising Speech Synthesis

Cascaded Diffusion Models for High Fidelity Image Generation

no code implementations 30 May 2021 Jonathan Ho, Chitwan Saharia, William Chan, David J. Fleet, Mohammad Norouzi, Tim Salimans

We show that cascaded diffusion models are capable of generating high fidelity images on the class-conditional ImageNet generation benchmark, without any assistance from auxiliary image classifiers to boost sample quality.

Data Augmentation Image Generation +1

Pushing the Limits of Non-Autoregressive Speech Recognition

no code implementations 7 Apr 2021 Edwin G. Ng, Chung-Cheng Chiu, Yu Zhang, William Chan

We apply recent advancements in end-to-end speech recognition to non-autoregressive automatic speech recognition.

Automatic Speech Recognition

SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network

no code implementations 5 Apr 2021 William Chan, Daniel Park, Chris Lee, Yu Zhang, Quoc Le, Mohammad Norouzi

We present SpeechStew, a speech recognition model that is trained on a combination of various publicly available speech recognition datasets: AMI, Broadcast News, Common Voice, LibriSpeech, Switchboard/Fisher, Tedlium, and Wall Street Journal.

Speech Recognition Transfer Learning

WaveGrad: Estimating Gradients for Waveform Generation

6 code implementations ICLR 2021 Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, William Chan

This paper introduces WaveGrad, a conditional model for waveform generation which estimates gradients of the data density.

Speech Synthesis Text-To-Speech Synthesis

Non-Autoregressive Machine Translation with Latent Alignments

2 code implementations EMNLP 2020 Chitwan Saharia, William Chan, Saurabh Saxena, Mohammad Norouzi

In addition, we adapt the Imputer model for non-autoregressive machine translation and demonstrate that Imputer with just 4 generation steps can match the performance of an autoregressive Transformer baseline.

Machine Translation Translation

Insertion-Deletion Transformer

no code implementations 15 Jan 2020 Laura Ruis, Mitchell Stern, Julia Proskurnia, William Chan

We propose the Insertion-Deletion Transformer, a novel transformer-based neural architecture and training method for sequence generation.


SpecAugment on Large Scale Datasets

no code implementations 11 Dec 2019 Daniel S. Park, Yu Zhang, Chung-Cheng Chiu, Youzheng Chen, Bo Li, William Chan, Quoc V. Le, Yonghui Wu

Recently, SpecAugment, an augmentation scheme for automatic speech recognition that acts directly on the spectrogram of input utterances, has shown to be highly effective in enhancing the performance of end-to-end networks on public datasets.
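The scheme described above (augmentation applied directly to the spectrogram) can be illustrated with a minimal NumPy sketch. The mask widths and the toy spectrogram shape here are illustrative choices, not the paper's settings:

```python
import numpy as np

def spec_augment(spec, max_freq_mask=8, max_time_mask=10, rng=None):
    """Zero out one random frequency band and one random time band
    of a (freq_bins, time_frames) spectrogram, returning a copy."""
    if rng is None:
        rng = np.random.default_rng(0)
    out = spec.copy()
    n_freq, n_time = out.shape
    f = rng.integers(0, max_freq_mask + 1)   # frequency-mask width (may be 0)
    f0 = rng.integers(0, n_freq - f + 1)     # frequency-mask start
    out[f0:f0 + f, :] = 0.0
    t = rng.integers(0, max_time_mask + 1)   # time-mask width (may be 0)
    t0 = rng.integers(0, n_time - t + 1)     # time-mask start
    out[:, t0:t0 + t] = 0.0
    return out

spec = np.ones((80, 100))   # toy 80-bin, 100-frame "spectrogram"
aug = spec_augment(spec)
print(spec.sum(), aug.sum())  # masked copy has at most the original total
```

The full method also includes time warping and can apply multiple masks per axis; this sketch shows only the basic masking idea.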

Automatic Speech Recognition

Big Bidirectional Insertion Representations for Documents

no code implementations WS 2019 Lala Li, William Chan

The Insertion Transformer is well suited for long form text generation due to its parallel generation capabilities, requiring $O(\log_2 n)$ generation steps to generate $n$ tokens.
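The $O(\log_2 n)$ step count can be sanity-checked with a small simulation of a balanced binary insertion order, where each round inserts the middle token of every remaining gap in parallel. This is a toy illustration of the generation-order arithmetic, not the model itself:

```python
import math

def balanced_insertion_rounds(tokens):
    """Return the tokens inserted in each parallel round when every
    round fills the midpoint of each remaining gap."""
    rounds = []
    spans = [(0, len(tokens))]            # half-open spans still to fill
    while spans:
        inserted, next_spans = [], []
        for lo, hi in spans:
            mid = (lo + hi) // 2
            inserted.append(tokens[mid])  # one insertion per gap, in parallel
            if lo < mid:
                next_spans.append((lo, mid))
            if mid + 1 < hi:
                next_spans.append((mid + 1, hi))
        rounds.append(inserted)
        spans = next_spans
    return rounds

seq = list("abcdefg")                     # n = 7 tokens
rounds = balanced_insertion_rounds(seq)
print(len(rounds))                        # 3 == ceil(log2(n + 1))
```

With n = 7 the simulation finishes in 3 rounds, matching ceil(log2(n + 1)).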

Text Generation Translation

Multichannel Generative Language Models

no code implementations 25 Sep 2019 Harris Chan, Jamie Kiros, William Chan

For conditional generation, the model is given a fully observed channel, and generates the remaining k-1 channels in parallel.

Machine Translation

Speaker-Targeted Audio-Visual Models for Speech Recognition in Cocktail-Party Environments

no code implementations 13 Jun 2019 Guan-Lin Chao, William Chan, Ian Lane

Speech recognition in cocktail-party environments remains a significant challenge for state-of-the-art speech recognition systems, as it is extremely difficult to extract an acoustic signal of an individual speaker from a background of overlapping speech with similar frequency and temporal characteristics.

Speech Recognition

KERMIT: Generative Insertion-Based Modeling for Sequences

no code implementations 4 Jun 2019 William Chan, Nikita Kitaev, Kelvin Guu, Mitchell Stern, Jakob Uszkoreit

During training, one can feed KERMIT paired data $(x, y)$ to learn the joint distribution $p(x, y)$, and optionally mix in unpaired data $x$ or $y$ to refine the marginals $p(x)$ or $p(y)$.

Machine Translation Question Answering +2

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

29 code implementations 18 Apr 2019 Daniel S. Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D. Cubuk, Quoc V. Le

On LibriSpeech, we achieve 6.8% WER on test-other without the use of a language model, and 5.8% WER with shallow fusion with a language model.

Automatic Speech Recognition Data Augmentation

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

3 code implementations 21 Feb 2019 Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, Bowen Liang, HyoukJoong Lee, Ciprian Chelba, Sébastien Jean, Bo Li, Melvin Johnson, Rohan Anil, Rajat Tibrewal, Xiaobing Liu, Akiko Eriguchi, Navdeep Jaitly, Naveen Ari, Colin Cherry, Parisa Haghani, Otavio Good, Youlong Cheng, Raziel Alvarez, Isaac Caswell, Wei-Ning Hsu, Zongheng Yang, Kuan-Chieh Wang, Ekaterina Gonina, Katrin Tomanek, Ben Vanik, Zelin Wu, Llion Jones, Mike Schuster, Yanping Huang, Dehao Chen, Kazuki Irie, George Foster, John Richardson, Klaus Macherey, Antoine Bruguier, Heiga Zen, Colin Raffel, Shankar Kumar, Kanishka Rao, David Rybach, Matthew Murray, Vijayaditya Peddinti, Maxim Krikun, Michiel A. U. Bacchiani, Thomas B. Jablin, Rob Suderman, Ian Williams, Benjamin Lee, Deepti Bhatia, Justin Carlson, Semih Yavuz, Yu Zhang, Ian McGraw, Max Galkin, Qi Ge, Golan Pundak, Chad Whipkey, Todd Wang, Uri Alon, Dmitry Lepikhin, Ye Tian, Sara Sabour, William Chan, Shubham Toshniwal, Baohua Liao, Michael Nirschl, Pat Rondon

Lingvo is a TensorFlow framework offering a complete solution for collaborative deep learning research, with a particular focus on sequence-to-sequence models.

Sequence-To-Sequence Speech Recognition

Insertion Transformer: Flexible Sequence Generation via Insertion Operations

no code implementations 8 Feb 2019 Mitchell Stern, William Chan, Jamie Kiros, Jakob Uszkoreit

We present the Insertion Transformer, an iterative, partially autoregressive model for sequence generation based on insertion operations.

Machine Translation Translation

Privacy Partitioning: Protecting User Data During the Deep Learning Inference Phase

no code implementations 7 Dec 2018 Jianfeng Chi, Emmanuel Owusu, Xuwang Yin, Tong Yu, William Chan, Patrick Tague, Yuan Tian

We present a practical method for protecting data during the inference phase of deep learning based on bipartite topology threat modeling and an interactive adversarial deep network construction.

Face Identification General Classification

Bytes are All You Need: End-to-End Multilingual Speech Recognition and Synthesis with Bytes

no code implementations 22 Nov 2018 Bo Li, Yu Zhang, Tara Sainath, Yonghui Wu, William Chan

We present two end-to-end models: Audio-to-Byte (A2B) and Byte-to-Audio (B2A), for multilingual speech recognition and synthesis.

Speech Recognition Speech Synthesis

Optimal Completion Distillation for Sequence Learning

2 code implementations ICLR 2019 Sara Sabour, William Chan, Mohammad Norouzi

We present Optimal Completion Distillation (OCD), a training procedure for optimizing sequence to sequence models based on edit distance.

Speech Recognition

Illustrative Language Understanding: Large-Scale Visual Grounding with Image Search

no code implementations ACL 2018 Jamie Kiros, William Chan, Geoffrey Hinton

We introduce Picturebook, a large-scale lookup operation to ground language via 'snapshots' of our physical world accessed through image search.

General Classification Image Retrieval +7

Very Deep Convolutional Networks for End-to-End Speech Recognition

2 code implementations 10 Oct 2016 Yu Zhang, William Chan, Navdeep Jaitly

Sequence-to-sequence models have shown success in end-to-end speech recognition.

Speech Recognition

Latent Sequence Decompositions

no code implementations 10 Oct 2016 William Chan, Yu Zhang, Quoc Le, Navdeep Jaitly

We present the Latent Sequence Decompositions (LSD) framework.

Speech Recognition

Listen, Attend and Spell

40 code implementations 5 Aug 2015 William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals

Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly.

Speech Recognition

Deep Recurrent Neural Networks for Acoustic Modelling

no code implementations 7 Apr 2015 William Chan, Ian Lane

We present a novel deep Recurrent Neural Network (RNN) model for acoustic modelling in Automatic Speech Recognition (ASR).

Acoustic Modelling Automatic Speech Recognition

Transferring Knowledge from a RNN to a DNN

no code implementations 7 Apr 2015 William Chan, Nan Rosemary Ke, Ian Lane

The small DNN trained on the soft RNN alignments achieved a 3.93 WER on the Wall Street Journal (WSJ) eval92 task, compared to a baseline of 4.54 WER, a relative improvement of more than 13%.
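The relative-improvement figure can be checked with simple arithmetic on the WER numbers quoted above:

```python
baseline_wer = 4.54   # baseline DNN, WSJ eval92
small_dnn_wer = 3.93  # small DNN trained on soft RNN alignments

# Relative improvement = (baseline - new) / baseline
rel = (baseline_wer - small_dnn_wer) / baseline_wer
print(f"{rel:.1%}")   # prints 13.4%, i.e. "more than 13%" as stated
```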

Automatic Speech Recognition
