Search Results for author: William Chan

Found 43 papers, 16 papers with code

Deep Recurrent Neural Networks for Acoustic Modelling

no code implementations • 7 Apr 2015 • William Chan, Ian Lane

We present a novel deep Recurrent Neural Network (RNN) model for acoustic modelling in Automatic Speech Recognition (ASR).

Ranked #10 on Speech Recognition on WSJ eval92

Acoustic Modelling Automatic Speech Recognition +2

Paper
Add Code

Transferring Knowledge from a RNN to a DNN

no code implementations • 7 Apr 2015 • William Chan, Nan Rosemary Ke, Ian Lane

The small DNN trained on the soft RNN alignments achieved a 3. 93 WER on the Wall Street Journal (WSJ) eval92 task compared to a baseline 4. 54 WER or more than 13% relative improvement.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Listen, Attend and Spell

40 code implementations • 5 Aug 2015 • William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals

Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly.

Language Modelling Reading Comprehension +1

1,156

Paper
Code

Latent Sequence Decompositions

no code implementations • 10 Oct 2016 • William Chan, Yu Zhang, Quoc Le, Navdeep Jaitly

We present the Latent Sequence Decompositions (LSD) framework.

speech-recognition Speech Recognition +1

Paper
Add Code

Very Deep Convolutional Networks for End-to-End Speech Recognition

2 code implementations • 10 Oct 2016 • Yu Zhang, William Chan, Navdeep Jaitly

Sequence-to-sequence models have shown success in end-to-end speech recognition.

speech-recognition Speech Recognition

Paper
Code

Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM

6 code implementations • 8 Jun 2017 • Takaaki Hori, Shinji Watanabe, Yu Zhang, William Chan

The CTC network sits on top of the encoder and is jointly trained with the attention-based decoder.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

1,156

Paper
Code

Illustrative Language Understanding: Large-Scale Visual Grounding with Image Search

no code implementations • ACL 2018 • Jamie Kiros, William Chan, Geoffrey Hinton

We introduce Picturebook, a large-scale lookup operation to ground language via {`}snapshots{'} of our physical world accessed through image search.

General Classification Image Retrieval +8

Paper
Add Code

InferLite: Simple Universal Sentence Representations from Natural Language Inference Data

no code implementations • EMNLP 2018 • Jamie Kiros, William Chan

Natural language inference has been shown to be an effective supervised task for learning generic sentence embeddings.

Natural Language Inference Position +3

Paper
Add Code

Optimal Completion Distillation for Sequence Learning

2 code implementations • ICLR 2019 • Sara Sabour, William Chan, Mohammad Norouzi

We present Optimal Completion Distillation (OCD), a training procedure for optimizing sequence to sequence models based on edit distance.

Position speech-recognition +1

Paper
Code

Bytes are All You Need: End-to-End Multilingual Speech Recognition and Synthesis with Bytes

no code implementations • 22 Nov 2018 • Bo Li, Yu Zhang, Tara Sainath, Yonghui Wu, William Chan

We present two end-to-end models: Audio-to-Byte (A2B) and Byte-to-Audio (B2A), for multilingual speech recognition and synthesis.

speech-recognition Speech Recognition +1

Paper
Add Code

Privacy Partitioning: Protecting User Data During the Deep Learning Inference Phase

no code implementations • 7 Dec 2018 • Jianfeng Chi, Emmanuel Owusu, Xuwang Yin, Tong Yu, William Chan, Patrick Tague, Yuan Tian

We present a practical method for protecting data during the inference phase of deep learning based on bipartite topology threat modeling and an interactive adversarial deep network construction.

BIG-bench Machine Learning Face Identification +1

Paper
Add Code

Insertion Transformer: Flexible Sequence Generation via Insertion Operations

no code implementations • 8 Feb 2019 • Mitchell Stern, William Chan, Jamie Kiros, Jakob Uszkoreit

We present the Insertion Transformer, an iterative, partially autoregressive model for sequence generation based on insertion operations.

Machine Translation Translation +1

Paper
Add Code

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

2 code implementations • 21 Feb 2019 • Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, Bowen Liang, HyoukJoong Lee, Ciprian Chelba, Sébastien Jean, Bo Li, Melvin Johnson, Rohan Anil, Rajat Tibrewal, Xiaobing Liu, Akiko Eriguchi, Navdeep Jaitly, Naveen Ari, Colin Cherry, Parisa Haghani, Otavio Good, Youlong Cheng, Raziel Alvarez, Isaac Caswell, Wei-Ning Hsu, Zongheng Yang, Kuan-Chieh Wang, Ekaterina Gonina, Katrin Tomanek, Ben Vanik, Zelin Wu, Llion Jones, Mike Schuster, Yanping Huang, Dehao Chen, Kazuki Irie, George Foster, John Richardson, Klaus Macherey, Antoine Bruguier, Heiga Zen, Colin Raffel, Shankar Kumar, Kanishka Rao, David Rybach, Matthew Murray, Vijayaditya Peddinti, Maxim Krikun, Michiel A. U. Bacchiani, Thomas B. Jablin, Rob Suderman, Ian Williams, Benjamin Lee, Deepti Bhatia, Justin Carlson, Semih Yavuz, Yu Zhang, Ian McGraw, Max Galkin, Qi Ge, Golan Pundak, Chad Whipkey, Todd Wang, Uri Alon, Dmitry Lepikhin, Ye Tian, Sara Sabour, William Chan, Shubham Toshniwal, Baohua Liao, Michael Nirschl, Pat Rondon

Lingvo is a Tensorflow framework offering a complete solution for collaborative deep learning research, with a particular focus towards sequence-to-sequence models.

Sequence-To-Sequence Speech Recognition

2,781

Paper
Code

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

29 code implementations • 18 Apr 2019 • Daniel S. Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D. Cubuk, Quoc V. Le

On LibriSpeech, we achieve 6. 8% WER on test-other without the use of a language model, and 5. 8% WER with shallow fusion with a language model.

Ranked #1 on Speech Recognition on Hub5'00 SwitchBoard

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

24,185

Paper
Code

KERMIT: Generative Insertion-Based Modeling for Sequences

no code implementations • 4 Jun 2019 • William Chan, Nikita Kitaev, Kelvin Guu, Mitchell Stern, Jakob Uszkoreit

During training, one can feed KERMIT paired data $(x, y)$ to learn the joint distribution $p(x, y)$, and optionally mix in unpaired data $x$ or $y$ to refine the marginals $p(x)$ or $p(y)$.

Ranked #39 on Machine Translation on WMT2014 English-German

Machine Translation Question Answering +2

Paper
Add Code

Speaker-Targeted Audio-Visual Models for Speech Recognition in Cocktail-Party Environments

no code implementations • 13 Jun 2019 • Guan-Lin Chao, William Chan, Ian Lane

Speech recognition in cocktail-party environments remains a significant challenge for state-of-the-art speech recognition systems, as it is extremely difficult to extract an acoustic signal of an individual speaker from a background of overlapping speech with similar frequency and temporal characteristics.

speech-recognition Speech Recognition

Paper
Add Code

Multichannel Generative Language Models

no code implementations • 25 Sep 2019 • Harris Chan, Jamie Kiros, William Chan

For conditional generation, the model is given a fully observed channel, and generates the k-1 channels in parallel.

Machine Translation

Paper
Add Code

Big Bidirectional Insertion Representations for Documents

no code implementations • WS 2019 • Lala Li, William Chan

The Insertion Transformer is well suited for long form text generation due to its parallel generation capabilities, requiring $O(\log_2 n)$ generation steps to generate $n$ tokens.

Sentence Text Generation +1

Paper
Add Code

An Empirical Study of Generation Order for Machine Translation

no code implementations • EMNLP 2020 • William Chan, Mitchell Stern, Jamie Kiros, Jakob Uszkoreit

In this work, we present an empirical study of generation order for machine translation.

Machine Translation Translation

Paper
Add Code

SpecAugment on Large Scale Datasets

no code implementations • 11 Dec 2019 • Daniel S. Park, Yu Zhang, Chung-Cheng Chiu, Youzheng Chen, Bo Li, William Chan, Quoc V. Le, Yonghui Wu

Recently, SpecAugment, an augmentation scheme for automatic speech recognition that acts directly on the spectrogram of input utterances, has shown to be highly effective in enhancing the performance of end-to-end networks on public datasets.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Insertion-Deletion Transformer

no code implementations • 15 Jan 2020 • Laura Ruis, Mitchell Stern, Julia Proskurnia, William Chan

We propose the Insertion-Deletion Transformer, a novel transformer-based neural architecture and training method for sequence generation.

Translation

Paper
Add Code

Imputer: Sequence Modelling via Imputation and Dynamic Programming

1 code implementation • ICML 2020 • William Chan, Chitwan Saharia, Geoffrey Hinton, Mohammad Norouzi, Navdeep Jaitly

This paper presents the Imputer, a neural sequence model that generates output sequences iteratively via imputations.

Imputation speech-recognition +1

Paper
Code

Non-Autoregressive Machine Translation with Latent Alignments

2 code implementations • EMNLP 2020 • Chitwan Saharia, William Chan, Saurabh Saxena, Mohammad Norouzi

In addition, we adapt the Imputer model for non-autoregressive machine translation and demonstrate that Imputer with just 4 generation steps can match the performance of an autoregressive Transformer baseline.

Machine Translation Translation

Paper
Code

WaveGrad: Estimating Gradients for Waveform Generation

7 code implementations • ICLR 2021 • Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, William Chan

This paper introduces WaveGrad, a conditional model for waveform generation which estimates gradients of the data density.

Speech Synthesis Text-To-Speech Synthesis

28,889

Paper
Code

Multichannel Generative Language Model: Learning All Possible Factorizations Within and Across Channels

no code implementations • Findings of the Association for Computational Linguistics 2020 • Harris Chan, Jamie Kiros, William Chan

MGLM is a generative joint distribution model over channels.

Language Modelling

Paper
Add Code

SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network

no code implementations • 5 Apr 2021 • William Chan, Daniel Park, Chris Lee, Yu Zhang, Quoc Le, Mohammad Norouzi

We present SpeechStew, a speech recognition model that is trained on a combination of various publicly available speech recognition datasets: AMI, Broadcast News, Common Voice, LibriSpeech, Switchboard/Fisher, Tedlium, and Wall Street Journal.

Ranked #1 on Speech Recognition on Switchboard CallHome

Language Modelling speech-recognition +2

Paper
Add Code

Pushing the Limits of Non-Autoregressive Speech Recognition

no code implementations • 7 Apr 2021 • Edwin G. Ng, Chung-Cheng Chiu, Yu Zhang, William Chan

We combine recent advancements in end-to-end speech recognition to non-autoregressive automatic speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Image Super-Resolution via Iterative Refinement

4 code implementations • 15 Apr 2021 • Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J. Fleet, Mohammad Norouzi

We present SR3, an approach to image Super-Resolution via Repeated Refinement.

Conditional Image Generation Denoising +1

3,339

Paper
Code

Cascaded Diffusion Models for High Fidelity Image Generation

no code implementations • 30 May 2021 • Jonathan Ho, Chitwan Saharia, William Chan, David J. Fleet, Mohammad Norouzi, Tim Salimans

We show that cascaded diffusion models are capable of generating high fidelity images on the class-conditional ImageNet generation benchmark, without any assistance from auxiliary image classifiers to boost sample quality.

Ranked #3 on Image Generation on ImageNet 64x64

Data Augmentation Image Generation +2

Paper
Add Code

Learning to Efficiently Sample from Diffusion Probabilistic Models

no code implementations • 7 Jun 2021 • Daniel Watson, Jonathan Ho, Mohammad Norouzi, William Chan

Key advantages of DDPMs include ease of training, in contrast to generative adversarial networks, and speed of generation, in contrast to autoregressive models.

Denoising Speech Synthesis

Paper
Add Code

WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

3 code implementations • 17 Jun 2021 • Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, Najim Dehak, William Chan

The model takes an input phoneme sequence, and through an iterative refinement process, generates an audio waveform.

Speech Synthesis Text-To-Speech Synthesis

110

Paper
Code

BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

no code implementations • 27 Sep 2021 • Yu Zhang, Daniel S. Park, Wei Han, James Qin, Anmol Gulati, Joel Shor, Aren Jansen, Yuanzhong Xu, Yanping Huang, Shibo Wang, Zongwei Zhou, Bo Li, Min Ma, William Chan, Jiahui Yu, Yongqiang Wang, Liangliang Cao, Khe Chai Sim, Bhuvana Ramabhadran, Tara N. Sainath, Françoise Beaufays, Zhifeng Chen, Quoc V. Le, Chung-Cheng Chiu, Ruoming Pang, Yonghui Wu

We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio.

Ranked #1 on Speech Recognition on Common Voice

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Optimizing Few-Step Diffusion Samplers by Gradient Descent

no code implementations • ICLR 2022 • Daniel Watson, William Chan, Jonathan Ho, Mohammad Norouzi

We propose Generalized Gaussian Diffusion Processes (GGDP), a family of non-Markovian samplers for diffusion models, and we show how to improve the generated samples of pre-trained DDPMs by optimizing the degrees of freedom of the GGDP sampler family with respect to a perceptual loss.

Denoising Image Generation +1

Paper
Add Code

Palette: Image-to-Image Diffusion Models

4 code implementations • 10 Nov 2021 • Chitwan Saharia, William Chan, Huiwen Chang, Chris A. Lee, Jonathan Ho, Tim Salimans, David J. Fleet, Mohammad Norouzi

We expect this standardized evaluation protocol to play a role in advancing image-to-image translation research.

Ranked #1 on Colorization on ImageNet ctest10k

Colorization Denoising +5

1,369

Paper
Code

Learning Fast Samplers for Diffusion Models by Differentiating Through Sample Quality

no code implementations • 11 Feb 2022 • Daniel Watson, William Chan, Jonathan Ho, Mohammad Norouzi

We introduce Differentiable Diffusion Sampler Search (DDSS): a method that optimizes fast samplers for any pre-trained diffusion model by differentiating through sample quality scores.

Image Generation Unconditional Image Generation

Paper
Add Code

Video Diffusion Models

3 code implementations • 7 Apr 2022 • Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, David J. Fleet

Generating temporally coherent high fidelity video is an important milestone in generative modeling research.

Ranked #1 on Video Generation on UCF-101 16 frames, 64x64, Unconditional

Unconditional Video Generation Video Prediction

1,834

Paper
Code

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

4 code implementations • 23 May 2022 • Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, Mohammad Norouzi

We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding.

Ranked #17 on Text-to-Image Generation on MS COCO (using extra training data)

7,756

Paper
Code

Imagen Video: High Definition Video Generation with Diffusion Models

no code implementations • 5 Oct 2022 • Jonathan Ho, William Chan, Chitwan Saharia, Jay Whang, Ruiqi Gao, Alexey Gritsenko, Diederik P. Kingma, Ben Poole, Mohammad Norouzi, David J. Fleet, Tim Salimans

We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models.

Ranked #1 on Video Generation on LAION-400M

Image Generation Video Generation +3

Paper
Add Code

Novel View Synthesis with Diffusion Models

no code implementations • 6 Oct 2022 • Daniel Watson, William Chan, Ricardo Martin-Brualla, Jonathan Ho, Andrea Tagliasacchi, Mohammad Norouzi

We demonstrate that stochastic conditioning significantly improves the 3D consistency of a naive sampler for an image-to-image diffusion model, which involves conditioning on a single fixed view.

Denoising Novel View Synthesis

Paper
Add Code

Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting

no code implementations • CVPR 2023 • Su Wang, Chitwan Saharia, Ceslee Montgomery, Jordi Pont-Tuset, Shai Noy, Stefano Pellegrini, Yasumasa Onoe, Sarah Laszlo, David J. Fleet, Radu Soricut, Jason Baldridge, Mohammad Norouzi, Peter Anderson, William Chan

Through extensive human evaluation on EditBench, we find that object-masking during training leads to across-the-board improvements in text-image alignment -- such that Imagen Editor is preferred over DALL-E 2 and Stable Diffusion -- and, as a cohort, these models are better at object-rendering than text-rendering, and handle material/color/size attributes better than count/shape attributes.

Image Inpainting Object +1

Paper
Add Code

Character-Aware Models Improve Visual Text Rendering

1 code implementation • 20 Dec 2022 • Rosanne Liu, Dan Garrette, Chitwan Saharia, William Chan, Adam Roberts, Sharan Narang, Irina Blok, RJ Mical, Mohammad Norouzi, Noah Constant

In the text-only domain, we find that character-aware models provide large gains on a novel spelling task (WikiSpell).

Image Generation

167

Paper
Code

Noise2Music: Text-conditioned Music Generation with Diffusion Models

no code implementations • 8 Feb 2023 • Qingqing Huang, Daniel S. Park, Tao Wang, Timo I. Denk, Andy Ly, Nanxin Chen, Zhengdong Zhang, Zhishuai Zhang, Jiahui Yu, Christian Frank, Jesse Engel, Quoc V. Le, William Chan, Zhifeng Chen, Wei Han

We introduce Noise2Music, where a series of diffusion models is trained to generate high-quality 30-second music clips from text prompts.

Ranked #2 on Text-to-Music Generation on MusicCaps

Music Generation Text-to-Music Generation

Paper
Add Code

TryOnDiffusion: A Tale of Two UNets

1 code implementation • CVPR 2023 • Luyang Zhu, Dawei Yang, Tyler Zhu, Fitsum Reda, William Chan, Chitwan Saharia, Mohammad Norouzi, Ira Kemelmacher-Shlizerman

Given two images depicting a person and a garment worn by another person, our goal is to generate a visualization of how the garment might look on the input person.

Virtual Try-on

109

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.