no code implementations • 31 Oct 2023 • Mingjie Liu, Teodor-Dumitru Ene, Robert Kirby, Chris Cheng, Nathaniel Pinckney, Rongjian Liang, Jonah Alben, Himyanshu Anand, Sanmitra Banerjee, Ismet Bayraktaroglu, Bonita Bhaskaran, Bryan Catanzaro, Arjun Chaudhuri, Sharon Clay, Bill Dally, Laura Dang, Parikshit Deshpande, Siddhanth Dhodhi, Sameer Halepete, Eric Hill, Jiashang Hu, Sumit Jain, Brucek Khailany, George Kokai, Kishor Kunal, Xiaowei Li, Charley Lind, Hao Liu, Stuart Oberman, Sujeet Omar, Sreedhar Pratty, Jonathan Raiman, Ambar Sarkar, Zhengjiang Shao, Hanfei Sun, Pratik P Suthar, Varun Tej, Walker Turner, Kaizhe Xu, Haoxing Ren
ChipNeMo aims to explore the applications of large language models (LLMs) for industrial chip design.
no code implementations • 11 Oct 2023 • Boxin Wang, Wei Ping, Lawrence McAfee, Peng Xu, Bo Li, Mohammad Shoeybi, Bryan Catanzaro
Our results highlight a promising direction for obtaining a better GPT decoder for QA: continued pretraining with retrieval before instruction tuning.
no code implementations • 4 Oct 2023 • Peng Xu, Wei Ping, Xianchao Wu, Lawrence McAfee, Chen Zhu, Zihan Liu, Sandeep Subramanian, Evelina Bakhturina, Mohammad Shoeybi, Bryan Catanzaro
Perhaps surprisingly, we find that an LLM with a 4K context window using simple retrieval augmentation at generation can match the performance of a finetuned LLM with a 16K context window (via positional interpolation) on long-context tasks, while requiring much less computation.
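For readers unfamiliar with the setup, a minimal sketch of retrieval augmentation at generation time: retrieved passages are prepended to the prompt so a short-context model can draw on long documents. The retrieve() function and the character budget standing in for a ~4K-token window are illustrative assumptions, not the paper's code.

    from typing import Callable, List

    def retrieval_augmented_prompt(
        question: str,
        retrieve: Callable[[str, int], List[str]],  # hypothetical retriever
        top_k: int = 5,
        max_chars: int = 4000,  # crude stand-in for a ~4K-token budget
    ) -> str:
        # Prepend the top-k retrieved chunks, truncated to fit the
        # model's short context window, then ask the question.
        context = "\n\n".join(retrieve(question, top_k))[:max_chars]
        return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"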
1 code implementation • NeurIPS 2023 • Sungwon Kim, Kevin J. Shih, Rohan Badlani, Joao Felipe Santos, Evelina Bakhturina, Mikyas T. Desta, Rafael Valle, Sungroh Yoon, Bryan Catanzaro
P-Flow comprises a speech-prompted text encoder for speaker adaptation and a flow matching generative decoder for high-quality and fast speech synthesis.
no code implementations • 12 Sep 2023 • Zhifeng Kong, Wei Ping, Ambrish Dantrey, Bryan Catanzaro
In this work, we present CleanUNet 2, a speech denoising model that combines the advantages of waveform and spectrogram denoisers, achieving the best of both worlds.
no code implementations • 15 Aug 2023 • Jie Huang, Wei Ping, Peng Xu, Mohammad Shoeybi, Kevin Chen-Chuan Chang, Bryan Catanzaro
In this paper, we investigate the in-context learning ability of retrieval-augmented encoder-decoder language models.
no code implementations • 7 Aug 2023 • Ahmed Agiza, Rajarshi Roy, Teodor Dumitru Ene, Saad Godil, Sherief Reda, Bryan Catanzaro
Given a gate-level netlist of a circuit represented as a graph, GraPhSyM utilizes graph structure, connectivity, and electrical property features to predict the impact of physical synthesis transformations such as buffer insertion and gate sizing.
no code implementations • 18 May 2023 • Aysegul Dundar, Jun Gao, Andrew Tao, Bryan Catanzaro
In this work, to overcome these limitations of generated datasets, we make two main contributions that lead to state-of-the-art results on challenging objects: 1) a robust multi-stage learning scheme that gradually relies more on the model's own predictions when computing losses, and 2) a novel adversarial learning pipeline with online pseudo-ground-truth generation to achieve fine details.
no code implementations • ICCV 2023 • Songwei Ge, Seungjun Nah, Guilin Liu, Tyler Poon, Andrew Tao, Bryan Catanzaro, David Jacobs, Jia-Bin Huang, Ming-Yu Liu, Yogesh Balaji
Despite tremendous progress in generating high-quality images using diffusion models, synthesizing a sequence of animated frames that are both photorealistic and temporally coherent is still in its infancy.
Ranked #3 on Text-to-Video Generation on UCF-101
1 code implementation • 13 Apr 2023 • Boxin Wang, Wei Ping, Peng Xu, Lawrence McAfee, Zihan Liu, Mohammad Shoeybi, Yi Dong, Oleksii Kuchaiev, Bo Li, Chaowei Xiao, Anima Anandkumar, Bryan Catanzaro
To answer it, we perform a comprehensive study on a scalable pre-trained retrieval-augmented LM (i.e., RETRO) compared with standard GPT and retrieval-augmented GPT incorporated at fine-tuning or inference stages.
no code implementations • 14 Mar 2023 • Rohan Badlani, Akshit Arora, Subhankar Ghosh, Rafael Valle, Kevin J. Shih, João Felipe Santos, Boris Ginsburg, Bryan Catanzaro
We introduce VANI, a very lightweight multi-lingual accent controllable speech synthesis system.
no code implementations • 14 Feb 2023 • Shrimai Prabhumoye, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro
Pretrained large language models have become indispensable for solving various natural language processing (NLP) tasks.
no code implementations • 9 Feb 2023 • Zhuolin Yang, Wei Ping, Zihan Liu, Vijay Korthikanti, Weili Nie, De-An Huang, Linxi Fan, Zhiding Yu, Shiyi Lan, Bo Li, Ming-Yu Liu, Yuke Zhu, Mohammad Shoeybi, Bryan Catanzaro, Chaowei Xiao, Anima Anandkumar
Augmenting pretrained language models (LMs) with a vision encoder (e.g., Flamingo) has obtained state-of-the-art results in image-to-text generation.
no code implementations • 24 Jan 2023 • Rohan Badlani, Rafael Valle, Kevin J. Shih, João Felipe Santos, Siddharth Gururani, Bryan Catanzaro
We work to create a multilingual speech synthesis system which can generate speech with the proper accent while retaining the characteristics of an individual voice.
2 code implementations • 2 Nov 2022 • Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Qinsheng Zhang, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, Bryan Catanzaro, Tero Karras, Ming-Yu Liu
Therefore, in contrast to existing works, we propose to train an ensemble of text-to-image diffusion models specialized for different synthesis stages.
Ranked #13 on Text-to-Image Generation on COCO
no code implementations • 25 Oct 2022 • Peng Xu, Mostofa Patwary, Shrimai Prabhumoye, Virginia Adams, Ryan J. Prenger, Wei Ping, Nayeon Lee, Mohammad Shoeybi, Bryan Catanzaro
For cross-domain and cross-dataset cases, we show that (a) Adapter (Houlsby et al., 2019) performs best among all the PERMs studied here, and (b) it outperforms finetuning if the task dataset is below a certain size.
no code implementations • 12 Oct 2022 • Dan Su, Mostofa Patwary, Shrimai Prabhumoye, Peng Xu, Ryan Prenger, Mohammad Shoeybi, Pascale Fung, Anima Anandkumar, Bryan Catanzaro
Prior work on closed-book QA either directly finetunes or prompts a pretrained language model (LM) to leverage the stored knowledge.
3 code implementations • 9 Jun 2022 • Nayeon Lee, Wei Ping, Peng Xu, Mostofa Patwary, Pascale Fung, Mohammad Shoeybi, Bryan Catanzaro
In this work, we measure and improve the factual accuracy of large-scale LMs for open-ended text generation.
3 code implementations • 9 Jun 2022 • Sang-gil Lee, Wei Ping, Boris Ginsburg, Bryan Catanzaro, Sungroh Yoon
Despite recent progress in generative adversarial network (GAN)-based vocoders, where the model generates raw waveform conditioned on acoustic features, it is challenging to synthesize high-fidelity audio for numerous speakers across various recording environments.
Ranked #4 on Speech Synthesis on LibriTTS
no code implementations • 14 May 2022 • Rajarshi Roy, Jonathan Raiman, Neel Kant, Ilyas Elkin, Robert Kirby, Michael Siu, Stuart Oberman, Saad Godil, Bryan Catanzaro
Deep convolutional RL agents trained on this environment produce prefix adder circuits that Pareto-dominate existing baselines, with up to 16.0% and 30.2% lower area for the same delay in the 32b and 64b settings, respectively.
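For context on the circuit family being optimized: a prefix adder computes per-bit generate/propagate signals and combines them with an associative operator, and the RL agent searches over the shape of that prefix tree. The serial sketch below shows only the underlying arithmetic, not the agent or any particular tree topology.

    def prefix_add(a: int, b: int, width: int = 32) -> int:
        # Add two unsigned integers via generate/propagate carry logic,
        # the recurrence that prefix trees (Sklansky, Kogge-Stone, ...)
        # evaluate in parallel.
        g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]  # generate
        p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]  # propagate
        carry, out = 0, 0
        for i in range(width):
            out |= (p[i] ^ carry) << i      # sum bit i
            carry = g[i] | (p[i] & carry)   # serial scan of the prefix operator
        return out

    assert prefix_add(1234, 5678) == 1234 + 5678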
2 code implementations • 10 May 2022 • Vijay Korthikanti, Jared Casper, Sangkug Lym, Lawrence McAfee, Michael Andersch, Mohammad Shoeybi, Bryan Catanzaro
In this paper, we show how to significantly accelerate training of large transformer models by reducing activation recomputation.
no code implementations • 17 Mar 2022 • Aysegul Dundar, Jun Gao, Andrew Tao, Bryan Catanzaro
The reconstruction is posed as an adaptation problem and proceeds progressively: in the first stage we focus on learning accurate geometry, and in the second on learning texture with a generative adversarial network.
1 code implementation • Findings (ACL) 2022 • Zihan Liu, Mostofa Patwary, Ryan Prenger, Shrimai Prabhumoye, Wei Ping, Mohammad Shoeybi, Bryan Catanzaro
We propose a multi-stage prompting approach to generate knowledgeable responses from a single pretrained LM.
1 code implementation • 3 Mar 2022 • Kevin J. Shih, Rafael Valle, Rohan Badlani, João Felipe Santos, Bryan Catanzaro
Despite recent advances in generative modeling for text-to-speech synthesis, these models do not yet have the same fine-grained adjustability of pitch-conditioned deterministic models such as FastPitch and FastSpeech2.
1 code implementation • 15 Feb 2022 • Zhifeng Kong, Wei Ping, Ambrish Dantrey, Bryan Catanzaro
In this work, we present CleanUNet, a causal speech denoising model on the raw waveform.
1 code implementation • 8 Feb 2022 • Boxin Wang, Wei Ping, Chaowei Xiao, Peng Xu, Mostofa Patwary, Mohammad Shoeybi, Bo Li, Anima Anandkumar, Bryan Catanzaro
In this work, we systematically explore domain-adaptive training to reduce the toxicity of language models.
no code implementations • 31 Jan 2022 • Max Ehrlich, Jon Barker, Namitha Padmanabhan, Larry Davis, Andrew Tao, Bryan Catanzaro, Abhinav Shrivastava
Video compression is a central feature of the modern internet, powering technologies from social media to video conferencing.
1 code implementation • 28 Jan 2022 • Shaden Smith, Mostofa Patwary, Brandon Norick, Patrick LeGresley, Samyam Rajbhandari, Jared Casper, Zhun Liu, Shrimai Prabhumoye, George Zerveas, Vijay Korthikanti, Elton Zhang, Rewon Child, Reza Yazdani Aminabadi, Julie Bernauer, Xia Song, Mohammad Shoeybi, Yuxiong He, Michael Houston, Saurabh Tiwary, Bryan Catanzaro
Next, we detail the training process, the design of our training corpus, and our data curation techniques, which we believe is a key ingredient to the success of the model.
Ranked #2 on Language Modelling on LAMBADA
no code implementations • 15 Dec 2021 • Shrimai Prabhumoye, Rafal Kocielnik, Mohammad Shoeybi, Anima Anandkumar, Bryan Catanzaro
We then provide the LM with an instruction that consists of this subset of labeled exemplars, the query text to be classified, and a definition of bias, and prompt it to make a decision.
1 code implementation • 24 Nov 2021 • John Guibas, Morteza Mardani, Zongyi Li, Andrew Tao, Anima Anandkumar, Bryan Catanzaro
AFNO is based on a principled foundation of operator learning which allows us to frame token mixing as a continuous global convolution without any dependence on the input resolution.
no code implementations • ICLR 2022 • John Guibas, Morteza Mardani, Zongyi Li, Andrew Tao, Anima Anandkumar, Bryan Catanzaro
AFNO is based on a principled foundation of operator learning which allows us to frame token mixing as a continuous global convolution without any dependence on the input resolution.
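A sketch of the core mechanism, under the simplifying assumption of a plain learned per-mode filter (AFNO itself uses a block-diagonal MLP over modes with sparsity): mixing tokens by pointwise multiplication in the Fourier domain is a global convolution over the sequence, and because the filter is defined per frequency mode rather than per position, it does not depend on input resolution.

    import numpy as np

    def fourier_token_mixing(x: np.ndarray, modes: int, seed: int = 0) -> np.ndarray:
        # x: (n_tokens, channels); modes <= n_tokens // 2 + 1.
        # Pointwise filtering of kept frequency modes == a global
        # circular convolution over the token dimension.
        rng = np.random.default_rng(seed)
        n, c = x.shape
        w = rng.standard_normal((modes, c)) + 1j * rng.standard_normal((modes, c))
        X = np.fft.rfft(x, axis=0)   # tokens -> frequency modes
        X[:modes] *= w               # learned per-mode filter (illustrative)
        X[modes:] = 0                # truncate high frequencies
        return np.fft.irfft(X, n=n, axis=0)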
no code implementations • 6 Sep 2021 • Robert Kirby, Kolby Nottingham, Rajarshi Roy, Saad Godil, Bryan Catanzaro
In this work we augment state-of-the-art, force-based global placement solvers with a reinforcement learning agent trained to improve the final detail placed Half Perimeter Wire Length (HPWL).
3 code implementations • 23 Aug 2021 • Rohan Badlani, Adrian Łancucki, Kevin J. Shih, Rafael Valle, Wei Ping, Bryan Catanzaro
However, these alignments tend to be brittle and often fail to generalize to long utterances and out-of-domain text, leading to missing or repeating words.
3 code implementations • NeurIPS 2021 • Chen Zhu, Wei Ping, Chaowei Xiao, Mohammad Shoeybi, Tom Goldstein, Anima Anandkumar, Bryan Catanzaro
For instance, Transformer-LS achieves 0.97 test BPC on enwik8 using half as many parameters as the previous method, while being faster and able to handle sequences 3x as long as its full-attention counterpart on the same hardware.
Ranked #1 on Language Modelling on enwik8 dev
no code implementations • CVPR 2021 • Anand Bhattad, Aysegul Dundar, Guilin Liu, Andrew Tao, Bryan Catanzaro
We describe a cycle consistency loss that encourages model textures to be aligned, so as to encourage sharing.
1 code implementation • ICML Workshop INNF 2021 • Kevin J. Shih, Rafael Valle, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro
This work introduces a predominantly parallel, end-to-end TTS model based on normalizing flows.
1 code implementation • 9 Apr 2021 • Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Anand Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei Zaharia
In this paper, we show how different types of parallelism methods (tensor, pipeline, and data parallelism) can be composed to scale to thousands of GPUs and models with trillions of parameters.
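The composition amounts to factoring the GPU count into three degrees of parallelism. A small bookkeeping sketch (names are illustrative, not Megatron-LM's API):

    def parallel_layout(world_size: int, tensor: int, pipeline: int) -> dict:
        # Tensor-, pipeline-, and data-parallel degrees must multiply
        # to the total number of GPUs; data parallelism fills the rest.
        assert world_size % (tensor * pipeline) == 0, "degrees must divide world size"
        return {"tensor": tensor, "pipeline": pipeline,
                "data": world_size // (tensor * pipeline)}

    # e.g. 3072 GPUs as 8-way tensor x 12-way pipeline x 32-way data parallel
    print(parallel_layout(3072, tensor=8, pipeline=12))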
1 code implementation • ICCV 2021 • Ning Yu, Guilin Liu, Aysegul Dundar, Andrew Tao, Bryan Catanzaro, Larry Davis, Mario Fritz
Lastly, we study different attention architectures in the discriminator, and propose a reference attention mechanism.
2 code implementations • ACL 2021 • Devendra Singh Sachan, Mostofa Patwary, Mohammad Shoeybi, Neel Kant, Wei Ping, William L Hamilton, Bryan Catanzaro
We also explore two approaches for end-to-end supervised training of the reader and retriever components in OpenQA models.
1 code implementation • NeurIPS 2020 • Vitaly Kurin, Saad Godil, Shimon Whiteson, Bryan Catanzaro
While more work is needed to apply Graph-Q-SAT to reduce wall clock time in modern SAT solving settings, it is a compelling proof-of-concept showing that RL equipped with Graph Neural Networks can learn a generalizable branching heuristic for SAT search.
no code implementations • NeurIPS 2020 • Morteza Mardani, Guilin Liu, Aysegul Dundar, Shiqiu Liu, Andrew Tao, Bryan Catanzaro
Conventional CNNs, recently adopted for synthesis, must be trained and tested on the same set of images and fail to generalize to unseen images.
1 code implementation • 20 Oct 2020 • Sashank Santhanam, Wei Ping, Raul Puri, Mohammad Shoeybi, Mostofa Patwary, Bryan Catanzaro
State-of-the-art conversational agents have advanced significantly in conjunction with the use of large transformer-based language models.
no code implementations • EMNLP 2020 • Peng Xu, Mostofa Patwary, Mohammad Shoeybi, Raul Puri, Pascale Fung, Anima Anandkumar, Bryan Catanzaro
We showcase the controllability of our model by replacing the keywords used to generate stories and re-running the generation process.
11 code implementations • ICLR 2021 • Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, Bryan Catanzaro
In this work, we propose DiffWave, a versatile diffusion probabilistic model for conditional and unconditional waveform generation.
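For intuition, a sketch of the diffusion forward process that models in this family invert: data is progressively corrupted with Gaussian noise under a fixed schedule, and the network is trained to predict the injected noise. The step count and schedule values here are illustrative, not DiffWave's.

    import numpy as np

    T = 50                               # diffusion steps (illustrative)
    betas = np.linspace(1e-4, 0.05, T)   # fixed noise schedule
    alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal retention

    def diffuse(x0: np.ndarray, t: int, rng=np.random.default_rng()):
        # Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I).
        eps = rng.standard_normal(x0.shape)
        xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
        return xt, eps  # the model learns to recover eps from (xt, t)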
no code implementations • 14 Jul 2020 • Guilin Liu, Rohan Taori, Ting-Chun Wang, Zhiding Yu, Shiqiu Liu, Fitsum A. Reda, Karan Sapra, Andrew Tao, Bryan Catanzaro
Specifically, we directly treat the whole encoded feature map of the input texture as transposed convolution filters and the features' self-similarity map, which captures the auto-correlation information, as input to the transposed convolution.
7 code implementations • 21 May 2020 • Andrew Tao, Karan Sapra, Bryan Catanzaro
Multi-scale inference is commonly used to improve the results of semantic segmentation.
Ranked #4 on Semantic Segmentation on Cityscapes val (using extra training data)
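As a baseline for what the paper improves on, a sketch of plain multi-scale inference with uniform averaging; the paper's hierarchical attention learns the per-scale weights instead. The model callable and scale set are illustrative assumptions.

    import numpy as np
    from scipy.ndimage import zoom

    def multiscale_logits(image, model, scales=(0.5, 1.0, 2.0)):
        # Run the segmentation model at several input scales, resize the
        # logits back to the original resolution, and average them.
        h, w = image.shape[:2]
        acc = None
        for s in scales:
            logits = model(zoom(image, (s, s, 1), order=1))   # (h', w', C)
            back = zoom(logits, (h / logits.shape[0], w / logits.shape[1], 1), order=1)
            acc = back if acc is None else acc + back
        return acc / len(scales)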
no code implementations • ACL 2020 • Alex Boyd, Raul Puri, Mohammad Shoeybi, Mostofa Patwary, Bryan Catanzaro
This work introduces the Generative Conversation Control model, an augmented and fine-tuned GPT-2 language model that conditions on past reference conversations to probabilistically model multi-turn conversations in the actor's persona.
3 code implementations • ICLR 2021 • Rafael Valle, Kevin Shih, Ryan Prenger, Bryan Catanzaro
In this paper we propose Flowtron: an autoregressive flow-based generative network for text-to-speech synthesis with control over speech variation and style transfer.
Ranked #1 on Text-To-Speech Synthesis on LJSpeech (Pleasantness MOS metric, using extra training data)
no code implementations • CVPR 2020 • Aysegul Dundar, Karan Sapra, Guilin Liu, Andrew Tao, Bryan Catanzaro
Conditional image synthesis for generating photorealistic images serves various applications, from content editing to content generation.
no code implementations • 13 Mar 2020 • Nikolai Yakovenko, Avantika Lal, Johnny Israeli, Bryan Catanzaro
Variant calling, the problem of estimating whether a position in a DNA sequence differs from a reference sequence, given noisy, redundant, overlapping short sequences that cover that position, is fundamental to genomics.
no code implementations • EMNLP 2020 • Raul Puri, Ryan Spring, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro
On the SQuAD1.1 question answering task, we achieve higher accuracy using solely synthetic questions and answers than when using the SQuAD1.1 training set questions alone.
1 code implementation • 26 Jan 2020 • Aysegul Dundar, Kevin J. Shih, Animesh Garg, Robert Pottorf, Andrew Tao, Bryan Catanzaro
However, reconstructing the entire image forces the model to allocate landmarks to modeling the background.
no code implementations • 25 Dec 2019 • Rafael Valle, Fitsum Reda, Mohammad Shoeybi, Patrick LeGresley, Andrew Tao, Bryan Catanzaro
We propose a novel approach for image segmentation that combines Neural Ordinary Differential Equations (NODEs) and the Level Set method.
no code implementations • 10 Dec 2019 • Raul Puri, Bryan Catanzaro
This work investigates the use of natural language to enable zero-shot model adaptation to new tasks.
6 code implementations • NeurIPS 2019 • Ting-Chun Wang, Ming-Yu Liu, Andrew Tao, Guilin Liu, Jan Kautz, Bryan Catanzaro
To address the limitations, we propose a few-shot vid2vid framework, which learns to synthesize videos of previously unseen subjects or scenes by leveraging few example images of the target at test time.
Ranked #1 on Video-to-Video Synthesis on YouTube Dancing
4 code implementations • 26 Oct 2019 • Rafael Valle, Jason Li, Ryan Prenger, Bryan Catanzaro
Mellotron is a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data.
2 code implementations • 26 Sep 2019 • Vitaly Kurin, Saad Godil, Shimon Whiteson, Bryan Catanzaro
While more work is needed to apply Graph-Q-SAT to reduce wall clock time in modern SAT solving settings, it is a compelling proof-of-concept showing that RL equipped with Graph Neural Networks can learn a generalizable branching heuristic for SAT search.
no code implementations • 25 Sep 2019 • Vitaly Kurin, Saad Godil, Shimon Whiteson, Bryan Catanzaro
We present GQSAT, a branching heuristic in a Boolean SAT solver trained with value-based reinforcement learning (RL) using Graph Neural Networks for function approximation.
11 code implementations • 17 Sep 2019 • Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro
To demonstrate that large language models can further advance the state of the art (SOTA), we train an 8.3 billion parameter transformer language model similar to GPT-2 and a 3.9 billion parameter model similar to BERT.
Ranked #2 on Reading Comprehension on RACE
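A NumPy sketch of the tensor (intra-layer) parallelism the paper applies to transformer MLP blocks: the first weight matrix is split by columns and the second by rows, so each partition computes independently and only one sum (an all-reduce across GPUs) is needed at the end. Sizes are illustrative.

    import numpy as np

    def gelu(x):
        return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

    rng = np.random.default_rng(0)
    X = rng.standard_normal((4, 8))    # activations
    A = rng.standard_normal((8, 16))   # first MLP weight
    B = rng.standard_normal((16, 8))   # second MLP weight

    Z = gelu(X @ A) @ B                # serial reference

    A1, A2 = np.hsplit(A, 2)           # column-parallel first GEMM
    B1, B2 = np.vsplit(B, 2)           # row-parallel second GEMM
    Z_parallel = gelu(X @ A1) @ B1 + gelu(X @ A2) @ B2  # sum = all-reduce

    assert np.allclose(Z, Z_parallel)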
no code implementations • 6 Sep 2019 • Kevin J. Shih, Aysegul Dundar, Animesh Garg, Robert Pottorf, Andrew Tao, Bryan Catanzaro
Prediction and interpolation for long-range video data involves the complex task of modeling motion trajectories for each visible object, occlusions and dis-occlusions, as well as appearance changes due to viewpoint and lighting.
1 code implementation • ICCV 2019 • Fitsum A. Reda, Deqing Sun, Aysegul Dundar, Mohammad Shoeybi, Guilin Liu, Kevin J. Shih, Andrew Tao, Jan Kautz, Bryan Catanzaro
We further introduce a pseudo supervised loss term that enforces the interpolated frames to be consistent with predictions of a pre-trained interpolation model.
Ranked #1 on Video Frame Interpolation on UCF101 (PSNR (sRGB) metric)
no code implementations • 29 Mar 2019 • Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, Furong Huang, Martin Jaggi, Kevin Jamieson, Michael. I. Jordan, Gauri Joshi, Rania Khalaf, Jason Knight, Jakub Konečný, Tim Kraska, Arun Kumar, Anastasios Kyrillidis, Aparna Lakshmiratan, Jing Li, Samuel Madden, H. Brendan McMahan, Erik Meijer, Ioannis Mitliagkas, Rajat Monga, Derek Murray, Kunle Olukotun, Dimitris Papailiopoulos, Gennady Pekhimenko, Theodoros Rekatsinas, Afshin Rostamizadeh, Christopher Ré, Christopher De Sa, Hanie Sedghi, Siddhartha Sen, Virginia Smith, Alex Smola, Dawn Song, Evan Sparks, Ion Stoica, Vivienne Sze, Madeleine Udell, Joaquin Vanschoren, Shivaram Venkataraman, Rashmi Vinayak, Markus Weimer, Andrew Gordon Wilson, Eric Xing, Matei Zaharia, Ce Zhang, Ameet Talwalkar
Machine learning (ML) techniques are enjoying rapidly increasing adoption.
3 code implementations • CVPR 2019 • Ji Zhang, Kevin J. Shih, Ahmed Elgammal, Andrew Tao, Bryan Catanzaro
The first, Entity Instance Confusion, occurs when the model confuses multiple instances of the same type of entity (e.g., multiple cups).
1 code implementation • 4 Dec 2018 • Neel Kant, Raul Puri, Nikolai Yakovenko, Bryan Catanzaro
Multi-emotion sentiment classification is a natural language processing (NLP) problem with valuable use cases on real-world data.
Ranked #3 on Emotion Classification on SemEval 2018 Task 1E-c (Macro-F1 metric)
5 code implementations • CVPR 2019 • Yi Zhu, Karan Sapra, Fitsum A. Reda, Kevin J. Shih, Shawn Newsam, Andrew Tao, Bryan Catanzaro
In this paper, we present a video prediction-based methodology to scale up training sets by synthesizing new training samples in order to improve the accuracy of semantic segmentation networks.
Ranked #2 on Semantic Segmentation on KITTI Semantic Segmentation (using extra training data)
4 code implementations • 28 Nov 2018 • Guilin Liu, Kevin J. Shih, Ting-Chun Wang, Fitsum A. Reda, Karan Sapra, Zhiding Yu, Andrew Tao, Bryan Catanzaro
In this paper, we present a simple yet effective padding scheme that can be used as a drop-in module for existing convolutional neural networks.
no code implementations • 21 Nov 2018 • Ji Zhang, Kevin Shih, Andrew Tao, Bryan Catanzaro, Ahmed Elgammal
We propose an efficient and interpretable scene graph generator.
1 code implementation • 2 Nov 2018 • Fitsum A. Reda, Guilin Liu, Kevin J. Shih, Robert Kirby, Jon Barker, David Tarjan, Andrew Tao, Bryan Catanzaro
We present an approach for high-resolution video frame prediction by conditioning on both past frames and past optical flows.
no code implementations • 1 Nov 2018 • Ji Zhang, Kevin Shih, Andrew Tao, Bryan Catanzaro, Ahmed Elgammal
This article describes the model we built that achieved 1st place in the OpenImage Visual Relationship Detection Challenge on Kaggle.
2 code implementations • 31 Oct 2018 • Ryan Prenger, Rafael Valle, Bryan Catanzaro
In this paper we propose WaveGlow: a flow-based network capable of generating high quality speech from mel-spectrograms.
Ranked #7 on Speech Synthesis on LibriTTS
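A sketch of the affine coupling transform at the heart of flows like WaveGlow: half of the input conditions a scale and bias applied to the other half, which keeps the transform exactly invertible. The coupling network below is a trivial stand-in for WaveGlow's mel-conditioned WaveNet-like layers.

    import numpy as np

    def coupling_net(xa):  # stand-in for the real conditioning network
        return np.tanh(xa), 0.1 * xa   # (log_s, t)

    def coupling_forward(x):
        xa, xb = np.split(x, 2)
        log_s, t = coupling_net(xa)
        return np.concatenate([xa, np.exp(log_s) * xb + t])

    def coupling_inverse(y):
        ya, yb = np.split(y, 2)
        log_s, t = coupling_net(ya)    # xa passes through, so recompute exactly
        return np.concatenate([ya, (yb - t) * np.exp(-log_s)])

    x = np.random.randn(8)
    assert np.allclose(coupling_inverse(coupling_forward(x)), x)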
1 code implementation • ECCV 2018 • Fitsum A. Reda, Guilin Liu, Kevin J. Shih, Robert Kirby, Jon Barker, David Tarjan, Andrew Tao, Bryan Catanzaro
We present an approach for high-resolution video frame prediction by conditioning on both past frames and past optical flows.
Ranked #1 on Video Prediction on YouTube-8M
11 code implementations • NeurIPS 2018 • Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Guilin Liu, Andrew Tao, Jan Kautz, Bryan Catanzaro
We study the problem of video-to-video synthesis, whose goal is to learn a mapping function from an input source video (e.g., a sequence of semantic segmentation masks) to an output photorealistic video that precisely depicts the content of the source video.
1 code implementation • 3 Aug 2018 • Raul Puri, Robert Kirby, Nikolai Yakovenko, Bryan Catanzaro
We provide a learning rate schedule that allows our model to converge with a 32k batch size.
58 code implementations • ECCV 2018 • Guilin Liu, Fitsum A. Reda, Kevin J. Shih, Ting-Chun Wang, Andrew Tao, Bryan Catanzaro
Existing deep learning based image inpainting methods use a standard convolutional network over the corrupted image, with convolutional filter responses conditioned on both valid pixels and the substitute values in the masked holes (typically the mean value).
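A minimal single-channel NumPy sketch of the partial convolution rule the paper proposes: the filter sees only valid pixels, the response is renormalized by the fraction of valid pixels under the window, and the mask shrinks as holes are filled.

    import numpy as np

    def partial_conv2d(x, mask, w, b=0.0):
        # x: image; mask: 1 where pixels are valid; w: k x k filter.
        # 'Valid' padding only, for brevity.
        k = w.shape[0]
        out_h, out_w = x.shape[0] - k + 1, x.shape[1] - k + 1
        out = np.zeros((out_h, out_w))
        new_mask = np.zeros_like(out)
        for i in range(out_h):
            for j in range(out_w):
                m = mask[i:i + k, j:j + k]
                if m.sum() > 0:
                    # renormalize: window size / number of valid pixels
                    out[i, j] = (w * x[i:i + k, j:j + k] * m).sum() * (k * k / m.sum()) + b
                    new_mask[i, j] = 1.0  # any valid input fills this hole
        return out, new_mask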
18 code implementations • CVPR 2018 • Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro
We present a new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs).
Ranked #2 on Sketch-to-Image Translation on COCO-Stuff
7 code implementations • 25 Oct 2017 • Edward Raff, Jon Barker, Jared Sylvester, Robert Brandon, Bryan Catanzaro, Charles Nicholas
In this work we introduce malware detection from raw byte sequences as a fruitful research area to the larger machine learning community.
2 code implementations • 15 Jul 2016 • Song Han, Jeff Pool, Sharan Narang, Huizi Mao, Enhao Gong, Shijian Tang, Erich Elsen, Peter Vajda, Manohar Paluri, John Tran, Bryan Catanzaro, William J. Dally
We propose DSD, a dense-sparse-dense training flow, for regularizing deep neural networks and achieving better optimization performance.
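A sketch of the sparse step in the dense-sparse-dense flow: after dense training, the smallest-magnitude weights are masked to zero, the network is retrained under that mask, and the mask is then dropped for a final dense phase. NumPy only; the helper name is ours.

    import numpy as np

    def magnitude_mask(w: np.ndarray, sparsity: float) -> np.ndarray:
        # Mask that zeroes the `sparsity` fraction of smallest-|w| weights.
        k = int(sparsity * w.size)
        threshold = np.partition(np.abs(w).ravel(), k)[k]
        return (np.abs(w) >= threshold).astype(w.dtype)

    w = np.random.randn(256, 256)
    mask = magnitude_mask(w, sparsity=0.5)
    w_sparse = w * mask  # S step: keep reapplying `mask` after each update;
                         # final D step: drop the mask, continue training dense.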
35 code implementations • 8 Dec 2015 • Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony Han, Awni Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh, David Seetapun, Shubho Sengupta, Yi Wang, Zhiqian Wang, Chong Wang, Bo Xiao, Dani Yogatama, Jun Zhan, Zhenyao Zhu
We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech, two vastly different languages.
24 code implementations • 17 Dec 2014 • Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, Andrew Y. Ng
We present a state-of-the-art speech recognition system developed using end-to-end deep learning.
3 code implementations • 3 Oct 2014 • Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, Evan Shelhamer
To address this problem, we have created a library similar in intent to BLAS, with optimized routines for deep learning workloads.
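For intuition about the workloads such a library optimizes, one standard way to implement convolution is to lower it to a single matrix multiply (im2col + GEMM); the cuDNN paper discusses this lowering, and cuDNN's kernels avoid materializing the intermediate matrix while computing the same result. A NumPy sketch using the cross-correlation convention of deep learning frameworks:

    import numpy as np

    def conv2d_via_gemm(x, w):
        # x: (C, H, W) input; w: (K, C, R, S) filters; 'valid' padding.
        C, H, W = x.shape
        K, _, R, S = w.shape
        out_h, out_w = H - R + 1, W - S + 1
        # im2col: one column of receptive-field pixels per output position
        cols = np.empty((C * R * S, out_h * out_w))
        for i in range(out_h):
            for j in range(out_w):
                cols[:, i * out_w + j] = x[:, i:i + R, j:j + S].ravel()
        return (w.reshape(K, -1) @ cols).reshape(K, out_h, out_w)  # the GEMM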
2 code implementations • 18 Nov 2009 • Andreas Klöckner, Nicolas Pinto, Yunsup Lee, Bryan Catanzaro, Paul Ivanov, Ahmed Fasih
In introducing PyCUDA and PyOpenCL, this article proposes the combination of a dynamic, high-level scripting language with the massive performance of a GPU as a compelling two-tiered computing platform, potentially offering significant performance and productivity advantages over conventional single-tier, static systems.
Subjects: Distributed, Parallel, and Cluster Computing; Software Engineering (D.1.2)
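A canonical example in the spirit of the article: CUDA C kernel source compiled at runtime from Python and launched on NumPy data. Requires an NVIDIA GPU with CUDA and PyCUDA installed; this mirrors the introductory example style of the PyCUDA documentation.

    import numpy as np
    import pycuda.autoinit              # create a CUDA context
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void double_them(float *a)
    {
        int idx = threadIdx.x + blockIdx.x * blockDim.x;
        a[idx] *= 2.0f;
    }
    """)

    double_them = mod.get_function("double_them")
    a = np.random.randn(256).astype(np.float32)
    a_orig = a.copy()
    double_them(drv.InOut(a), block=(256, 1, 1), grid=(1, 1))
    assert np.allclose(a, 2 * a_orig)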