Search Results for author: Bryan Catanzaro

Found 62 papers, 35 papers with code

PrefixRL: Optimization of Parallel Prefix Circuits using Deep Reinforcement Learning

no code implementations14 May 2022 Rajarshi Roy, Jonathan Raiman, Neel Kant, Ilyas Elkin, Robert Kirby, Michael Siu, Stuart Oberman, Saad Godil, Bryan Catanzaro

Deep Convolutional RL agents trained on this environment produce prefix adder circuits that Pareto-dominate existing baselines with up to 16. 0% and 30. 2% lower area for the same delay in the 32b and 64b settings respectively.


Reducing Activation Recomputation in Large Transformer Models

no code implementations10 May 2022 Vijay Korthikanti, Jared Casper, Sangkug Lym, Lawrence McAfee, Michael Andersch, Mohammad Shoeybi, Bryan Catanzaro

In this paper, we show how to significantly accelerate training of large transformer models by reducing activation recomputation.

Fine Detailed Texture Learning for 3D Meshes with Generative Models

no code implementations17 Mar 2022 Aysegul Dundar, Jun Gao, Andrew Tao, Bryan Catanzaro

The reconstruction is posed as an adaptation problem and is done progressively where in the first stage, we focus on learning accurate geometry, whereas in the second stage, we focus on learning the texture with a generative adversarial network.

Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows

no code implementations3 Mar 2022 Kevin J. Shih, Rafael Valle, Rohan Badlani, João Felipe Santos, Bryan Catanzaro

Despite recent advances in generative modeling for text-to-speech synthesis, these models do not yet have the same fine-grained adjustability of pitch-conditioned deterministic models such as FastPitch and FastSpeech2.

Speech Synthesis Text-To-Speech Synthesis

Leveraging Bitstream Metadata for Fast and Accurate Video Compression Correction

no code implementations31 Jan 2022 Max Ehrlich, Jon Barker, Namitha Padmanabhan, Larry Davis, Andrew Tao, Bryan Catanzaro, Abhinav Shrivastava

Video compression is a central feature of the modern internet powering technologies from social media to video conferencing.

Video Compression

Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases

no code implementations15 Dec 2021 Shrimai Prabhumoye, Rafal Kocielnik, Mohammad Shoeybi, Anima Anandkumar, Bryan Catanzaro

We then provide the LM with instruction that consists of this subset of labeled exemplars, the query text to be classified, a definition of bias, and prompt it to make a decision.

Pretrained Language Models

Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers

1 code implementation24 Nov 2021 John Guibas, Morteza Mardani, Zongyi Li, Andrew Tao, Anima Anandkumar, Bryan Catanzaro

AFNO is based on a principled foundation of operator learning which allows us to frame token mixing as a continuous global convolution without any dependence on the input resolution.

Operator learning Representation Learning

Efficient Token Mixing for Transformers via Adaptive Fourier Neural Operators

no code implementations ICLR 2022 John Guibas, Morteza Mardani, Zongyi Li, Andrew Tao, Anima Anandkumar, Bryan Catanzaro

AFNO is based on a principled foundation of operator learning which allows us to frame token mixing as a continuous global convolution without any dependence on the input resolution.

Operator learning Representation Learning

Guiding Global Placement With Reinforcement Learning

no code implementations6 Sep 2021 Robert Kirby, Kolby Nottingham, Rajarshi Roy, Saad Godil, Bryan Catanzaro

In this work we augment state-of-the-art, force-based global placement solvers with a reinforcement learning agent trained to improve the final detail placed Half Perimeter Wire Length (HPWL).


One TTS Alignment To Rule Them All

3 code implementations23 Aug 2021 Rohan Badlani, Adrian Łancucki, Kevin J. Shih, Rafael Valle, Wei Ping, Bryan Catanzaro

However, these alignments tend to be brittle and often fail to generalize to long utterances and out-of-domain text, leading to missing or repeating words.

Speech Synthesis

Long-Short Transformer: Efficient Transformers for Language and Vision

3 code implementations NeurIPS 2021 Chen Zhu, Wei Ping, Chaowei Xiao, Mohammad Shoeybi, Tom Goldstein, Anima Anandkumar, Bryan Catanzaro

For instance, Transformer-LS achieves 0. 97 test BPC on enwik8 using half the number of parameters than previous method, while being faster and is able to handle 3x as long sequences compared to its full-attention version on the same hardware.

Language Modelling

View Generalization for Single Image Textured 3D Models

no code implementations CVPR 2021 Anand Bhattad, Aysegul Dundar, Guilin Liu, Andrew Tao, Bryan Catanzaro

We describe a cycle consistency loss that encourages model textures to be aligned, so as to encourage sharing.

Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM

1 code implementation9 Apr 2021 Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick Legresley, Mostofa Patwary, Vijay Anand Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei Zaharia

In this paper, we show how different types of parallelism methods (tensor, pipeline, and data parallelism) can be composed to scale to thousands of GPUs and models with trillions of parameters.

Language Modelling

Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver?

1 code implementation NeurIPS 2020 Vitaly Kurin, Saad Godil, Shimon Whiteson, Bryan Catanzaro

While more work is needed to apply Graph-Q-SAT to reduce wall clock time in modern SAT solving settings, it is a compelling proof-of-concept showing that RL equipped with Graph Neural Networks can learn a generalizable branching heuristic for SAT search.

Feature Engineering Q-Learning

Neural FFTs for Universal Texture Image Synthesis

no code implementations NeurIPS 2020 Morteza Mardani, Guilin Liu, Aysegul Dundar, Shiqiu Liu, Andrew Tao, Bryan Catanzaro

The conventional CNNs, recently adopted for synthesis, require to train and test on the same set of images and fail to generalize to unseen images.

Image Generation Texture Synthesis

Local Knowledge Powered Conversational Agents

1 code implementation20 Oct 2020 Sashank Santhanam, Wei Ping, Raul Puri, Mohammad Shoeybi, Mostofa Patwary, Bryan Catanzaro

State-of-the-art conversational agents have advanced significantly in conjunction with the use of large transformer-based language models.


DiffWave: A Versatile Diffusion Model for Audio Synthesis

8 code implementations ICLR 2021 Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, Bryan Catanzaro

In this work, we propose DiffWave, a versatile diffusion probabilistic model for conditional and unconditional waveform generation.

Speech Synthesis

Transposer: Universal Texture Synthesis Using Feature Maps as Transposed Convolution Filter

no code implementations14 Jul 2020 Guilin Liu, Rohan Taori, Ting-Chun Wang, Zhiding Yu, Shiqiu Liu, Fitsum A. Reda, Karan Sapra, Andrew Tao, Bryan Catanzaro

Specifically, we directly treat the whole encoded feature map of the input texture as transposed convolution filters and the features' self-similarity map, which captures the auto-correlation information, as input to the transposed convolution.

Texture Synthesis

Hierarchical Multi-Scale Attention for Semantic Segmentation

7 code implementations21 May 2020 Andrew Tao, Karan Sapra, Bryan Catanzaro

Multi-scale inference is commonly used to improve the results of semantic segmentation.

Ranked #2 on Semantic Segmentation on Cityscapes val (using extra training data)

Panoptic Segmentation

Large Scale Multi-Actor Generative Dialog Modeling

no code implementations ACL 2020 Alex Boyd, Raul Puri, Mohammad Shoeybi, Mostofa Patwary, Bryan Catanzaro

This work introduces the Generative Conversation Control model, an augmented and fine-tuned GPT-2 language model that conditions on past reference conversations to probabilistically model multi-turn conversations in the actor's persona.

Goal-Oriented Dialog Language Modelling

Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis

2 code implementations ICLR 2021 Rafael Valle, Kevin Shih, Ryan Prenger, Bryan Catanzaro

In this paper we propose Flowtron: an autoregressive flow-based generative network for text-to-speech synthesis with control over speech variation and style transfer.

 Ranked #1 on Text-To-Speech Synthesis on LJSpeech (Pleasantness MOS metric)

Speech Synthesis Style Transfer +1

Panoptic-based Image Synthesis

no code implementations CVPR 2020 Aysegul Dundar, Karan Sapra, Guilin Liu, Andrew Tao, Bryan Catanzaro

Conditional image synthesis for generating photorealistic images serves various applications for content editing to content generation.

Image Generation

Genome Variant Calling with a Deep Averaging Network

no code implementations13 Mar 2020 Nikolai Yakovenko, Avantika Lal, Johnny Israeli, Bryan Catanzaro

Variant calling, the problem of estimating whether a position in a DNA sequence differs from a reference sequence, given noisy, redundant, overlapping short sequences that cover that position, is fundamental to genomics.

Training Question Answering Models From Synthetic Data

no code implementations EMNLP 2020 Raul Puri, Ryan Spring, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro

On the SQuAD1. 1 question answering task, we achieve higher accuracy using solely synthetic questions and answers than when using the SQuAD1. 1 training set questions alone.

Answer Generation Data Augmentation +1

Neural ODEs for Image Segmentation with Level Sets

no code implementations25 Dec 2019 Rafael Valle, Fitsum Reda, Mohammad Shoeybi, Patrick Legresley, Andrew Tao, Bryan Catanzaro

We propose a novel approach for image segmentation that combines Neural Ordinary Differential Equations (NODEs) and the Level Set method.

RGB Salient Object Detection Salient Object Detection +1

Zero-shot Text Classification With Generative Language Models

no code implementations10 Dec 2019 Raul Puri, Bryan Catanzaro

This work investigates the use of natural language to enable zero-shot model adaptation to new tasks.

Classification General Classification +2

Few-shot Video-to-Video Synthesis

6 code implementations NeurIPS 2019 Ting-Chun Wang, Ming-Yu Liu, Andrew Tao, Guilin Liu, Jan Kautz, Bryan Catanzaro

To address the limitations, we propose a few-shot vid2vid framework, which learns to synthesize videos of previously unseen subjects or scenes by leveraging few example images of the target at test time.

Video-to-Video Synthesis

Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens

4 code implementations26 Oct 2019 Rafael Valle, Jason Li, Ryan Prenger, Bryan Catanzaro

Mellotron is a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data.

Frame Style Transfer

Can $Q$-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver?

2 code implementations26 Sep 2019 Vitaly Kurin, Saad Godil, Shimon Whiteson, Bryan Catanzaro

While more work is needed to apply Graph-$Q$-SAT to reduce wall clock time in modern SAT solving settings, it is a compelling proof-of-concept showing that RL equipped with Graph Neural Networks can learn a generalizable branching heuristic for SAT search.

Feature Engineering Q-Learning

Improving SAT Solver Heuristics with Graph Networks and Reinforcement Learning

no code implementations25 Sep 2019 Vitaly Kurin, Saad Godil, Shimon Whiteson, Bryan Catanzaro

We present GQSAT, a branching heuristic in a Boolean SAT solver trained with value-based reinforcement learning (RL) using Graph Neural Networks for function approximation.

Feature Engineering reinforcement-learning

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

8 code implementations17 Sep 2019 Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick Legresley, Jared Casper, Bryan Catanzaro

To demonstrate that large language models can further advance the state of the art (SOTA), we train an 8. 3 billion parameter transformer language model similar to GPT-2 and a 3. 9 billion parameter model similar to BERT.

 Ranked #1 on Language Modelling on WikiText-103 (using extra training data)

Language Modelling Reading Comprehension

Video Interpolation and Prediction with Unsupervised Landmarks

no code implementations6 Sep 2019 Kevin J. Shih, Aysegul Dundar, Animesh Garg, Robert Pottorf, Andrew Tao, Bryan Catanzaro

Prediction and interpolation for long-range video data involves the complex task of modeling motion trajectories for each visible object, occlusions and dis-occlusions, as well as appearance changes due to viewpoint and lighting.

MOTION INTERPOLATION Optical Flow Estimation +1

Unsupervised Video Interpolation Using Cycle Consistency

1 code implementation ICCV 2019 Fitsum A. Reda, Deqing Sun, Aysegul Dundar, Mohammad Shoeybi, Guilin Liu, Kevin J. Shih, Andrew Tao, Jan Kautz, Bryan Catanzaro

We further introduce a pseudo supervised loss term that enforces the interpolated frames to be consistent with predictions of a pre-trained interpolation model.

 Ranked #1 on Video Frame Interpolation on UCF101 (PSNR (sRGB) metric)

Frame Video Frame Interpolation

Graphical Contrastive Losses for Scene Graph Parsing

3 code implementations CVPR 2019 Ji Zhang, Kevin J. Shih, Ahmed Elgammal, Andrew Tao, Bryan Catanzaro

The first, Entity Instance Confusion, occurs when the model confuses multiple instances of the same type of entity (e. g. multiple cups).

Scene Graph Generation Visual Relationship Detection

Improving Semantic Segmentation via Video Propagation and Label Relaxation

3 code implementations CVPR 2019 Yi Zhu, Karan Sapra, Fitsum A. Reda, Kevin J. Shih, Shawn Newsam, Andrew Tao, Bryan Catanzaro

In this paper, we present a video prediction-based methodology to scale up training sets by synthesizing new training samples in order to improve the accuracy of semantic segmentation networks.

 Ranked #1 on Semantic Segmentation on CamVid (using extra training data)

Semantic Segmentation Video Propagation

Practical Text Classification With Large Pre-Trained Language Models

1 code implementation4 Dec 2018 Neel Kant, Raul Puri, Nikolai Yakovenko, Bryan Catanzaro

Multi-emotion sentiment classification is a natural language processing (NLP) problem with valuable use cases on real-world data.

Classification Emotion Classification +3

Partial Convolution based Padding

4 code implementations28 Nov 2018 Guilin Liu, Kevin J. Shih, Ting-Chun Wang, Fitsum A. Reda, Karan Sapra, Zhiding Yu, Andrew Tao, Bryan Catanzaro

In this paper, we present a simple yet effective padding scheme that can be used as a drop-in module for existing convolutional neural networks.

General Classification Semantic Segmentation

SDCNet: Video Prediction Using Spatially-Displaced Convolution

1 code implementation2 Nov 2018 Fitsum A. Reda, Guilin Liu, Kevin J. Shih, Robert Kirby, Jon Barker, David Tarjan, Andrew Tao, Bryan Catanzaro

We present an approach for high-resolution video frame prediction by conditioning on both past frames and past optical flows.

Frame Optical Flow Estimation +2

Introduction to the 1st Place Winning Model of OpenImages Relationship Detection Challenge

no code implementations1 Nov 2018 Ji Zhang, Kevin Shih, Andrew Tao, Bryan Catanzaro, Ahmed Elgammal

This article describes the model we built that achieved 1st place in the OpenImage Visual Relationship Detection Challenge on Kaggle.

Visual Relationship Detection

WaveGlow: A Flow-based Generative Network for Speech Synthesis

3 code implementations31 Oct 2018 Ryan Prenger, Rafael Valle, Bryan Catanzaro

In this paper we propose WaveGlow: a flow-based network capable of generating high quality speech from mel-spectrograms.

Speech Synthesis

Video-to-Video Synthesis

11 code implementations NeurIPS 2018 Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Guilin Liu, Andrew Tao, Jan Kautz, Bryan Catanzaro

We study the problem of video-to-video synthesis, whose goal is to learn a mapping function from an input source video (e. g., a sequence of semantic segmentation masks) to an output photorealistic video that precisely depicts the content of the source video.

Semantic Segmentation Video Prediction +1

Large Scale Language Modeling: Converging on 40GB of Text in Four Hours

1 code implementation3 Aug 2018 Raul Puri, Robert Kirby, Nikolai Yakovenko, Bryan Catanzaro

We provide a learning rate schedule that allows our model to converge with a 32k batch size.

Language Modelling

Image Inpainting for Irregular Holes Using Partial Convolutions

53 code implementations ECCV 2018 Guilin Liu, Fitsum A. Reda, Kevin J. Shih, Ting-Chun Wang, Andrew Tao, Bryan Catanzaro

Existing deep learning based image inpainting methods use a standard convolutional network over the corrupted image, using convolutional filter responses conditioned on both valid pixels as well as the substitute values in the masked holes (typically the mean value).

Image Inpainting

Malware Detection by Eating a Whole EXE

6 code implementations25 Oct 2017 Edward Raff, Jon Barker, Jared Sylvester, Robert Brandon, Bryan Catanzaro, Charles Nicholas

In this work we introduce malware detection from raw byte sequences as a fruitful research area to the larger machine learning community.

Malware Detection

cuDNN: Efficient Primitives for Deep Learning

3 code implementations3 Oct 2014 Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, Evan Shelhamer

To address this problem, we have created a library similar in intent to BLAS, with optimized routines for deep learning workloads.

PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation

2 code implementations18 Nov 2009 Andreas Klöckner, Nicolas Pinto, Yunsup Lee, Bryan Catanzaro, Paul Ivanov, Ahmed Fasih

In introducing PyCUDA and PyOpenCL, this article proposes the combination of a dynamic, high-level scripting language with the massive performance of a GPU as a compelling two-tiered computing platform, potentially offering significant performance and productivity advantages over conventional single-tier, static systems.

Distributed, Parallel, and Cluster Computing Software Engineering D.1.2

Cannot find the paper you are looking for? You can Submit a new open access paper.