Search Results for author: Andrew Tao

Found 29 papers, 16 papers with code

VILA: On Pre-training for Visual Language Models

2 code implementations • 12 Dec 2023 • Ji Lin, Hongxu Yin, Wei Ping, Yao Lu, Pavlo Molchanov, Andrew Tao, Huizi Mao, Jan Kautz, Mohammad Shoeybi, Song Han

Visual language models (VLMs) have progressed rapidly with the recent success of large language models.

Ranked #21 on Visual Question Answering on MM-Vet

In-Context Learning • Language Modelling +2

1,782

FasterViT: Fast Vision Transformers with Hierarchical Attention

2 code implementations • 9 Jun 2023 • Ali Hatamizadeh, Greg Heinrich, Hongxu Yin, Andrew Tao, Jose M. Alvarez, Jan Kautz, Pavlo Molchanov

At a high level, global self-attention enables efficient cross-window communication at lower cost.

object-detection • Object Detection +1

668

Progressive Learning of 3D Reconstruction Network from 2D GAN Data

no code implementations • 18 May 2023 • Aysegul Dundar, Jun Gao, Andrew Tao, Bryan Catanzaro

In this work, to overcome these limitations of generated datasets, we make two main contributions that lead to state-of-the-art results on challenging objects: 1) a robust multi-stage learning scheme that gradually relies more on the model's own predictions when calculating losses, and 2) a novel adversarial learning pipeline with online pseudo-ground-truth generation to achieve fine details.

3D Reconstruction

Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models

no code implementations • ICCV 2023 • Songwei Ge, Seungjun Nah, Guilin Liu, Tyler Poon, Andrew Tao, Bryan Catanzaro, David Jacobs, Jia-Bin Huang, Ming-Yu Liu, Yogesh Balaji

Despite tremendous progress in generating high-quality images using diffusion models, synthesizing a sequence of animated frames that are both photorealistic and temporally coherent is still in its infancy.

Ranked #8 on Text-to-Video Generation on UCF-101

Image Generation • Text-to-Video Generation +1

Fine Detailed Texture Learning for 3D Meshes with Generative Models

no code implementations • 17 Mar 2022 • Aysegul Dundar, Jun Gao, Andrew Tao, Bryan Catanzaro

The reconstruction is posed as an adaptation problem and is done progressively: in the first stage, we focus on learning accurate geometry, whereas in the second stage, we focus on learning the texture with a generative adversarial network.

Generative Adversarial Network

Leveraging Bitstream Metadata for Fast, Accurate, Generalized Compressed Video Quality Enhancement

no code implementations • 31 Jan 2022 • Max Ehrlich, Jon Barker, Namitha Padmanabhan, Larry Davis, Andrew Tao, Bryan Catanzaro, Abhinav Shrivastava

Video compression is a central feature of the modern internet, powering technologies from social media to video conferencing.

Quantization • Video Compression

Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers

2 code implementations • 24 Nov 2021 • John Guibas, Morteza Mardani, Zongyi Li, Andrew Tao, Anima Anandkumar, Bryan Catanzaro

AFNO is based on a principled foundation of operator learning which allows us to frame token mixing as a continuous global convolution without any dependence on the input resolution.

Computational Efficiency • Operator learning +1

185

Efficient Token Mixing for Transformers via Adaptive Fourier Neural Operators

no code implementations • ICLR 2022 • John Guibas, Morteza Mardani, Zongyi Li, Andrew Tao, Anima Anandkumar, Bryan Catanzaro

AFNO is based on a principled foundation of operator learning which allows us to frame token mixing as a continuous global convolution without any dependence on the input resolution.

Computational Efficiency • Operator learning +1

View Generalization for Single Image Textured 3D Models

no code implementations • CVPR 2021 • Anand Bhattad, Aysegul Dundar, Guilin Liu, Andrew Tao, Bryan Catanzaro

We describe a cycle consistency loss that encourages model textures to be aligned, so as to encourage sharing.

Dual Contrastive Loss and Attention for GANs

1 code implementation • ICCV 2021 • Ning Yu, Guilin Liu, Aysegul Dundar, Andrew Tao, Bryan Catanzaro, Larry Davis, Mario Fritz

Lastly, we study different attention architectures in the discriminator, and propose a reference attention mechanism.

Image Generation • Unconditional Image Generation

Neural FFTs for Universal Texture Image Synthesis

no code implementations • NeurIPS 2020 • Morteza Mardani, Guilin Liu, Aysegul Dundar, Shiqiu Liu, Andrew Tao, Bryan Catanzaro

Conventional CNNs, recently adopted for synthesis, must be trained and tested on the same set of images and fail to generalize to unseen images.

Image Generation • Texture Synthesis

Transposer: Universal Texture Synthesis Using Feature Maps as Transposed Convolution Filter

no code implementations • 14 Jul 2020 • Guilin Liu, Rohan Taori, Ting-Chun Wang, Zhiding Yu, Shiqiu Liu, Fitsum A. Reda, Karan Sapra, Andrew Tao, Bryan Catanzaro

Specifically, we directly treat the whole encoded feature map of the input texture as transposed convolution filters and the features' self-similarity map, which captures the auto-correlation information, as input to the transposed convolution.

Texture Synthesis

Hierarchical Multi-Scale Attention for Semantic Segmentation

8 code implementations • 21 May 2020 • Andrew Tao, Karan Sapra, Bryan Catanzaro

Multi-scale inference is commonly used to improve the results of semantic segmentation.

Ranked #6 on Semantic Segmentation on Cityscapes val (using extra training data)

Panoptic Segmentation

8,248

Panoptic-based Image Synthesis

no code implementations • CVPR 2020 • Aysegul Dundar, Karan Sapra, Guilin Liu, Andrew Tao, Bryan Catanzaro

Conditional image synthesis for generating photorealistic images serves various applications, from content editing to content generation.

Image Generation

Unsupervised Disentanglement of Pose, Appearance and Background from Images and Videos

1 code implementation • 26 Jan 2020 • Aysegul Dundar, Kevin J. Shih, Animesh Garg, Robert Pottorf, Andrew Tao, Bryan Catanzaro

However, the reconstruction task of the entire image forces the model to allocate landmarks to model the background.

Disentanglement • Video Prediction

Neural ODEs for Image Segmentation with Level Sets

no code implementations • 25 Dec 2019 • Rafael Valle, Fitsum Reda, Mohammad Shoeybi, Patrick Legresley, Andrew Tao, Bryan Catanzaro

We propose a novel approach for image segmentation that combines Neural Ordinary Differential Equations (NODEs) and the Level Set method.

Image Segmentation • object-detection +4

Few-shot Video-to-Video Synthesis

6 code implementations • NeurIPS 2019 • Ting-Chun Wang, Ming-Yu Liu, Andrew Tao, Guilin Liu, Jan Kautz, Bryan Catanzaro

To address the limitations, we propose a few-shot vid2vid framework, which learns to synthesize videos of previously unseen subjects or scenes by leveraging a few example images of the target at test time.

Ranked #1 on Video-to-Video Synthesis on YouTube Dancing

Video-to-Video Synthesis

1,781

Video Interpolation and Prediction with Unsupervised Landmarks

no code implementations • 6 Sep 2019 • Kevin J. Shih, Aysegul Dundar, Animesh Garg, Robert Pottorf, Andrew Tao, Bryan Catanzaro

Prediction and interpolation for long-range video data involve the complex task of modeling motion trajectories for each visible object, occlusions and dis-occlusions, as well as appearance changes due to viewpoint and lighting.

Motion Interpolation • Optical Flow Estimation +1

Unsupervised Video Interpolation Using Cycle Consistency

1 code implementation • ICCV 2019 • Fitsum A. Reda, Deqing Sun, Aysegul Dundar, Mohammad Shoeybi, Guilin Liu, Kevin J. Shih, Andrew Tao, Jan Kautz, Bryan Catanzaro

We further introduce a pseudo supervised loss term that enforces the interpolated frames to be consistent with predictions of a pre-trained interpolation model.

Ranked #1 on Video Frame Interpolation on UCF101 (PSNR (sRGB) metric)

Video Frame Interpolation

107

Graphical Contrastive Losses for Scene Graph Parsing

3 code implementations • CVPR 2019 • Ji Zhang, Kevin J. Shih, Ahmed Elgammal, Andrew Tao, Bryan Catanzaro

The first, Entity Instance Confusion, occurs when the model confuses multiple instances of the same type of entity (e.g., multiple cups).

Relationship Detection • Scene Graph Generation +1

12,992

Improving Semantic Segmentation via Video Propagation and Label Relaxation

5 code implementations • CVPR 2019 • Yi Zhu, Karan Sapra, Fitsum A. Reda, Kevin J. Shih, Shawn Newsam, Andrew Tao, Bryan Catanzaro

In this paper, we present a video prediction-based methodology to scale up training sets by synthesizing new training samples in order to improve the accuracy of semantic segmentation networks.

Ranked #2 on Semantic Segmentation on KITTI Semantic Segmentation (using extra training data)

Segmentation • Semantic Segmentation +1

1,751

Partial Convolution based Padding

4 code implementations • 28 Nov 2018 • Guilin Liu, Kevin J. Shih, Ting-Chun Wang, Fitsum A. Reda, Karan Sapra, Zhiding Yu, Andrew Tao, Bryan Catanzaro

In this paper, we present a simple yet effective padding scheme that can be used as a drop-in module for existing convolutional neural networks.

General Classification • Semantic Segmentation

1,198

An Interpretable Model for Scene Graph Generation

no code implementations • 21 Nov 2018 • Ji Zhang, Kevin Shih, Andrew Tao, Bryan Catanzaro, Ahmed Elgammal

We propose an efficient and interpretable scene graph generator.

Graph Generation • Image Captioning +3

SDCNet: Video Prediction Using Spatially-Displaced Convolution

1 code implementation • 2 Nov 2018 • Fitsum A. Reda, Guilin Liu, Kevin J. Shih, Robert Kirby, Jon Barker, David Tarjan, Andrew Tao, Bryan Catanzaro

We present an approach for high-resolution video frame prediction by conditioning on both past frames and past optical flows.

Optical Flow Estimation • SSIM +1

1,751

Introduction to the 1st Place Winning Model of OpenImages Relationship Detection Challenge

no code implementations • 1 Nov 2018 • Ji Zhang, Kevin Shih, Andrew Tao, Bryan Catanzaro, Ahmed Elgammal

This article describes the model we built that achieved 1st place in the OpenImage Visual Relationship Detection Challenge on Kaggle.

Relationship Detection • Visual Relationship Detection

SDC-Net: Video prediction using spatially-displaced convolution

1 code implementation • ECCV 2018 • Fitsum A. Reda, Guilin Liu, Kevin J. Shih, Robert Kirby, Jon Barker, David Tarjan, Andrew Tao, Bryan Catanzaro

We present an approach for high-resolution video frame prediction by conditioning on both past frames and past optical flows.

Ranked #1 on Video Prediction on YouTube-8M

Optical Flow Estimation • SSIM +1

1,751

Video-to-Video Synthesis

11 code implementations • NeurIPS 2018 • Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Guilin Liu, Andrew Tao, Jan Kautz, Bryan Catanzaro

We study the problem of video-to-video synthesis, whose goal is to learn a mapping function from an input source video (e.g., a sequence of semantic segmentation masks) to an output photorealistic video that precisely depicts the content of the source video.

2k • Semantic Segmentation +2

8,495

Image Inpainting for Irregular Holes Using Partial Convolutions

60 code implementations • ECCV 2018 • Guilin Liu, Fitsum A. Reda, Kevin J. Shih, Ting-Chun Wang, Andrew Tao, Bryan Catanzaro

Existing deep learning based image inpainting methods use a standard convolutional network over the corrupted image, using convolutional filter responses conditioned on both valid pixels and the substitute values in the masked holes (typically the mean value).

Image Inpainting • valid

1,198

High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs

20 code implementations • CVPR 2018 • Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro

We present a new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs).

Ranked #2 on Sketch-to-Image Translation on COCO-Stuff

Conditional Image Generation • Fundus to Angiography Generation +5

6,521
