Search Results for author: Andrew Tao

Found 33 papers, 17 papers with code

PHI-S: Distribution Balancing for Label-Free Multi-Teacher Distillation

no code implementations 2 Oct 2024 Mike Ranzinger, Jon Barker, Greg Heinrich, Pavlo Molchanov, Bryan Catanzaro, Andrew Tao

Various visual foundation models have distinct strengths and weaknesses, both of which can be improved through heterogeneous multi-teacher knowledge distillation without labels, termed "agglomerative models."

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

1 code implementation 28 Aug 2024 Min Shi, Fuxiao Liu, Shihao Wang, Shijia Liao, Subhashree Radhakrishnan, De-An Huang, Hongxu Yin, Karan Sapra, Yaser Yacoob, Humphrey Shi, Bryan Catanzaro, Andrew Tao, Jan Kautz, Zhiding Yu, Guilin Liu

We discover that simply concatenating visual tokens from a set of complementary vision encoders is as effective as more complex mixing architectures or strategies.

Optical Character Recognition

X-VILA: Cross-Modality Alignment for Large Language Model

no code implementations 29 May 2024 Hanrong Ye, De-An Huang, Yao Lu, Zhiding Yu, Wei Ping, Andrew Tao, Jan Kautz, Song Han, Dan Xu, Pavlo Molchanov, Hongxu Yin

We introduce X-VILA, an omni-modality model designed to extend the capabilities of large language models (LLMs) by incorporating image, video, and audio modalities.

Instruction Following Language Modelling +1

Progressive Learning of 3D Reconstruction Network from 2D GAN Data

no code implementations 18 May 2023 Aysegul Dundar, Jun Gao, Andrew Tao, Bryan Catanzaro

In this work, to overcome these limitations of generated datasets, we have two main contributions which lead us to achieve state-of-the-art results on challenging objects: 1) a robust multi-stage learning scheme that gradually relies more on the model's own predictions when calculating losses, and 2) a novel adversarial learning pipeline with online pseudo-ground-truth generation to achieve fine details.

3D Reconstruction

Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models

no code implementations ICCV 2023 Songwei Ge, Seungjun Nah, Guilin Liu, Tyler Poon, Andrew Tao, Bryan Catanzaro, David Jacobs, Jia-Bin Huang, Ming-Yu Liu, Yogesh Balaji

Despite tremendous progress in generating high-quality images using diffusion models, synthesizing a sequence of animated frames that are both photorealistic and temporally coherent is still in its infancy.

Image Generation Text-to-Video Generation +1

Fine Detailed Texture Learning for 3D Meshes with Generative Models

no code implementations 17 Mar 2022 Aysegul Dundar, Jun Gao, Andrew Tao, Bryan Catanzaro

The reconstruction is posed as an adaptation problem and is done progressively where in the first stage, we focus on learning accurate geometry, whereas in the second stage, we focus on learning the texture with a generative adversarial network.

Generative Adversarial Network

Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers

3 code implementations 24 Nov 2021 John Guibas, Morteza Mardani, Zongyi Li, Andrew Tao, Anima Anandkumar, Bryan Catanzaro

AFNO is based on a principled foundation of operator learning which allows us to frame token mixing as a continuous global convolution without any dependence on the input resolution.

Computational Efficiency Operator learning +1

Efficient Token Mixing for Transformers via Adaptive Fourier Neural Operators

no code implementations ICLR 2022 John Guibas, Morteza Mardani, Zongyi Li, Andrew Tao, Anima Anandkumar, Bryan Catanzaro

AFNO is based on a principled foundation of operator learning which allows us to frame token mixing as a continuous global convolution without any dependence on the input resolution.

Computational Efficiency Operator learning +1

View Generalization for Single Image Textured 3D Models

no code implementations CVPR 2021 Anand Bhattad, Aysegul Dundar, Guilin Liu, Andrew Tao, Bryan Catanzaro

We describe a cycle consistency loss that encourages model textures to be aligned, so as to encourage sharing.

3D geometry

Neural FFTs for Universal Texture Image Synthesis

no code implementations NeurIPS 2020 Morteza Mardani, Guilin Liu, Aysegul Dundar, Shiqiu Liu, Andrew Tao, Bryan Catanzaro

Conventional CNNs, recently adopted for synthesis, must be trained and tested on the same set of images and fail to generalize to unseen images.

Image Generation Texture Synthesis

Transposer: Universal Texture Synthesis Using Feature Maps as Transposed Convolution Filter

no code implementations 14 Jul 2020 Guilin Liu, Rohan Taori, Ting-Chun Wang, Zhiding Yu, Shiqiu Liu, Fitsum A. Reda, Karan Sapra, Andrew Tao, Bryan Catanzaro

Specifically, we directly treat the whole encoded feature map of the input texture as transposed convolution filters and the features' self-similarity map, which captures the auto-correlation information, as input to the transposed convolution.

Texture Synthesis

Hierarchical Multi-Scale Attention for Semantic Segmentation

8 code implementations 21 May 2020 Andrew Tao, Karan Sapra, Bryan Catanzaro

Multi-scale inference is commonly used to improve the results of semantic segmentation.

Ranked #6 on Semantic Segmentation on Cityscapes val (using extra training data)

Panoptic Segmentation

Panoptic-based Image Synthesis

no code implementations CVPR 2020 Aysegul Dundar, Karan Sapra, Guilin Liu, Andrew Tao, Bryan Catanzaro

Conditional image synthesis for generating photorealistic images serves various applications, from content editing to content generation.

Image Generation

Neural ODEs for Image Segmentation with Level Sets

no code implementations 25 Dec 2019 Rafael Valle, Fitsum Reda, Mohammad Shoeybi, Patrick Legresley, Andrew Tao, Bryan Catanzaro

We propose a novel approach for image segmentation that combines Neural Ordinary Differential Equations (NODEs) and the Level Set method.

Image Segmentation object-detection +4

Few-shot Video-to-Video Synthesis

6 code implementations NeurIPS 2019 Ting-Chun Wang, Ming-Yu Liu, Andrew Tao, Guilin Liu, Jan Kautz, Bryan Catanzaro

To address the limitations, we propose a few-shot vid2vid framework, which learns to synthesize videos of previously unseen subjects or scenes by leveraging few example images of the target at test time.

Video-to-Video Synthesis

Video Interpolation and Prediction with Unsupervised Landmarks

no code implementations 6 Sep 2019 Kevin J. Shih, Aysegul Dundar, Animesh Garg, Robert Pottorf, Andrew Tao, Bryan Catanzaro

Prediction and interpolation for long-range video data involve the complex task of modeling motion trajectories for each visible object, occlusions and dis-occlusions, as well as appearance changes due to viewpoint and lighting.

Decoder Motion Interpolation +2

Unsupervised Video Interpolation Using Cycle Consistency

1 code implementation ICCV 2019 Fitsum A. Reda, Deqing Sun, Aysegul Dundar, Mohammad Shoeybi, Guilin Liu, Kevin J. Shih, Andrew Tao, Jan Kautz, Bryan Catanzaro

We further introduce a pseudo supervised loss term that enforces the interpolated frames to be consistent with predictions of a pre-trained interpolation model.

Ranked #1 on Video Frame Interpolation on UCF101 (PSNR (sRGB) metric)

Triplet Video Frame Interpolation

Graphical Contrastive Losses for Scene Graph Parsing

3 code implementations CVPR 2019 Ji Zhang, Kevin J. Shih, Ahmed Elgammal, Andrew Tao, Bryan Catanzaro

The first, Entity Instance Confusion, occurs when the model confuses multiple instances of the same type of entity (e.g. multiple cups).

Relationship Detection Scene Graph Generation +1

Improving Semantic Segmentation via Video Propagation and Label Relaxation

5 code implementations CVPR 2019 Yi Zhu, Karan Sapra, Fitsum A. Reda, Kevin J. Shih, Shawn Newsam, Andrew Tao, Bryan Catanzaro

In this paper, we present a video prediction-based methodology to scale up training sets by synthesizing new training samples in order to improve the accuracy of semantic segmentation networks.

Ranked #2 on Semantic Segmentation on KITTI Semantic Segmentation (using extra training data)

Segmentation Semantic Segmentation +1

Partial Convolution based Padding

4 code implementations 28 Nov 2018 Guilin Liu, Kevin J. Shih, Ting-Chun Wang, Fitsum A. Reda, Karan Sapra, Zhiding Yu, Andrew Tao, Bryan Catanzaro

In this paper, we present a simple yet effective padding scheme that can be used as a drop-in module for existing convolutional neural networks.

General Classification Semantic Segmentation

SDCNet: Video Prediction Using Spatially-Displaced Convolution

2 code implementations 2 Nov 2018 Fitsum A. Reda, Guilin Liu, Kevin J. Shih, Robert Kirby, Jon Barker, David Tarjan, Andrew Tao, Bryan Catanzaro

We present an approach for high-resolution video frame prediction by conditioning on both past frames and past optical flows.

Optical Flow Estimation SSIM +1

Introduction to the 1st Place Winning Model of OpenImages Relationship Detection Challenge

no code implementations 1 Nov 2018 Ji Zhang, Kevin Shih, Andrew Tao, Bryan Catanzaro, Ahmed Elgammal

This article describes the model we built that achieved 1st place in the OpenImage Visual Relationship Detection Challenge on Kaggle.

Relationship Detection Visual Relationship Detection

Video-to-Video Synthesis

11 code implementations NeurIPS 2018 Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Guilin Liu, Andrew Tao, Jan Kautz, Bryan Catanzaro

We study the problem of video-to-video synthesis, whose goal is to learn a mapping function from an input source video (e.g., a sequence of semantic segmentation masks) to an output photorealistic video that precisely depicts the content of the source video.

2k Semantic Segmentation +2

Image Inpainting for Irregular Holes Using Partial Convolutions

60 code implementations ECCV 2018 Guilin Liu, Fitsum A. Reda, Kevin J. Shih, Ting-Chun Wang, Andrew Tao, Bryan Catanzaro

Existing deep learning based image inpainting methods use a standard convolutional network over the corrupted image, using convolutional filter responses conditioned on both valid pixels as well as the substitute values in the masked holes (typically the mean value).

Image Inpainting valid

High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs

20 code implementations CVPR 2018 Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro

We present a new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs).

Conditional Image Generation Fundus to Angiography Generation +5
