no code implementations • 8 Jan 2025 • Nannan Li, Kevin J. Shih, Bryan A. Plummer
We introduce a garment extraction model that generates (human, synthetic garment) pairs from a single image of a clothed individual.
1 code implementation • NeurIPS 2023 • Sungwon Kim, Kevin J. Shih, Rohan Badlani, João Felipe Santos, Evelina Bakhturina, Mikyas T. Desta, Rafael Valle, Sungroh Yoon, Bryan Catanzaro
P-Flow comprises a speech-prompted text encoder for speaker adaptation and a flow matching generative decoder for high-quality and fast speech synthesis.
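As a rough illustration of the flow matching objective behind such a decoder, a minimal conditional flow matching training step in PyTorch might look like the sketch below. The `decoder` signature, its conditioning input, and the straight-line probability path are assumptions for illustration, not P-Flow's exact parameterization.

```python
import torch
import torch.nn.functional as F

def flow_matching_step(decoder, mel, cond):
    """One training step of a simplified conditional flow matching objective.

    mel:  (B, C, T) target mel-spectrograms
    cond: conditioning tensor (e.g. text encodings plus a speech-prompt embedding)
    """
    x0 = torch.randn_like(mel)                             # noise endpoint of the path
    t = torch.rand(mel.size(0), 1, 1, device=mel.device)   # random time in [0, 1]
    xt = (1.0 - t) * x0 + t * mel                          # point on the straight-line path
    target_v = mel - x0                                    # constant velocity of that path
    pred_v = decoder(xt, t.view(-1), cond)                 # predicted velocity field
    return F.mse_loss(pred_v, target_v)
```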
no code implementations • 14 Mar 2023 • Rohan Badlani, Akshit Arora, Subhankar Ghosh, Rafael Valle, Kevin J. Shih, João Felipe Santos, Boris Ginsburg, Bryan Catanzaro
We introduce VANI, a very lightweight multi-lingual accent controllable speech synthesis system.
no code implementations • 24 Jan 2023 • Rohan Badlani, Rafael Valle, Kevin J. Shih, João Felipe Santos, Siddharth Gururani, Bryan Catanzaro
We work to create a multilingual speech synthesis system which can generate speech with the proper accent while retaining the characteristics of an individual voice.
1 code implementation • ICCV 2023 • Nannan Li, Kevin J. Shih, Bryan A. Plummer
Then we reconstruct the input image by sampling from the permuted textures for patch-level disentanglement.
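A minimal sketch of permuting feature patches before reconstruction, assuming a (B, C, H, W) texture feature map with H and W divisible by the patch size; the actual model permutes textures within garment regions, which this toy version ignores.

```python
import torch

def permute_patches(feats, patch=8):
    """Randomly shuffle non-overlapping spatial patches of a feature map."""
    B, C, H, W = feats.shape
    ph, pw = H // patch, W // patch
    x = feats.reshape(B, C, ph, patch, pw, patch)
    x = x.permute(0, 1, 2, 4, 3, 5).reshape(B, C, ph * pw, patch, patch)
    idx = torch.randperm(ph * pw, device=feats.device)
    x = x[:, :, idx]                                        # permute patch order
    x = x.reshape(B, C, ph, pw, patch, patch)
    x = x.permute(0, 1, 2, 4, 3, 5).reshape(B, C, H, W)
    return x
```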
1 code implementation • 3 Mar 2022 • Kevin J. Shih, Rafael Valle, Rohan Badlani, João Felipe Santos, Bryan Catanzaro
Despite recent advances in generative modeling for text-to-speech synthesis, these models do not yet offer the same fine-grained adjustability as pitch-conditioned deterministic models such as FastPitch and FastSpeech2.
3 code implementations • 23 Aug 2021 • Rohan Badlani, Adrian Łańcucki, Kevin J. Shih, Rafael Valle, Wei Ping, Bryan Catanzaro
However, these alignments tend to be brittle and often fail to generalize to long utterances and out-of-domain text, leading to missing or repeating words.
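This line of work replaces brittle soft attention with an explicitly learned monotonic alignment, typically trained with a forward-sum objective over all monotonic paths. Below is a per-utterance, loop-based sketch of such an objective; batched implementations with an alignment prior differ in detail, so treat this as an assumption-laden illustration rather than the paper's exact code.

```python
import torch

def forward_sum_loss(log_attn):
    """Negative log-likelihood of all monotonic alignments through a
    (T_mel, T_text) matrix of per-frame log-probabilities over text tokens."""
    T, N = log_attn.shape
    neg_inf = float("-inf")
    alpha = torch.full((T, N), neg_inf, device=log_attn.device)
    alpha[0, 0] = log_attn[0, 0]                   # must start at the first token
    for t in range(1, T):
        stay = alpha[t - 1]                        # keep attending to the same token
        move = torch.cat([torch.full((1,), neg_inf, device=log_attn.device),
                          alpha[t - 1, :-1]])      # or advance by exactly one token
        alpha[t] = torch.logaddexp(stay, move) + log_attn[t]
    return -alpha[-1, -1]                          # must end at the last token
```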
1 code implementation • ICML Workshop INNF 2021 • Kevin J. Shih, Rafael Valle, Rohan Badlani, Adrian Łańcucki, Wei Ping, Bryan Catanzaro
This work introduces a predominantly parallel, end-to-end TTS model based on normalizing flows.
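For readers unfamiliar with the building block, an affine coupling layer, the invertible unit such flows typically stack, can be sketched as follows. This is a generic 1-D coupling layer assuming an even channel count, not this model's exact architecture.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One invertible coupling step: half the channels predict a scale and shift
    applied to the other half, so the Jacobian log-determinant is cheap."""
    def __init__(self, channels, hidden=256):
        super().__init__()
        assert channels % 2 == 0, "sketch assumes an even channel count"
        self.net = nn.Sequential(
            nn.Conv1d(channels // 2, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv1d(hidden, channels, 3, padding=1),
        )

    def forward(self, x):                           # x: (B, C, T)
        xa, xb = x.chunk(2, dim=1)
        log_s, t = self.net(xa).chunk(2, dim=1)
        yb = xb * torch.exp(log_s) + t
        log_det = log_s.sum(dim=(1, 2))             # contribution to the flow log-likelihood
        return torch.cat([xa, yb], dim=1), log_det

    def inverse(self, y):
        ya, yb = y.chunk(2, dim=1)
        log_s, t = self.net(ya).chunk(2, dim=1)
        xb = (yb - t) * torch.exp(-log_s)
        return torch.cat([ya, xb], dim=1)
```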
1 code implementation • 26 Jan 2020 • Aysegul Dundar, Kevin J. Shih, Animesh Garg, Robert Pottorf, Andrew Tao, Bryan Catanzaro
However, reconstructing the entire image forces the model to allocate landmarks to modeling the background.
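For reference, unsupervised landmark models of this kind usually turn detector heatmaps into coordinates with a differentiable spatial soft-argmax; a minimal sketch is below, independent of how the paper then restricts reconstruction away from the background.

```python
import torch

def soft_argmax(heatmaps):
    """Differentiable landmark coordinates from (B, K, H, W) detector heatmaps."""
    B, K, H, W = heatmaps.shape
    probs = heatmaps.view(B, K, -1).softmax(dim=-1).view(B, K, H, W)
    xs = torch.linspace(-1.0, 1.0, W, device=heatmaps.device)
    ys = torch.linspace(-1.0, 1.0, H, device=heatmaps.device)
    x = (probs.sum(dim=2) * xs).sum(dim=-1)     # expected x: marginalize rows, weight columns
    y = (probs.sum(dim=3) * ys).sum(dim=-1)     # expected y: marginalize columns, weight rows
    return torch.stack([x, y], dim=-1)          # (B, K, 2) in [-1, 1] image coordinates
```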
no code implementations • 6 Sep 2019 • Kevin J. Shih, Aysegul Dundar, Animesh Garg, Robert Pottorf, Andrew Tao, Bryan Catanzaro
Prediction and interpolation for long-range video data involve the complex tasks of modeling motion trajectories for each visible object, handling occlusions and dis-occlusions, and accounting for appearance changes due to viewpoint and lighting.
1 code implementation • ICCV 2019 • Fitsum A. Reda, Deqing Sun, Aysegul Dundar, Mohammad Shoeybi, Guilin Liu, Kevin J. Shih, Andrew Tao, Jan Kautz, Bryan Catanzaro
We further introduce a pseudo supervised loss term that enforces the interpolated frames to be consistent with predictions of a pre-trained interpolation model.
Ranked #1 on Video Frame Interpolation on UCF101 (PSNR (sRGB) metric)
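A minimal sketch of such a pseudo supervised term, assuming `student` and `teacher` share the interface (frame0, frame1, t) -> intermediate frame, that the teacher is a frozen pre-trained interpolation model, and an L1 penalty (the paper's exact distance may differ):

```python
import torch
import torch.nn.functional as F

def pseudo_supervised_loss(student, teacher, frame0, frame1, t=0.5):
    """Penalize disagreement between the trained model and a frozen pre-trained
    interpolation model on unlabeled frame pairs."""
    pred = student(frame0, frame1, t)
    with torch.no_grad():                          # teacher provides targets only
        pseudo_target = teacher(frame0, frame1, t)
    return F.l1_loss(pred, pseudo_target)
```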
3 code implementations • CVPR 2019 • Ji Zhang, Kevin J. Shih, Ahmed Elgammal, Andrew Tao, Bryan Catanzaro
The first, Entity Instance Confusion, occurs when the model confuses multiple instances of the same type of entity (e.g., multiple cups).
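A contrastive, margin-based penalty is one natural way to separate the correct instance from a same-class distractor. The sketch below is a generic margin loss over subject-object affinity scores, not the paper's exact graphical contrastive formulation.

```python
import torch.nn.functional as F

def instance_contrastive_loss(pos_affinity, neg_affinity, margin=0.2):
    """Push the affinity to the ground-truth object instance above the affinity
    to a same-class distractor instance by at least `margin`."""
    return F.relu(neg_affinity - pos_affinity + margin).mean()
```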
5 code implementations • CVPR 2019 • Yi Zhu, Karan Sapra, Fitsum A. Reda, Kevin J. Shih, Shawn Newsam, Andrew Tao, Bryan Catanzaro
In this paper, we present a video prediction-based methodology to scale up training sets by synthesizing new training samples in order to improve the accuracy of semantic segmentation networks.
Ranked #2 on Semantic Segmentation on KITTI Semantic Segmentation (using extra training data)
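The key operation in this kind of data synthesis is propagating a label map alongside the frame it annotates. A hedged sketch using flow-based warping with nearest-neighbor sampling follows; the paper's joint propagation and label relaxation details are omitted, and the dense flow field is assumed given.

```python
import torch
import torch.nn.functional as F

def propagate_labels(labels, flow):
    """Warp an integer label map with a dense flow field (in pixels) to a new time step.

    labels: (B, 1, H, W) integer class ids; flow: (B, 2, H, W) displacement in pixels.
    """
    B, _, H, W = labels.shape
    ys, xs = torch.meshgrid(torch.arange(H, device=labels.device),
                            torch.arange(W, device=labels.device), indexing="ij")
    base = torch.stack([xs, ys], dim=0).float().unsqueeze(0)    # (1, 2, H, W) pixel grid
    coords = base + flow                                        # displaced sampling locations
    gx = 2.0 * coords[:, 0] / (W - 1) - 1.0                     # normalize to [-1, 1]
    gy = 2.0 * coords[:, 1] / (H - 1) - 1.0
    grid = torch.stack([gx, gy], dim=-1)                        # (B, H, W, 2)
    warped = F.grid_sample(labels.float(), grid, mode="nearest", align_corners=True)
    return warped.long()
```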
4 code implementations • 28 Nov 2018 • Guilin Liu, Kevin J. Shih, Ting-Chun Wang, Fitsum A. Reda, Karan Sapra, Zhiding Yu, Andrew Tao, Bryan Catanzaro
In this paper, we present a simple yet effective padding scheme that can be used as a drop-in module for existing convolutional neural networks.
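The scheme reuses the partial convolution idea (see the inpainting entry further down): the zero-padded border is treated as holes, and border responses are re-scaled by the ratio of window size to the number of valid inputs. A hedged single-layer sketch, assuming an odd kernel and stride 1:

```python
import torch
import torch.nn.functional as F

def partial_conv_pad(x, weight, bias=None):
    """Convolution with partial-convolution-based padding: treat the zero-padded
    border as invalid and re-scale responses by (window size / valid inputs)."""
    k = weight.shape[-1]
    pad = k // 2                                                # odd kernel, stride 1 assumed
    ones = torch.ones(1, 1, x.shape[2], x.shape[3], device=x.device)
    valid = F.conv2d(F.pad(ones, [pad] * 4),
                     torch.ones(1, 1, k, k, device=x.device))   # valid pixels per window
    out = F.conv2d(F.pad(x, [pad] * 4), weight) * (k * k / valid)
    if bias is not None:
        out = out + bias.view(1, -1, 1, 1)
    return out
```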
3 code implementations • 17 Nov 2018 • Bryan A. Plummer, Kevin J. Shih, Yichen Li, Ke Xu, Svetlana Lazebnik, Stan Sclaroff, Kate Saenko
Most existing work that grounds natural language phrases in images starts with the assumption that the phrase in question is relevant to the image.
2 code implementations • 2 Nov 2018 • Fitsum A. Reda, Guilin Liu, Kevin J. Shih, Robert Kirby, Jon Barker, David Tarjan, Andrew Tao, Bryan Catanzaro
We present an approach for high-resolution video frame prediction by conditioning on both past frames and past optical flows.
1 code implementation • ECCV 2018 • Fitsum A. Reda, Guilin Liu, Kevin J. Shih, Robert Kirby, Jon Barker, David Tarjan, Andrew Tao, Bryan Catanzaro
We present an approach for high-resolution video frame prediction by conditioning on both past frames and past optical flows.
Ranked #1 on Video Prediction on YouTube-8M
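A heavily simplified sketch of the conditioning: stack past frames and flows channel-wise, predict a dense motion field, and resample the most recent frame. The actual model uses spatially-displaced convolution rather than the plain vector-based warping shown here, and `motion_net` is a placeholder for any convolutional predictor.

```python
import torch
import torch.nn.functional as F

def predict_next_frame(motion_net, frames, flows):
    """frames: list of past (B, 3, H, W) RGB frames; flows: list of past (B, 2, H, W) flows."""
    inp = torch.cat(frames + flows, dim=1)             # condition on both frames and flows
    pred_flow = motion_net(inp)                        # (B, 2, H, W) motion to the next frame
    B, _, H, W = pred_flow.shape
    ys, xs = torch.meshgrid(torch.arange(H, device=inp.device),
                            torch.arange(W, device=inp.device), indexing="ij")
    coords = torch.stack([xs, ys], dim=0).float().unsqueeze(0) + pred_flow
    grid = torch.stack([2 * coords[:, 0] / (W - 1) - 1,
                        2 * coords[:, 1] / (H - 1) - 1], dim=-1)
    return F.grid_sample(frames[-1], grid, align_corners=True)  # warp the latest frame
```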
60 code implementations • ECCV 2018 • Guilin Liu, Fitsum A. Reda, Kevin J. Shih, Ting-Chun Wang, Andrew Tao, Bryan Catanzaro
Existing deep learning based image inpainting methods apply a standard convolutional network over the corrupted image, with convolutional filter responses conditioned on both valid pixels and the substitute values in the masked holes (typically the mean value).
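In contrast, a partial convolution masks out the holes, re-normalizes by the number of valid pixels under each window, and updates the mask for the next layer. A minimal single-channel-mask sketch is below; the official implementation differs in details such as multi-channel masks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialConv2d(nn.Module):
    """Convolution over valid pixels only, with re-normalization and mask update."""
    def __init__(self, in_ch, out_ch, kernel_size, stride=1, padding=0):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding, bias=False)
        self.bias = nn.Parameter(torch.zeros(out_ch))
        self.register_buffer("ones", torch.ones(1, 1, kernel_size, kernel_size))
        self.win = kernel_size * kernel_size
        self.stride, self.padding = stride, padding

    def forward(self, x, mask):
        # x: (B, C, H, W) features; mask: (B, 1, H, W) with 1 = valid pixel, 0 = hole
        with torch.no_grad():
            valid = F.conv2d(mask, self.ones, stride=self.stride, padding=self.padding)
        new_mask = (valid > 0).float()                         # any window touching a valid pixel
        out = self.conv(x * mask) * (self.win / valid.clamp(min=1.0))
        out = (out + self.bias.view(1, -1, 1, 1)) * new_mask   # zero responses inside pure holes
        return out, new_mask
```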
no code implementations • 10 Dec 2017 • Yonatan Bisk, Kevin J. Shih, Yejin Choi, Daniel Marcu
In this paper, we study the problem of mapping natural language instructions to complex spatial actions in a 3D blocks world.
no code implementations • CVPR 2016 • Kevin J. Shih, Saurabh Singh, Derek Hoiem
We present a method that learns to answer visual questions by selecting image regions relevant to the text-based query.
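Region selection of this kind is typically realized as attention over region features scored against the question embedding. A generic sketch, assuming precomputed region features; the projection matrices and dot-product scoring are illustrative, not the paper's exact formulation.

```python
import torch

def attend_regions(region_feats, question_emb, W_r, W_q):
    """region_feats: (B, R, D_r); question_emb: (B, D_q); W_r: (D_r, D); W_q: (D_q, D)."""
    scores = torch.einsum("brd,bd->br", region_feats @ W_r, question_emb @ W_q)
    weights = scores.softmax(dim=-1)                            # relevance of each region
    return torch.einsum("br,brd->bd", weights, region_feats)    # attention-pooled image feature
```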
no code implementations • 22 Jul 2015 • Kevin J. Shih, Arun Mallya, Saurabh Singh, Derek Hoiem
We present a simple deep learning framework to simultaneously predict keypoint locations and their respective visibilities and use those to achieve state-of-the-art performance for fine-grained classification.
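A minimal sketch of a joint keypoint-location and visibility head on top of shared CNN features; the linear layer sizes and sigmoid visibility parameterization are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class KeypointVisibilityHead(nn.Module):
    """Predict (x, y) locations and a visibility probability for each keypoint."""
    def __init__(self, feat_dim, num_keypoints):
        super().__init__()
        self.loc = nn.Linear(feat_dim, num_keypoints * 2)
        self.vis = nn.Linear(feat_dim, num_keypoints)

    def forward(self, feats):                       # feats: (B, feat_dim) pooled CNN features
        coords = self.loc(feats).view(feats.size(0), -1, 2)    # normalized keypoint coordinates
        visibility = torch.sigmoid(self.vis(feats))            # per-keypoint visibility
        return coords, visibility
```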
no code implementations • CVPR 2013 • Ian Endres, Kevin J. Shih, Johnston Jiaa, Derek Hoiem
We propose a method to learn a diverse collection of discriminative parts from object bounding box annotations.