Search Results for author: Kevin J. Shih

Found 21 papers, 14 papers with code

P-Flow: A Fast and Data-Efficient Zero-Shot TTS through Speech Prompting

1 code implementation • NeurIPS 2023 • Sungwon Kim, Kevin J. Shih, Rohan Badlani, Joao Felipe Santos, Evelina Bakhturina, Mikyas T. Desta, Rafael Valle, Sungroh Yoon, Bryan Catanzaro

P-Flow comprises a speech-prompted text encoder for speaker adaptation and a flow matching generative decoder for high-quality and fast speech synthesis.

Speech Synthesis

162

VANI: Very-lightweight Accent-controllable TTS for Native and Non-native speakers with Identity Preservation

no code implementations • 14 Mar 2023 • Rohan Badlani, Akshit Arora, Subhankar Ghosh, Rafael Valle, Kevin J. Shih, João Felipe Santos, Boris Ginsburg, Bryan Catanzaro

We introduce VANI, a very lightweight multi-lingual accent controllable speech synthesis system.

Disentanglement • Speech Synthesis

Multilingual Multiaccented Multispeaker TTS with RADTTS

no code implementations • 24 Jan 2023 • Rohan Badlani, Rafael Valle, Kevin J. Shih, João Felipe Santos, Siddharth Gururani, Bryan Catanzaro

We work to create a multilingual speech synthesis system which can generate speech with the proper accent while retaining the characteristics of an individual voice.

Speech Synthesis

Collecting The Puzzle Pieces: Disentangled Self-Driven Human Pose Transfer by Permuting Textures

1 code implementation • ICCV 2023 • Nannan Li, Kevin J. Shih, Bryan A. Plummer

Then we reconstruct the input image by sampling from the permuted textures for patch-level disentanglement.

Disentanglement • Pose Transfer • +1

Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows

1 code implementation • 3 Mar 2022 • Kevin J. Shih, Rafael Valle, Rohan Badlani, João Felipe Santos, Bryan Catanzaro

Despite recent advances in generative modeling for text-to-speech synthesis, these models do not yet have the same fine-grained adjustability of pitch-conditioned deterministic models such as FastPitch and FastSpeech2.

Speech Synthesis • Text-To-Speech Synthesis

271

One TTS Alignment To Rule Them All

3 code implementations • 23 Aug 2021 • Rohan Badlani, Adrian Łancucki, Kevin J. Shih, Rafael Valle, Wei Ping, Bryan Catanzaro

However, these alignments tend to be brittle and often fail to generalize to long utterances and out-of-domain text, leading to missing or repeating words.

Speech Synthesis

28,936

RAD-TTS: Parallel Flow-Based TTS with Robust Alignment Learning and Diverse Synthesis

1 code implementation • ICML Workshop INNF 2021 • Kevin J. Shih, Rafael Valle, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro

This work introduces a predominantly parallel, end-to-end TTS model based on normalizing flows.

Speech Synthesis • Text-To-Speech Synthesis

271

Unsupervised Disentanglement of Pose, Appearance and Background from Images and Videos

1 code implementation • 26 Jan 2020 • Aysegul Dundar, Kevin J. Shih, Animesh Garg, Robert Pottorf, Andrew Tao, Bryan Catanzaro

However, the reconstruction task of the entire image forces the model to allocate landmarks to model the background.

Disentanglement • Video Prediction

Video Interpolation and Prediction with Unsupervised Landmarks

no code implementations • 6 Sep 2019 • Kevin J. Shih, Aysegul Dundar, Animesh Garg, Robert Pottorf, Andrew Tao, Bryan Catanzaro

Prediction and interpolation for long-range video data involves the complex task of modeling motion trajectories for each visible object, occlusions and dis-occlusions, as well as appearance changes due to viewpoint and lighting.

Motion Interpolation • Optical Flow Estimation • +1

Unsupervised Video Interpolation Using Cycle Consistency

1 code implementation • ICCV 2019 • Fitsum A. Reda, Deqing Sun, Aysegul Dundar, Mohammad Shoeybi, Guilin Liu, Kevin J. Shih, Andrew Tao, Jan Kautz, Bryan Catanzaro

We further introduce a pseudo supervised loss term that enforces the interpolated frames to be consistent with predictions of a pre-trained interpolation model.

Ranked #1 on Video Frame Interpolation on UCF101 (PSNR (sRGB) metric)

Video Frame Interpolation

108

Graphical Contrastive Losses for Scene Graph Parsing

3 code implementations • CVPR 2019 • Ji Zhang, Kevin J. Shih, Ahmed Elgammal, Andrew Tao, Bryan Catanzaro

The first, Entity Instance Confusion, occurs when the model confuses multiple instances of the same type of entity (e.g., multiple cups).

Relationship Detection • Scene Graph Generation • +1

12,974

Improving Semantic Segmentation via Video Propagation and Label Relaxation

5 code implementations • CVPR 2019 • Yi Zhu, Karan Sapra, Fitsum A. Reda, Kevin J. Shih, Shawn Newsam, Andrew Tao, Bryan Catanzaro

In this paper, we present a video prediction-based methodology to scale up training sets by synthesizing new training samples in order to improve the accuracy of semantic segmentation networks.

Ranked #2 on Semantic Segmentation on KITTI Semantic Segmentation (using extra training data)

Segmentation • Semantic Segmentation • +1

1,750

Partial Convolution based Padding

4 code implementations • 28 Nov 2018 • Guilin Liu, Kevin J. Shih, Ting-Chun Wang, Fitsum A. Reda, Karan Sapra, Zhiding Yu, Andrew Tao, Bryan Catanzaro

In this paper, we present a simple yet effective padding scheme that can be used as a drop-in module for existing convolutional neural networks.

General Classification • Semantic Segmentation

1,198

Revisiting Image-Language Networks for Open-ended Phrase Detection

3 code implementations • 17 Nov 2018 • Bryan A. Plummer, Kevin J. Shih, Yichen Li, Ke Xu, Svetlana Lazebnik, Stan Sclaroff, Kate Saenko

Most existing work that grounds natural language phrases in images starts with the assumption that the phrase in question is relevant to the image.

Object Detection • +1

SDCNet: Video Prediction Using Spatially-Displaced Convolution

1 code implementation • 2 Nov 2018 • Fitsum A. Reda, Guilin Liu, Kevin J. Shih, Robert Kirby, Jon Barker, David Tarjan, Andrew Tao, Bryan Catanzaro

We present an approach for high-resolution video frame prediction by conditioning on both past frames and past optical flows.

Optical Flow Estimation • SSIM • +1

1,750

SDC-Net: Video prediction using spatially-displaced convolution

1 code implementation • ECCV 2018 • Fitsum A. Reda, Guilin Liu, Kevin J. Shih, Robert Kirby, Jon Barker, David Tarjan, Andrew Tao, Bryan Catanzaro

We present an approach for high-resolution video frame prediction by conditioning on both past frames and past optical flows.

Ranked #1 on Video Prediction on YouTube-8M

Optical Flow Estimation • SSIM • +1

1,750

Image Inpainting for Irregular Holes Using Partial Convolutions

60 code implementations • ECCV 2018 • Guilin Liu, Fitsum A. Reda, Kevin J. Shih, Ting-Chun Wang, Andrew Tao, Bryan Catanzaro

Existing deep learning based image inpainting methods use a standard convolutional network over the corrupted image, using convolutional filter responses conditioned on both valid pixels as well as the substitute values in the masked holes (typically the mean value).

Image Inpainting • valid

1,198

Learning Interpretable Spatial Operations in a Rich 3D Blocks World

no code implementations • 10 Dec 2017 • Yonatan Bisk, Kevin J. Shih, Yejin Choi, Daniel Marcu

In this paper, we study the problem of mapping natural language instructions to complex spatial actions in a 3D blocks world.

Where To Look: Focus Regions for Visual Question Answering

no code implementations • CVPR 2016 • Kevin J. Shih, Saurabh Singh, Derek Hoiem

We present a method that learns to answer visual questions by selecting image regions relevant to the text-based query.

Question Answering • Visual Question Answering

Part Localization using Multi-Proposal Consensus for Fine-Grained Categorization

no code implementations • 22 Jul 2015 • Kevin J. Shih, Arun Mallya, Saurabh Singh, Derek Hoiem

We present a simple deep learning framework to simultaneously predict keypoint locations and their respective visibilities and use those to achieve state-of-the-art performance for fine-grained classification.

General Classification

Learning Collections of Part Models for Object Recognition

no code implementations • CVPR 2013 • Ian Endres, Kevin J. Shih, Johnston Jiaa, Derek Hoiem

We propose a method to learn a diverse collection of discriminative parts from object bounding box annotations.

Object • Object Recognition
