1 code implementation • 7 Jan 2025 • NVIDIA: Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen, Yin Cui, Yifan Ding, Daniel Dworakowski, Jiaojiao Fan, Michele Fenzi, Francesco Ferroni, Sanja Fidler, Dieter Fox, Songwei Ge, Yunhao Ge, Jinwei Gu, Siddharth Gururani, Ethan He, Jiahui Huang, Jacob Huffman, Pooya Jannaty, Jingyi Jin, Seung Wook Kim, Gergely Klár, Grace Lam, Shiyi Lan, Laura Leal-Taixe, Anqi Li, Zhaoshuo Li, Chen-Hsuan Lin, Tsung-Yi Lin, Huan Ling, Ming-Yu Liu, Xian Liu, Alice Luo, Qianli Ma, Hanzi Mao, Kaichun Mo, Arsalan Mousavian, Seungjun Nah, Sriharsha Niverty, David Page, Despoina Paschalidou, Zeeshan Patel, Lindsey Pavao, Morteza Ramezanali, Fitsum Reda, Xiaowei Ren, Vasanth Rao Naik Sabavat, Ed Schmerling, Stella Shi, Bartosz Stefaniak, Shitao Tang, Lyne Tchapmi, Przemek Tredak, Wei-Cheng Tseng, Jibin Varghese, Hao Wang, Haoxiang Wang, Heng Wang, Ting-Chun Wang, Fangyin Wei, Xinyue Wei, Jay Zhangjie Wu, Jiashu Xu, Wei Yang, Lin Yen-Chen, Xiaohui Zeng, Yu Zeng, Jing Zhang, Qinsheng Zhang, Yuxuan Zhang, Qingqing Zhao, Artur Zolkowski
We position a world foundation model as a general-purpose world model that can be fine-tuned into customized world models for downstream applications.
no code implementations • 11 Nov 2024 • NVIDIA: Yuval Atzmon, Maciej Bala, Yogesh Balaji, Tiffany Cai, Yin Cui, Jiaojiao Fan, Yunhao Ge, Siddharth Gururani, Jacob Huffman, Ronald Isaac, Pooya Jannaty, Tero Karras, Grace Lam, J. P. Lewis, Aaron Licata, Yen-Chen Lin, Ming-Yu Liu, Qianli Ma, Arun Mallya, Ashlee Martino-Tarr, Doug Mendez, Seungjun Nah, Chris Pruett, Fitsum Reda, Jiaming Song, Ting-Chun Wang, Fangyin Wei, Xiaohui Zeng, Yu Zeng, Qinsheng Zhang
We introduce Edify Image, a family of diffusion models capable of generating photorealistic image content with pixel-perfect accuracy.
no code implementations • CVPR 2024 • Yu Zeng, Vishal M. Patel, Haochen Wang, Xun Huang, Ting-Chun Wang, Ming-Yu Liu, Yogesh Balaji
Personalized text-to-image generation models enable users to create images that depict their individual possessions in diverse scenes, finding applications in various domains.
no code implementations • ICCV 2023 • Johanna Karras, Aleksander Holynski, Ting-Chun Wang, Ira Kemelmacher-Shlizerman
We fine-tune on a collection of fashion videos from the UBC Fashion dataset.
no code implementations • ICCV 2023 • Siddharth Gururani, Arun Mallya, Ting-Chun Wang, Rafael Valle, Ming-Yu Liu
It uses a multi-stage approach, combining the controllability of facial landmarks with the high-quality synthesis power of a pretrained face generator.
no code implementations • 4 Oct 2022 • Arun Mallya, Ting-Chun Wang, Ming-Yu Liu
We present a new implicit warping framework for image animation, which transfers the motion of a driving video to a set of source images.
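The "implicit" warping here replaces explicit per-source flow with a cross-attention step: every location in the driving frame attends over features pooled from all source images and retrieves the best match. A minimal sketch of that attention step, with tensor shapes and names as illustrative assumptions rather than the paper's API:

```python
import torch

def implicit_warp(drv_queries, src_keys, src_values):
    # drv_queries: (Hd*Wd, C) queries derived from the driving frame.
    # src_keys, src_values: (S*Hs*Ws, C) keys/values pooled over ALL S
    # source images, so each driving location can pick its best match
    # anywhere in the source set rather than warping each source alone.
    scale = src_keys.shape[1] ** 0.5
    attn = torch.softmax(drv_queries @ src_keys.t() / scale, dim=-1)
    return attn @ src_values  # (Hd*Wd, C) warped features
```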
no code implementations • 21 Sep 2022 • Yu-Ying Yeh, Koki Nagano, Sameh Khamis, Jan Kautz, Ming-Yu Liu, Ting-Chun Wang
An effective approach is to supervise the training of deep neural networks with a high-fidelity dataset of desired input-output pairs, captured with a light stage.
1 code implementation • 7 Jun 2022 • Tim Brooks, Janne Hellsten, Miika Aittala, Ting-Chun Wang, Timo Aila, Jaakko Lehtinen, Ming-Yu Liu, Alexei A. Efros, Tero Karras
Existing video generation methods often fail to produce new content as a function of time while maintaining consistencies expected in real environments, such as plausible dynamics and object persistence.
no code implementations • 27 Mar 2022 • Ting-Chun Wang, Shang-Yu Su, Yun-Nung Chen
Conversational recommendation (CRS) is a complex problem that consists of two main tasks: (1) recommendation and (2) response generation.
no code implementations • 9 Dec 2021 • Xun Huang, Arun Mallya, Ting-Chun Wang, Ming-Yu Liu
Existing conditional image synthesis frameworks generate images based on user inputs in a single modality, such as text, segmentation, sketch, or style reference.
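One standard way to fuse several optional modalities into a single conditioning latent, in the spirit of this line of work, is a product of Gaussian experts; the sketch below shows the fusion rule (its exact use in the paper is our assumption):

```python
import torch

def poe_fuse(mus, logvars):
    # Product of Gaussian experts: the product of Gaussians is again a
    # Gaussian whose precision is the sum of the experts' precisions.
    # A unit-Gaussian prior expert (mean 0, precision 1) keeps the
    # result well-defined when some modalities are absent.
    prec = [torch.exp(-lv) for lv in logvars]
    total = sum(prec) + 1.0
    mu = sum(m * p for m, p in zip(mus, prec)) / total
    return mu, -torch.log(total)  # fused mean and log-variance
```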
2 code implementations • CVPR 2021 • Ting-Chun Wang, Arun Mallya, Ming-Yu Liu
We propose a neural talking-head video synthesis model and demonstrate its application to video conferencing.
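The video-conferencing application rests on a bandwidth argument: send one source frame once, then only compact keypoints per frame, and synthesize frames on the receiver. A hedged sketch of that pipeline, with all functions as placeholders:

```python
def sender(frames, extract_keypoints):
    # Transmit the first frame in full, then only keypoints: a handful
    # of floats per frame instead of a full image.
    frames = iter(frames)
    yield ("source", next(frames))
    for frame in frames:
        yield ("keypoints", extract_keypoints(frame))

def receiver(packets, generator):
    source = None
    for kind, payload in packets:
        if kind == "source":
            source = payload
        else:
            # Animate the stored source frame with the received
            # keypoints to reconstruct the current frame.
            yield generator(source, payload)
```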
no code implementations • 6 Aug 2020 • Ming-Yu Liu, Xun Huang, Jiahui Yu, Ting-Chun Wang, Arun Mallya
The generative adversarial network (GAN) framework has emerged as a powerful tool for various image and video synthesis tasks, allowing the synthesis of visual content in an unconditional or input-conditional manner.
no code implementations • ECCV 2020 • Arun Mallya, Ting-Chun Wang, Karan Sapra, Ming-Yu Liu
This is because they lack knowledge of the 3D world being rendered and generate each frame only based on the past few frames.
no code implementations • 14 Jul 2020 • Guilin Liu, Rohan Taori, Ting-Chun Wang, Zhiding Yu, Shiqiu Liu, Fitsum A. Reda, Karan Sapra, Andrew Tao, Bryan Catanzaro
Specifically, we directly treat the whole encoded feature map of the input texture as transposed-convolution filters, and we use the features' self-similarity map, which captures the auto-correlation information, as the input to the transposed convolution.
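A minimal PyTorch sketch of that mechanism: the encoded texture features double as the transposed-convolution weights, and their auto-correlation (self-similarity) map is the input that stamps the texture back into an expanded feature map. Shapes and the softmax normalization are our assumptions:

```python
import torch
import torch.nn.functional as F

def self_similarity_expand(feat):
    # feat: (1, C, H, W) encoded feature map of the input texture.
    _, C, H, W = feat.shape
    # Auto-correlation: correlate the feature map with itself, using
    # the whole map as a single convolution filter.
    padded = F.pad(feat, (W // 2, W // 2, H // 2, H // 2))
    sim = F.conv2d(padded, feat)                    # (1, 1, ~H, ~W)
    sim = torch.softmax(sim.flatten(2), dim=-1).view_as(sim)
    # Transposed convolution with the SAME features as filters: each
    # high-similarity location stamps the texture back, expanding it.
    return F.conv_transpose2d(sim, feat)            # (1, C, ~2H, ~2W)
```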
2 code implementations • NeurIPS 2019 • Hsin-Ying Lee, Xiaodong Yang, Ming-Yu Liu, Ting-Chun Wang, Yu-Ding Lu, Ming-Hsuan Yang, Jan Kautz
In the analysis phase, we decompose a dance into a series of basic dance units, through which the model learns how to move.
Ranked #3 on Motion Synthesis on BRACE
6 code implementations • NeurIPS 2019 • Ting-Chun Wang, Ming-Yu Liu, Andrew Tao, Guilin Liu, Jan Kautz, Bryan Catanzaro
To address the limitations, we propose a few-shot vid2vid framework, which learns to synthesize videos of previously unseen subjects or scenes by leveraging a few example images of the target at test time.
Ranked #1 on Video-to-Video Synthesis on YouTube Dancing
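The few-shot behavior comes from predicting part of the generator's weights from the example images at test time instead of fixing them after training; a toy sketch of one such adaptively weighted layer (layer shape and the embedding are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveConv(nn.Module):
    # A 1x1 convolution whose weights are not learned constants but are
    # predicted from an embedding of the example images, so the layer
    # adapts to an unseen subject without any retraining.
    def __init__(self, in_ch, out_ch, embed_dim=256):
        super().__init__()
        self.predict = nn.Linear(embed_dim, out_ch * in_ch)
        self.in_ch, self.out_ch = in_ch, out_ch

    def forward(self, x, example_embed):
        w = self.predict(example_embed).view(self.out_ch, self.in_ch, 1, 1)
        return F.conv2d(x, w)
```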
26 code implementations • CVPR 2019 • Taesung Park, Ming-Yu Liu, Ting-Chun Wang, Jun-Yan Zhu
Previous methods directly feed the semantic layout as input to the deep network, which is then processed through stacks of convolution, normalization, and nonlinearity layers.
Ranked #3 on Sketch-to-Image Translation on COCO-Stuff
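The paper's remedy is to keep the layout out of the main network stream and instead use it to modulate normalized activations (spatially-adaptive normalization); a minimal sketch, with the hidden width and kernel sizes as assumptions:

```python
import torch.nn as nn
import torch.nn.functional as F

class SPADE(nn.Module):
    # Normalize activations without affine parameters, then modulate
    # them with per-pixel scale and shift maps predicted from the
    # semantic layout, so label information survives normalization.
    def __init__(self, num_features, label_channels, hidden=128):
        super().__init__()
        self.norm = nn.BatchNorm2d(num_features, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(label_channels, hidden, 3, padding=1), nn.ReLU())
        self.gamma = nn.Conv2d(hidden, num_features, 3, padding=1)
        self.beta = nn.Conv2d(hidden, num_features, 3, padding=1)

    def forward(self, x, segmap):
        # Resize the one-hot layout to the activation resolution.
        segmap = F.interpolate(segmap, size=x.shape[2:], mode="nearest")
        h = self.shared(segmap)
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)
```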
4 code implementations • 28 Nov 2018 • Guilin Liu, Kevin J. Shih, Ting-Chun Wang, Fitsum A. Reda, Karan Sapra, Zhiding Yu, Andrew Tao, Bryan Catanzaro
In this paper, we present a simple yet effective padding scheme that can be used as a drop-in module for existing convolutional neural networks.
10 code implementations • NeurIPS 2018 • Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Guilin Liu, Andrew Tao, Jan Kautz, Bryan Catanzaro
We study the problem of video-to-video synthesis, whose goal is to learn a mapping function from an input source video (e.g., a sequence of semantic segmentation masks) to an output photorealistic video that precisely depicts the content of the source video.
Ranked #4 on Video Deraining on Video Waterdrop Removal Dataset
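The mapping is learned and applied sequentially: each output frame is conditioned on a short window of recent source frames and previously generated frames. A hedged sketch of that generation loop, with G and the window length as placeholders:

```python
def generate_video(G, sources, window=2):
    # Markovian generation: frame t depends on the last `window` source
    # frames and the last `window` already-generated output frames.
    outputs = []
    for t in range(len(sources)):
        src_win = sources[max(0, t - window): t + 1]
        out_win = outputs[max(0, t - window): t]
        outputs.append(G(src_win, out_win))
    return outputs
```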
no code implementations • 24 Jul 2018 • Aysegul Dundar, Ming-Yu Liu, Ting-Chun Wang, John Zedlewski, Jan Kautz
Deep neural networks have largely failed to effectively utilize synthetic data when applied to real images due to the covariate shift problem.
60 code implementations • ECCV 2018 • Guilin Liu, Fitsum A. Reda, Kevin J. Shih, Ting-Chun Wang, Andrew Tao, Bryan Catanzaro
Existing deep learning based image inpainting methods use a standard convolutional network over the corrupted image, with convolutional filter responses conditioned on both valid pixels and the substitute values in the masked holes (typically the mean value).
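The paper's fix is the partial convolution: responses are computed over valid pixels only, renormalized by the fraction of each window that is valid, and the mask is updated so holes shrink layer by layer. A condensed sketch using a single-channel mask (the full layer handles more general cases):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialConv2d(nn.Conv2d):
    def forward(self, x, mask):
        # mask: (N, 1, H, W) with 1 = valid pixel, 0 = hole.
        with torch.no_grad():
            ones = torch.ones(1, 1, *self.kernel_size, device=x.device)
            valid = F.conv2d(mask, ones, stride=self.stride,
                             padding=self.padding)
            scale = ones.numel() / valid.clamp(min=1e-8)  # renormalize
            new_mask = (valid > 0).float()                # holes shrink
        bias = 0.0 if self.bias is None else self.bias.view(1, -1, 1, 1)
        out = super().forward(x * mask)                   # zero out holes
        return (out - bias) * scale * new_mask + bias, new_mask
```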
21 code implementations • CVPR 2018 • Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro
We present a new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs).
Ranked #2 on Sketch-to-Image Translation on COCO-Stuff
Tasks: Conditional Image Generation, Fundus to Angiography Generation, +5 more
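pix2pixHD-style training judges images at several resolutions, so coarse scales enforce global structure and fine scales enforce texture. A hedged sketch of the multi-scale discriminator loss (the hinge-loss choice is our assumption):

```python
import torch.nn.functional as F

def multiscale_d_loss(discriminators, fake, real):
    # One discriminator per scale; downsample the images between scales.
    loss, f, r = 0.0, fake, real
    for D in discriminators:
        loss = loss + F.relu(1 - D(r)).mean() + F.relu(1 + D(f)).mean()
        f = F.avg_pool2d(f, 3, stride=2, padding=1)
        r = F.avg_pool2d(r, 3, stride=2, padding=1)
    return loss
```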
1 code implementation • 8 May 2017 • Ting-Chun Wang, Jun-Yan Zhu, Nima Khademi Kalantari, Alexei A. Efros, Ravi Ramamoorthi
Given a 3 fps light field sequence and a standard 30 fps 2D video, our system can then generate a full light field video at 30 fps.
no code implementations • 9 Sep 2016 • Nima Khademi Kalantari, Ting-Chun Wang, Ravi Ramamoorthi
Specifically, we propose a novel learning-based approach to synthesize new views from a sparse set of input views.
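Pipelines in this line of work typically predict a disparity map, warp the sparse input views toward the novel viewpoint, and blend the results. A hedged sketch of the warping step for a horizontal baseline, with a placeholder API:

```python
import torch
import torch.nn.functional as F

def warp_view(view, disparity, baseline):
    # view: (N, 3, H, W); disparity: (N, 1, H, W); baseline: horizontal
    # offset between the input view and the novel view.
    n, _, h, w = view.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    grid = torch.stack([xs, ys], -1).expand(n, h, w, 2).clone()
    # Shift each pixel horizontally by disparity * baseline pixels,
    # converted to the [-1, 1] coordinates grid_sample expects.
    grid[..., 0] += disparity.squeeze(1) * baseline * 2.0 / w
    return F.grid_sample(view, grid, align_corners=True)
```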
no code implementations • 24 Aug 2016 • Ting-Chun Wang, Jun-Yan Zhu, Ebi Hiroaki, Manmohan Chandraker, Alexei A. Efros, Ravi Ramamoorthi
We introduce a new light-field dataset of materials, and take advantage of the recent success of deep learning to perform material recognition on the 4D light-field.
no code implementations • CVPR 2016 • Ting-Chun Wang, Manmohan Chandraker, Alexei A. Efros, Ravi Ramamoorthi
Light-field cameras have recently emerged as a powerful tool for one-shot passive 3D shape capture.
no code implementations • CVPR 2016 • Ting-Chun Wang, Manohar Srikanth, Ravi Ramamoorthi
In this work, we propose a multi-camera system where we combine a main high-quality camera with two low-res auxiliary cameras.
no code implementations • ICCV 2015 • Ting-Chun Wang, Alexei A. Efros, Ravi Ramamoorthi
In this paper, we develop a depth estimation algorithm that treats occlusion explicitly; the method also enables identification of occlusion edges, which may be useful in other applications.