no code implementations • 7 Oct 2024 • Jae Shin Yoon, Zhixin Shu, Mengwei Ren, Xuaner Zhang, Yannick Hold-Geoffroy, Krishna Kumar Singh, He Zhang
For robust and natural shadow removal, we propose to train the diffusion model with a compositional repurposing framework: a pre-trained text-guided image generation model is first fine-tuned to harmonize the lighting and color of the foreground with a background scene using a background harmonization dataset; the model is then further fine-tuned to generate a shadow-free portrait image using a shadow-paired dataset.
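A minimal sketch of this two-stage fine-tuning recipe, assuming a toy denoiser, a simplified noising step, and random-tensor samplers in place of the paper's actual backbone and datasets:

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Stand-in for a pre-trained text-guided diffusion backbone,
    conditioned on an input image by channel concatenation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(6, 3, 3, padding=1)

    def forward(self, noisy_target, condition):
        return self.net(torch.cat([noisy_target, condition], dim=1))

def finetune_stage(model, sample_pair, steps=100, lr=1e-4):
    """One fine-tuning stage: denoising loss on (condition, target) image pairs."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(steps):
        cond, target = sample_pair()   # e.g., composite -> harmonized
        noise = torch.randn_like(target)
        noisy = target + noise         # simplified forward (noising) process
        loss = nn.functional.mse_loss(model(noisy, cond), noise)
        opt.zero_grad(); loss.backward(); opt.step()
    return model

# Placeholder samplers standing in for the two datasets.
harmonization_pair = lambda: (torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64))
shadow_pair        = lambda: (torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64))

model = TinyDenoiser()                              # "pre-trained" backbone
model = finetune_stage(model, harmonization_pair)   # stage 1: lighting/color harmonization
model = finetune_stage(model, shadow_pair)          # stage 2: shadow-free portraits
```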
no code implementations • 1 Oct 2024 • Yuheng Li, Haotian Liu, Mu Cai, Yijun Li, Eli Shechtman, Zhe Lin, Yong Jae Lee, Krishna Kumar Singh
In this paper, we introduce a model designed to improve the prediction of image-text alignment, targeting the challenge of compositional understanding in current visual-language models.
1 code implementation • 22 Sep 2024 • Yuming Jiang, Nanxuan Zhao, Qing Liu, Krishna Kumar Singh, Shuai Yang, Chen Change Loy, Ziwei Liu
The training data engine covers the diverse needs of group portrait editing.
no code implementations • 19 Jan 2024 • Boxiao Pan, Zhan Xu, Chun-Hao Paul Huang, Krishna Kumar Singh, Yang Zhou, Leonidas J. Guibas, Jimei Yang
Generating a video background tailored to foreground subject motion is an important problem for the movie industry and the visual effects community.
1 code implementation • CVPR 2024 • Nannan Li, Qing Liu, Krishna Kumar Singh, Yilin Wang, Jianming Zhang, Bryan A. Plummer, Zhe Lin
In this paper, we propose UniHuman, a unified model that addresses multiple facets of human image editing in real-world settings.
no code implementations • 10 Dec 2023 • Zhipeng Bao, Yijun Li, Krishna Kumar Singh, Yu-Xiong Wang, Martial Hebert
Despite recent significant strides achieved by diffusion-based Text-to-Image (T2I) models, current systems still struggle to ensure compositional generation aligned with text prompts, particularly for multi-object generation.
no code implementations • 4 Jul 2023 • Zhen Zhu, Yijun Li, Weijie Lyu, Krishna Kumar Singh, Zhixin Shu, Soeren Pirk, Derek Hoiem
We investigate how to generate multimodal image outputs, such as RGB, depth, and surface normals, with a single generative model.
1 code implementation • CVPR 2023 • Sumith Kulal, Tim Brooks, Alex Aiken, Jiajun Wu, Jimei Yang, Jingwan Lu, Alexei A. Efros, Krishna Kumar Singh
Given a scene image with a marked region and an image of a person, we insert the person into the scene while respecting the scene affordances.
no code implementations • 28 Feb 2023 • Wonwoong Cho, Hareesh Ravi, Midhun Harikumar, Vinh Khuc, Krishna Kumar Singh, Jingwan Lu, David I. Inouye, Ajinkya Kale
Second, we propose timestep-dependent weight scheduling for content and style features to further improve the performance.
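A minimal sketch of what timestep-dependent weight scheduling can look like; the linear schedule below is a hypothetical choice for illustration, not the schedule proposed in the paper:

```python
import torch

def blend_weights(t, T=1000):
    """Hypothetical schedule: content features dominate at large (noisy)
    timesteps, style features dominate near the end of sampling."""
    w_content = t / T
    return w_content, 1.0 - w_content

def mix_features(f_content, f_style, t, T=1000):
    wc, ws = blend_weights(t, T)
    return wc * f_content + ws * f_style

# Example: at t=800 content is weighted 0.8, style 0.2.
mixed = mix_features(torch.randn(4, 77, 768), torch.randn(4, 77, 768), t=800)
```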
no code implementations • 24 Feb 2023 • Cusuh Ham, James Hays, Jingwan Lu, Krishna Kumar Singh, Zhifei Zhang, Tobias Hinz
We show that MCM gives users control over the spatial layout of the image and increases control over the image generation process overall.
2 code implementations • 6 Feb 2023 • Gaurav Parmar, Krishna Kumar Singh, Richard Zhang, Yijun Li, Jingwan Lu, Jun-Yan Zhu
However, directly applying these models to edit real images remains challenging for two reasons.
Ranked #13 on Text-based Image Editing on PIE-Bench
no code implementations • CVPR 2023 • Junying Wang, Jae Shin Yoon, Tuanfeng Y. Wang, Krishna Kumar Singh, Ulrich Neumann
This paper presents a method to reconstruct complete human geometry and texture from an image of a person with only a partial body observed, e.g., a torso.
no code implementations • ICCV 2023 • Rishabh Jain, Mayur Hemani, Duygu Ceylan, Krishna Kumar Singh, Jingwan Lu, Mausoom Sarkar, Balaji Krishnamurthy
Numerous pose-guided human editing methods have been explored by the vision community due to their extensive practical applications.
no code implementations • CVPR 2023 • Rishabh Jain, Krishna Kumar Singh, Mayur Hemani, Jingwan Lu, Mausoom Sarkar, Duygu Ceylan, Balaji Krishnamurthy
The task of human reposing involves generating a realistic image of a person standing in any conceivable pose.
no code implementations • 4 Nov 2022 • Yuheng Li, Yijun Li, Jingwan Lu, Eli Shechtman, Yong Jae Lee, Krishna Kumar Singh
We introduce a new method for diverse foreground generation with explicit control over various factors.
1 code implementation • CVPR 2022 • Gaurav Parmar, Yijun Li, Jingwan Lu, Richard Zhang, Jun-Yan Zhu, Krishna Kumar Singh
We propose a new method to invert and edit such complex images in the latent space of GANs, such as StyleGAN2.
1 code implementation • CVPR 2022 • Yang Xue, Yuheng Li, Krishna Kumar Singh, Yong Jae Lee
3D-aware generative models have shown that the introduction of 3D information can lead to more controllable image generation.
2 code implementations • CVPR 2022 • Anna Frühstück, Krishna Kumar Singh, Eli Shechtman, Niloy J. Mitra, Peter Wonka, Jingwan Lu
Instead of modeling this complex domain with a single GAN, we propose a novel method to combine multiple pretrained GANs, where one GAN generates a global canvas (e.g., human body) and a set of specialized GANs, or insets, focus on different parts (e.g., faces, shoes) that can be seamlessly inserted onto the global canvas.
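A minimal sketch of the compositing step, assuming a simple alpha blend; the function and its interface are illustrative, not InsetGAN's actual API:

```python
import torch

def paste_inset(canvas, inset, box, mask=None):
    """Paste a specialist-GAN output (e.g., a face crop) onto the global
    canvas at box = (y, x, h, w), optionally alpha-blended with a mask
    for a softer boundary."""
    y, x, h, w = box
    patch = torch.nn.functional.interpolate(
        inset, size=(h, w), mode="bilinear", align_corners=False)
    if mask is None:
        mask = torch.ones_like(patch[:, :1])   # hard paste by default
    region = canvas[:, :, y:y+h, x:x+w]
    canvas[:, :, y:y+h, x:x+w] = mask * patch + (1 - mask) * region
    return canvas

body = torch.rand(1, 3, 256, 128)   # global canvas from a full-body GAN
face = torch.rand(1, 3, 64, 64)     # output of a face-specialist GAN
composite = paste_inset(body, face, box=(16, 40, 48, 48))
```

The full method goes further than this blend, jointly refining the latent codes of the canvas and inset generators so the pasted regions stay coherent at the seams.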
no code implementations • 10 Nov 2021 • Tuanfeng Y. Wang, Duygu Ceylan, Krishna Kumar Singh, Niloy J. Mitra
Synthesizing dynamic appearances of humans in motion plays a central role in applications such as AR/VR and video editing.
no code implementations • ICCV 2021 • Yuheng Li, Yijun Li, Jingwan Lu, Eli Shechtman, Yong Jae Lee, Krishna Kumar Singh
We propose a new approach for high-resolution semantic image synthesis.
no code implementations • CVPR 2021 • Pei Wang, Yijun Li, Krishna Kumar Singh, Jingwan Lu, Nuno Vasconcelos
We introduce an inversion-based method, denoted as IMAge-Guided model INvErsion (IMAGINE), to generate high-quality and diverse images from only a single training sample.
no code implementations • 5 Apr 2021 • Utkarsh Ojha, Krishna Kumar Singh, Yong Jae Lee
We consider the novel task of learning disentangled representations of object shape and appearance across multiple domains (e.g., dogs and cars).
no code implementations • ICLR 2021 • Utkarsh Ojha, Krishna Kumar Singh, Yong Jae Lee
We consider the novel task of learning disentangled representations of object shape and appearance across multiple domains (e.g., dogs and cars).
1 code implementation • CVPR 2020 • Krishna Kumar Singh, Dhruv Mahajan, Kristen Grauman, Yong Jae Lee, Matt Feiszli, Deepti Ghadiyaram
Our key idea is to decorrelate feature representations of a category from its co-occurring context.
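One simple way to express a decorrelation objective of this kind is a penalty on the similarity between category and context features; the paper's actual mechanism differs, so the sketch below only illustrates the idea:

```python
import torch
import torch.nn.functional as F

def decorrelation_penalty(f_category, f_context):
    """Push a category representation away from its co-occurring context
    by penalizing the squared cosine similarity between the two feature
    vectors. Illustrative objective, not the paper's exact formulation."""
    cos = F.cosine_similarity(f_category, f_context, dim=-1)
    return (cos ** 2).mean()

loss = decorrelation_penalty(torch.randn(8, 512), torch.randn(8, 512))
```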
3 code implementations • CVPR 2020 • Yuheng Li, Krishna Kumar Singh, Utkarsh Ojha, Yong Jae Lee
We present MixNMatch, a conditional generative model that learns to disentangle and encode background, object pose, shape, and texture from real images with minimal supervision, for mix-and-match image generation.
1 code implementation • NeurIPS 2020 • Utkarsh Ojha, Krishna Kumar Singh, Cho-Jui Hsieh, Yong Jae Lee
We propose a novel unsupervised generative model that learns to disentangle object identity from other low-level aspects in class-imbalanced data.
no code implementations • CVPR 2019 • Krishna Kumar Singh, Yong Jae Lee
We use the W-RPN to generate high-precision object proposals, which are in turn used to re-rank high-recall proposals such as Edge Boxes or Selective Search according to their spatial overlap.
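A minimal sketch of overlap-based re-ranking, assuming boxes in (x1, y1, x2, y2) format; the exact scoring in the paper may differ:

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def rerank(recall_proposals, precise_proposals):
    """Score each high-recall proposal (e.g., from Selective Search) by its
    best overlap with any high-precision W-RPN proposal, then sort."""
    scored = [(max(iou(p, q) for q in precise_proposals), p)
              for p in recall_proposals]
    return [p for _, p in sorted(scored, reverse=True)]

boxes = rerank([(0, 0, 50, 50), (10, 10, 60, 60)], [(12, 12, 58, 58)])
```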
1 code implementation • CVPR 2019 • Krishna Kumar Singh, Utkarsh Ojha, Yong Jae Lee
We propose FineGAN, a novel unsupervised GAN framework, which disentangles the background, object shape, and object appearance to hierarchically generate images of fine-grained object categories.
Ranked #1 on Image Clustering on Stanford Dogs
2 code implementations • 6 Nov 2018 • Krishna Kumar Singh, Hao Yu, Aron Sarmasi, Gautam Pradeep, Yong Jae Lee
Our approach only needs to modify the input image and can work with any network to improve its performance.
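The input modification is to hide random patches of the training image, forcing the network to attend to less discriminative regions. A minimal sketch; the grid size and fill value below are illustrative defaults (the paper fills hidden patches with the dataset mean pixel):

```python
import torch

def hide_patches(img, grid=4, p_hide=0.5, fill=0.0):
    """Hide-and-Seek-style augmentation: split each image into a grid of
    cells and zero out each cell independently with probability p_hide."""
    _, _, H, W = img.shape
    gh, gw = H // grid, W // grid
    out = img.clone()
    for i in range(grid):
        for j in range(grid):
            if torch.rand(1).item() < p_hide:
                out[:, :, i*gh:(i+1)*gh, j*gw:(j+1)*gw] = fill
    return out

augmented = hide_patches(torch.rand(8, 3, 224, 224))
```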
no code implementations • ECCV 2018 • Krishna Kumar Singh, Santosh Divvala, Ali Farhadi, Yong Jae Lee
We present a scalable approach for Detecting Objects by transferring Common-sense Knowledge (DOCK) from source to target categories.
no code implementations • 25 May 2017 • Wenjian Hu, Krishna Kumar Singh, Fanyi Xiao, Jinyoung Han, Chen-Nee Chuah, Yong Jae Lee
Content popularity prediction has been extensively studied due to its importance and interest for both users and hosts of social media sites like Facebook, Instagram, Twitter, and Pinterest.
no code implementations • CVPR 2017 • Chenyou Fan, Jang-Won Lee, Mingze Xu, Krishna Kumar Singh, Yong Jae Lee, David J. Crandall, Michael S. Ryoo
We consider scenarios in which we wish to perform joint scene understanding, object tracking, activity recognition, and other tasks in environments in which multiple people are wearing body-worn cameras while a third-person static camera also captures the scene.
3 code implementations • ICCV 2017 • Krishna Kumar Singh, Yong Jae Lee
We propose 'Hide-and-Seek', a weakly-supervised framework that aims to improve object localization in images and action localization in videos.
Ranked #24 on Weakly Supervised Action Localization on THUMOS 2014
no code implementations • 9 Aug 2016 • Krishna Kumar Singh, Yong Jae Lee
We propose an end-to-end deep convolutional network to simultaneously localize and rank relative visual attributes, given only weakly-supervised pairwise image comparisons.
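The pairwise supervision can be expressed with a standard margin ranking loss; a minimal sketch below, noting that the paper's full model also localizes the attribute, which is omitted here:

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(score_a, score_b, label, margin=1.0):
    """Weak supervision from pairwise comparisons: label is +1 when image A
    shows more of the attribute than image B, and -1 otherwise."""
    return F.margin_ranking_loss(score_a, score_b, label, margin=margin)

loss = pairwise_ranking_loss(torch.randn(16, requires_grad=True),
                             torch.randn(16),
                             torch.sign(torch.randn(16)))
```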
no code implementations • CVPR 2016 • Krishna Kumar Singh, Fanyi Xiao, Yong Jae Lee
The status quo approach to training object detectors requires expensive bounding box annotations.