no code implementations • 25 Nov 2024 • Aravindan Sundaram, Ujjayan Pal, Abhimanyu Chauhan, Aishwarya Agarwal, Srikrishna Karanam
Despite recent advances in text-to-image diffusion models, generating semantically accurate images remains a persistent challenge.
no code implementations • 25 Nov 2024 • Aishwarya Agarwal, Srikrishna Karanam, Vineet Gandhi
Our next innovation is TIDE, a novel training scheme with a concept saliency alignment loss that ensures the model focuses on the right per-concept regions and a local concept contrastive loss that promotes learning domain-invariant concept representations.
no code implementations • 16 Nov 2024 • Tripti Shukla, Srikrishna Karanam, Balaji Vasan Srinivasan
To address this gap in the current literature, we propose TINTIN: Test-time Conditional Text-to-Image Synthesis using Diffusion Models, a new training-free, test-time-only algorithm that conditions text-to-image diffusion model outputs on factors such as color palettes and edge maps.
no code implementations • 4 Sep 2024 • Aishwarya Agarwal, Srikrishna Karanam, Balaji Vasan Srinivasan
We consider the problem of independently, in a disentangled fashion, controlling the outputs of text-to-image diffusion models with color and style attributes of a user-supplied reference image.
no code implementations • 2 Jul 2024 • Sayan Nag, Koustava Goswami, Srikrishna Karanam
To enable principled training of models in such low-annotation settings, improve image-text region-level alignment, and further enhance spatial localization of the target object in the image, we propose a Cross-modal Fusion with Attention Consistency module.
Referring Expression
Weakly Supervised Referring Expression Segmentation
no code implementations • 27 Jun 2024 • Aishwarya Agarwal, Srikrishna Karanam, Balaji Vasan Srinivasan
We propose a new post-processing algorithm, AlignIT, that infuses the keys and values for the concept of interest while ensuring the keys and values for all other tokens in the input prompt are unchanged.
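As a rough illustration of this kind of selective infusion (a minimal sketch under assumptions, not the actual AlignIT implementation; the function name, tensor layout, and per-row token indexing are all assumed here), replacing only the concept token's keys and values in a cross-attention layer might look like:

```python
import numpy as np

def infuse_concept_kv(keys, values, ref_keys, ref_values, concept_idx):
    """Selectively replace the key/value rows for one concept token.

    keys, values: arrays of shape (num_tokens, dim), one row per prompt token.
    ref_keys, ref_values: reference-derived rows for the concept of interest.
    concept_idx: index of the concept token; all other rows stay unchanged.
    """
    new_keys = keys.copy()      # copy so the original prompt K/V are untouched
    new_values = values.copy()
    new_keys[concept_idx] = ref_keys
    new_values[concept_idx] = ref_values
    return new_keys, new_values
```

The point of the sketch is only the selectivity: one token's rows are swapped while every other token's keys and values pass through unmodified.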
no code implementations • 14 Jun 2024 • Harsh Rangwani, Aishwarya Agarwal, Kuldeep Kulkarni, R. Venkatesh Babu, Srikrishna Karanam
To mitigate these issues, we introduce PartCraft, which enables image generation based on fine-grained part-level details specified for objects in the base text prompt.
no code implementations • 2 May 2024 • Anurag Kumar, Chinmay Bharti, Saikat Dutta, Srikrishna Karanam, Biplab Banerjee
Our proposed framework not only empowers the model to embrace novel classes with limited data, but also ensures the preservation of performance on base classes.
Class-Incremental Learning
Few-Shot Class-Incremental Learning
no code implementations • 7 Dec 2023 • Shubham Agarwal, Subrata Mitra, Sarthak Chakraborty, Srikrishna Karanam, Koyel Mukherjee, Shiv Saini
Text-to-image generation using diffusion models has seen explosive popularity owing to their ability to produce high-quality images that adhere to text prompts.
no code implementations • 20 Nov 2023 • Aishwarya Agarwal, Srikrishna Karanam, Tripti Shukla, Balaji Vasan Srinivasan
Another line of techniques expands the inversion space to learn multiple embeddings, but only along the layer dimension (e.g., one per layer of the DDPM model) or the timestep dimension (e.g., one for a set of timesteps in the denoising process), leading to suboptimal attribute disentanglement.
no code implementations • 1 Sep 2023 • K J Joseph, Prateksha Udhayanan, Tripti Shukla, Aishwarya Agarwal, Srikrishna Karanam, Koustava Goswami, Balaji Vasan Srinivasan
We hope our work will attract attention to this newly identified, pragmatic problem setting.
no code implementations • 31 Aug 2023 • Prateksha Udhayanan, Srikrishna Karanam, Balaji Vasan Srinivasan
To this end, our key novelty is a new gradient-attention-based learning objective that explicitly forces the model to focus on the local regions of interest being modified in each retrieval step.
no code implementations • 3 Jul 2023 • Koustava Goswami, Srikrishna Karanam, Prateksha Udhayanan, K J Joseph, Balaji Vasan Srinivasan
Our key innovations over earlier works include using local image features as part of the prompt learning process, and more crucially, learning to weight these prompts based on local features that are appropriate for the task at hand.
no code implementations • 26 Jun 2023 • Aishwarya Agarwal, Srikrishna Karanam, Balaji Vasan Srinivasan
Recent works in self-supervised learning have shown impressive results on single-object images, but they struggle to perform well on complex multi-object images as evidenced by their poor visual grounding.
no code implementations • ICCV 2023 • Aishwarya Agarwal, Srikrishna Karanam, K J Joseph, Apoorv Saxena, Koustava Goswami, Balaji Vasan Srinivasan
First, our attention segregation loss reduces the cross-attention overlap between attention maps of different concepts in the text prompt, thereby reducing the confusion/conflict among various concepts and the eventual capture of all concepts in the generated output.
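To make the idea of penalizing cross-attention overlap concrete, here is a minimal sketch of such a segregation objective (an illustrative pairwise histogram-intersection penalty, assumed for exposition; the paper's exact loss formulation is not reproduced here):

```python
import numpy as np

def attention_segregation_loss(attn_maps):
    """Illustrative overlap penalty between per-concept cross-attention maps.

    attn_maps: array of shape (num_concepts, H, W), one map per concept token.
    Each map is normalized to a spatial distribution; the loss sums the
    pairwise histogram intersections, so lower values mean the concepts
    attend to more disjoint regions.
    """
    maps = attn_maps.reshape(attn_maps.shape[0], -1)
    maps = maps / (maps.sum(axis=1, keepdims=True) + 1e-8)  # normalize each map
    loss = 0.0
    n = maps.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            loss += np.minimum(maps[i], maps[j]).sum()  # overlap of the pair
    return loss
```

Minimizing this quantity during denoising pushes the attention maps of different concepts apart, which is the behavior the abstract describes.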
no code implementations • 28 Feb 2023 • Prachi Singh, Srikrishna Karanam, Sumit Shekhar
We consider and propose a new problem of retrieving audio files relevant to multimodal design document inputs comprising both textual elements and visual imagery, e.g., birthday/greeting cards.
no code implementations • 10 Sep 2022 • Xuan Gong, Abhishek Sharma, Srikrishna Karanam, Ziyan Wu, Terrence Chen, David Doermann, Arun Innanje
Federated Learning (FL) is a machine learning paradigm where local nodes collaboratively train a central model while the training data remains decentralized.
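For context, the canonical FL aggregation step (a FedAvg-style weighted average, sketched here in plain Python; this is background, not this paper's specific fusion method) looks like:

```python
def fedavg(local_weights, num_samples):
    """Weighted average of local model parameters (FedAvg-style aggregation).

    local_weights: list of dicts mapping parameter name -> list of floats,
                   one dict per local node.
    num_samples: training-set size at each node, used as the averaging weight.
    """
    total = sum(num_samples)
    global_weights = {}
    for name in local_weights[0]:
        acc = [0.0] * len(local_weights[0][name])
        for weights, n in zip(local_weights, num_samples):
            for i, v in enumerate(weights[name]):
                acc[i] += v * (n / total)  # weight by each node's data share
        global_weights[name] = acc
    return global_weights
```

The imbalanced-distribution issues mentioned below arise precisely because this simple weighted average treats heterogeneous local models as if they were drawn from the same data.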
no code implementations • 10 Sep 2022 • Xuan Gong, Meng Zheng, Benjamin Planche, Srikrishna Karanam, Terrence Chen, David Doermann, Ziyan Wu
However, synthetic dense correspondence maps (i.e., IUV) have been little explored, since the domain gap between synthetic training data and real testing data is hard to address for 2D dense representations.
no code implementations • 12 Jul 2022 • Qin Liu, Meng Zheng, Benjamin Planche, Srikrishna Karanam, Terrence Chen, Marc Niethammer, Ziyan Wu
The goal of click-based interactive image segmentation is to obtain precise object segmentation masks with limited user interaction, i.e., by a minimal number of user clicks.
no code implementations • CVPR 2022 • Hengtao Guo, Benjamin Planche, Meng Zheng, Srikrishna Karanam, Terrence Chen, Ziyan Wu
In order to obtain accurate target location information, clinicians have to either conduct frequent intraoperative scans, resulting in higher exposure of patients to radiation, or adopt proxy procedures (e.g., creating and using custom molds to keep patients in the exact same pose during both preoperative organ scanning and subsequent treatment).
1 code implementation • 23 Dec 2021 • Xi Ouyang, Srikrishna Karanam, Ziyan Wu, Terrence Chen, Jiayu Huo, Xiang Sean Zhou, Qian Wang, Jie-Zhi Cheng
However, doing this accurately will require a large amount of disease localization annotations by clinical experts, a task that is prohibitively expensive to accomplish for most applications.
no code implementations • 27 Jul 2021 • Runze Li, Srikrishna Karanam, Ren Li, Terrence Chen, Bir Bhanu, Ziyan Wu
We conduct a variety of experiments on standard video mesh recovery benchmark datasets such as Human3.6M, MPI-INF-3DHP, and 3DPW, demonstrating the efficacy of our design for modeling local dynamics and establishing state-of-the-art results on standard evaluation metrics.
Ranked #54 on 3D Human Pose Estimation on 3DPW
no code implementations • ICCV 2021 • Abhishek Aich, Meng Zheng, Srikrishna Karanam, Terrence Chen, Amit K. Roy-Chowdhury, Ziyan Wu
To alleviate these problems, we propose Spatio-Temporal Representation Factorization (STRF), a flexible new computational unit that can be used in conjunction with most existing 3D convolutional neural network architectures for re-ID.
Ranked #2 on Person Re-Identification on DukeMTMC-VideoReID
no code implementations • 13 Jul 2021 • Ren Li, Meng Zheng, Srikrishna Karanam, Terrence Chen, Ziyan Wu
Next, we present a simple baseline to address this problem that is scalable and can be easily used in conjunction with existing algorithms to improve their performance.
Ranked #1 on 3D Human Shape Estimation on SSP-3D (PVE-T metric)
no code implementations • CVPR 2021 • Yunhao Ge, Yao Xiao, Zhi Xu, Meng Zheng, Srikrishna Karanam, Terrence Chen, Laurent Itti, Ziyan Wu
Despite substantial progress in applying neural networks (NN) to a wide variety of areas, they still largely suffer from a lack of transparency and interpretability.
no code implementations • ICCV 2021 • Xuan Gong, Abhishek Sharma, Srikrishna Karanam, Ziyan Wu, Terrence Chen, David Doermann, Arun Innanje
Such decentralized training naturally leads to issues of imbalanced or differing data distributions among the local models and challenges in fusing them into a central model.
no code implementations • 13 Aug 2020 • Meng Zheng, Srikrishna Karanam, Terrence Chen, Richard J. Radke, Ziyan Wu
We show that the resulting similarity models both perform better and can be visually explained better than the corresponding baseline models trained without these constraints.
no code implementations • ECCV 2020 • Georgios Georgakis, Ren Li, Srikrishna Karanam, Terrence Chen, Jana Kosecka, Ziyan Wu
In this work, we address this gap by proposing a new technique for regression of a human parametric model that is explicitly informed by the known hierarchical structure, including the joint interdependencies of the model.
no code implementations • 18 Nov 2019 • Meng Zheng, Srikrishna Karanam, Terrence Chen, Richard J. Radke, Ziyan Wu
While there has been substantial progress in learning suitable distance metrics, these techniques in general lack transparency and decision reasoning, i.e., explaining why the input set of images is similar or dissimilar.
no code implementations • 18 Nov 2019 • Ren Li, Changjiang Cai, Georgios Georgakis, Srikrishna Karanam, Terrence Chen, Ziyan Wu
We consider the problem of human pose estimation.
2 code implementations • CVPR 2020 • Wenqian Liu, Runze Li, Meng Zheng, Srikrishna Karanam, Ziyan Wu, Bir Bhanu, Richard J. Radke, Octavia Camps
We present methods to generate visual attention from the learned latent space, and also demonstrate that such attention explanations do more than just explain VAE predictions.
no code implementations • NeurIPS 2019 • Benjamin Planche, Xuejian Rong, Ziyan Wu, Srikrishna Karanam, Harald Kosch, YingLi Tian, Jan Ernst, Andreas Hutter
We present a method to incrementally generate complete 2D or 3D scenes with the following properties: (a) it is globally consistent at each step according to a learned scene prior, (b) real observations of a scene can be incorporated while preserving global consistency, (c) unobserved regions can be hallucinated locally in a manner consistent with previous observations, hallucinations, and global priors, and (d) hallucinations are statistical in nature, i.e., different scenes can be generated from the same observations.
1 code implementation • ICCV 2019 • Lezi Wang, Ziyan Wu, Srikrishna Karanam, Kuan-Chuan Peng, Rajat Vikram Singh, Bo Liu, Dimitris N. Metaxas
Recent developments in gradient-based attention modeling have seen attention maps emerge as a powerful tool for interpreting convolutional neural networks.
no code implementations • CVPR 2019 • Meng Zheng, Srikrishna Karanam, Ziyan Wu, Richard J. Radke
We propose a new deep architecture for person re-identification (re-id).
1 code implementation • ICCV 2019 • Georgios Georgakis, Srikrishna Karanam, Ziyan Wu, Jana Kosecka
In this paper, we solve this key problem of existing methods requiring expensive 3D pose annotations by proposing a new method that matches RGB images to CAD models for object pose estimation.
no code implementations • 16 Aug 2018 • Meng Zheng, Srikrishna Karanam, Richard J. Radke
Designing real-world person re-identification (re-id) systems requires attention to operational aspects not typically considered in academic research.
no code implementations • CVPR 2018 • Georgios Georgakis, Srikrishna Karanam, Ziyan Wu, Jan Ernst, Jana Kosecka
Finding correspondences between images or 3D scans is at the heart of many computer vision and image retrieval applications and is often enabled by matching local keypoint descriptors.
no code implementations • CVPR 2018 • Yunye Gong, Srikrishna Karanam, Ziyan Wu, Kuan-Chuan Peng, Jan Ernst, Peter C. Doerschuk
Compositionality of semantic concepts in image synthesis and analysis is appealing as it can help in decomposing known and generatively recomposing unknown data.
no code implementations • 2 Jun 2017 • Srikrishna Karanam, Eric Lam, Richard J. Radke
Designing useful person re-identification systems for real-world applications requires attention to operational aspects not typically considered in academic research.
2 code implementations • 31 May 2016 • Srikrishna Karanam, Mengran Gou, Ziyan Wu, Angels Rates-Borras, Octavia Camps, Richard J. Radke
To ensure a fair comparison, all of the approaches were implemented using a unified code library that includes 11 feature extraction algorithms and 22 metric learning and ranking techniques.
no code implementations • ICCV 2015 • Srikrishna Karanam, Yang Li, Richard J. Radke
This paper introduces a new approach to address the person re-identification problem in cameras with non-overlapping fields of view.