no code implementations • 27 Nov 2024 • Maitreya Patel, Song Wen, Dimitris N. Metaxas, Yezhou Yang
In this work, we first develop a theoretical and empirical understanding of the vector field dynamics of RFMs in efficiently guiding the denoising trajectory.
1 code implementation • 7 Nov 2024 • Sheng Cheng, Maitreya Patel, Yezhou Yang
Despite advancements in text-to-image models, generating images that precisely align with textual descriptions remains challenging due to misalignment in training data.
no code implementations • 4 Nov 2024 • Maitreya Patel, Abhiram Kusumba, Sheng Cheng, Changhoon Kim, Tejas Gokhale, Chitta Baral, Yezhou Yang
However, the lack of compositional diversity in contemporary image-text datasets limits the compositional reasoning ability of CLIP.
1 code implementation • 17 Oct 2024 • Shailaja Keyur Sampat, Maitreya Patel, Yezhou Yang, Chitta Baral
An ability to learn about new objects from a small amount of visual data and produce convincing linguistic justification about the presence/absence of certain concepts (that collectively compose the object) in novel scenarios is an important characteristic of human cognition.
1 code implementation • 7 Feb 2024 • Maitreya Patel, Sangmin Jung, Chitta Baral, Yezhou Yang
While LDMs offer distinct advantages, P-T2I methods' reliance on the latent space of these diffusion models significantly escalates resource demands, leading to inconsistent results and necessitating numerous iterations for a single desired image.
no code implementations • CVPR 2024 • Maitreya Patel, Changhoon Kim, Sheng Cheng, Chitta Baral, Yezhou Yang
The T2I prior model alone adds a billion parameters compared to the Latent Diffusion Models, which increases the computational and high-quality data requirements.
1 code implementation • 7 Jun 2023 • Maitreya Patel, Tejas Gokhale, Chitta Baral, Yezhou Yang
To quantify the ability of T2I models in learning and synthesizing novel visual concepts (a.k.a.
1 code implementation • CVPR 2024 • Changhoon Kim, Kyle Min, Maitreya Patel, Sheng Cheng, Yezhou Yang
This paper introduces a novel approach to model fingerprinting that assigns responsibility for the generated images, thereby serving as a potential countermeasure to model misuse.
1 code implementation • 7 Nov 2022 • Maitreya Patel, Tejas Gokhale, Chitta Baral, Yezhou Yang
Videos often capture objects, their visible properties, their motion, and the interactions between different objects.
Ranked #1 on Counterfactual Planning on CRIPP-VQA
no code implementations • 15 Jul 2022 • Shailaja Keyur Sampat, Maitreya Patel, Subhasish Das, Yezhou Yang, Chitta Baral
'Actions' play a vital role in how humans interact with the world and enable them to achieve desired goals.
10 code implementations • 16 Apr 2022 • Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei, Anjana Arunkumar, Arjun Ashok, Arut Selvan Dhanasekaran, Atharva Naik, David Stap, Eshaan Pathak, Giannis Karamanolakis, Haizhi Gary Lai, Ishan Purohit, Ishani Mondal, Jacob Anderson, Kirby Kuznia, Krima Doshi, Maitreya Patel, Kuntal Kumar Pal, Mehrad Moradshahi, Mihir Parmar, Mirali Purohit, Neeraj Varshney, Phani Rohitha Kaza, Pulkit Verma, Ravsehaj Singh Puri, Rushang Karia, Shailaja Keyur Sampat, Savan Doshi, Siddhartha Mishra, Sujan Reddy, Sumanta Patro, Tanay Dixit, Xudong Shen, Chitta Baral, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi, Daniel Khashabi
This large and diverse collection of tasks enables rigorous benchmarking of cross-task generalization under instructions -- training models to follow instructions on a subset of tasks and evaluating them on the remaining unseen ones.
1 code implementation • 18 Aug 2020 • Maitreya Patel, Mirali Purohit, Jui Shah, Hemant A. Patil
The CycleGAN-based method uses two different models, one for Mel Cepstral Coefficients (MCC) mapping, and another for F0 prediction, where F0 is highly dependent on the pre-trained model of MCC mapping.
1 code implementation • 25 Sep 2019 • Maitreya Patel, Mirali Purohit, Mihir Parmar, Nirmesh J. Shah, Hemant A. Patil
In this paper, we propose a novel style transfer architecture, which can also be extended to generate voices even for target speakers whose data were not used in training (i.e., the zero-shot learning case).
no code implementations • 24 Oct 2018 • Maitreya Patel, Anery Patel, Dr. Ranendu Ghosh
Short-term rainfall forecasting, also known as precipitation nowcasting, has become a potentially fundamental technology impacting significant real-world applications ranging from flight safety and rainstorm alerts to farm irrigation timing.