no code implementations • 18 Feb 2024 • Tanzila Rahman, Shweta Mahajan, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Leonid Sigal
We illustrate that such joint alternating refinement leads to the learning of better tokens for concepts and, as a by-product, latent masks.
no code implementations • 19 Dec 2023 • Shweta Mahajan, Tanzila Rahman, Kwang Moo Yi, Leonid Sigal
Further, we leverage the finding that different timesteps of the diffusion process cater to different levels of detail in an image.
no code implementations • 3 Dec 2023 • Jeong-gi Kwak, Erqun Dong, Yuhe Jin, Hanseok Ko, Shweta Mahajan, Kwang Moo Yi
Thus, to perform novel-view synthesis, we create a smooth camera trajectory to the target view that we wish to render, and denoise using both a view-conditioned diffusion model and a video diffusion model.
1 code implementation • 29 Nov 2023 • Eric Hedlin, Gopal Sharma, Shweta Mahajan, Xingzhe He, Hossam Isack, Abhishek Kar, Helge Rhodin, Andrea Tagliasacchi, Kwang Moo Yi
Unsupervised learning of keypoints and landmarks has seen significant progress with the help of modern neural network architectures, but its performance has yet to match that of supervised counterparts, which calls its practicality into question.
Ranked #1 on Unsupervised Human Pose Estimation on Tai-Chi-HD
1 code implementation • NeurIPS 2023 • Eric Hedlin, Gopal Sharma, Shweta Mahajan, Hossam Isack, Abhishek Kar, Andrea Tagliasacchi, Kwang Moo Yi
Text-to-image diffusion models are now capable of generating images that are often indistinguishable from real images.
Ranked #1 on Semantic correspondence on CUB-200-2011
1 code implementation • CVPR 2023 • Tanzila Rahman, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Shweta Mahajan, Leonid Sigal
Our experiments on story generation with the MUGEN, PororoSV, and FlintstonesSV datasets show that our method not only outperforms the prior state of the art in generating high-quality frames that are consistent with the story, but also models appropriate correspondences between the characters and the background.
1 code implementation • 3 May 2022 • Franz Klein, Shweta Mahajan, Stefan Roth
Stylized image captioning, as presented in prior work, aims to generate captions that reflect characteristics beyond a factual description of the scene composition, such as sentiment.
1 code implementation • ICCV 2021 • Shweta Mahajan, Stefan Roth
Autoregressive models are a class of exact inference approaches with highly flexible functional forms, yielding state-of-the-art density estimates for natural images.
1 code implementation • NeurIPS 2020 • Shweta Mahajan, Stefan Roth
Our framework not only enables diverse captioning through context-based pseudo supervision, but also extends it to images with novel objects and without paired captions in the training data.
1 code implementation • CVPR 2020 • Shweta Mahajan, Apratim Bhattacharyya, Mario Fritz, Bernt Schiele, Stefan Roth
Flow-based generative models are an important class of exact inference models that admit efficient inference and sampling for image synthesis.
1 code implementation • ICLR 2020 • Shweta Mahajan, Iryna Gurevych, Stefan Roth
Therefore, we propose a novel semi-supervised framework, which models shared information between domains and domain-specific information separately.
no code implementations • 14 Sep 2019 • Shweta Mahajan, Teresa Botschen, Iryna Gurevych, Stefan Roth
One of the key challenges in learning joint embeddings of multiple modalities, e.g., of images and text, is to ensure coherent cross-modal semantics that generalize across datasets.