Search Results for author: Shweta Mahajan

Found 12 papers, 8 papers with code

Visual Concept-driven Image Generation with Text-to-Image Diffusion Model

no code implementations · 18 Feb 2024 · Tanzila Rahman, Shweta Mahajan, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Leonid Sigal

We illustrate that such joint alternating refinement leads to the learning of better tokens for concepts and, as a by-product, latent masks.

Image Generation

Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models

no code implementations · 19 Dec 2023 · Shweta Mahajan, Tanzila Rahman, Kwang Moo Yi, Leonid Sigal

Further, we leverage the findings that different timesteps of the diffusion process cater to different levels of detail in an image.
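The timestep-to-detail relationship mentioned above can be illustrated with standard diffusion-schedule arithmetic (a generic DDPM-style linear beta schedule, not the paper's code): the signal-to-noise ratio falls as the timestep grows, so late, noisy timesteps govern coarse structure while early timesteps refine fine detail.

```python
import numpy as np

# Illustrative sketch, not the authors' implementation: SNR per timestep
# under a standard DDPM-style linear noise schedule.
T = 1000
betas = np.linspace(1e-4, 0.02, T)      # linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)     # cumulative signal retention
snr = alpha_bar / (1.0 - alpha_bar)     # signal-to-noise ratio per step

for t in [50, 500, 950]:
    print(f"t={t:4d}  SNR={snr[t]:.4f}")
```

The monotone decay of the SNR is what makes it sensible to associate different timesteps with different levels of image detail.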

Image Generation · Prompt Engineering

ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models

no code implementations · 3 Dec 2023 · Jeong-gi Kwak, Erqun Dong, Yuhe Jin, Hanseok Ko, Shweta Mahajan, Kwang Moo Yi

Thus, to perform novel-view synthesis, we create a smooth camera trajectory to the target view that we wish to render, and denoise using both a view-conditioned diffusion model and a video diffusion model.
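One ingredient described above, the smooth camera trajectory to the target view, can be sketched as a simple interpolation of camera angles over the frames a video diffusion model would denoise. The parameterization and names here are hypothetical, not the authors' implementation:

```python
import numpy as np

# Hypothetical sketch: interpolate (azimuth, elevation) pairs, in
# degrees, linearly from the source view to the target view.
def camera_trajectory(src, dst, n_frames):
    """Return an (n_frames, 2) array of interpolated camera angles."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    ts = np.linspace(0.0, 1.0, n_frames)[:, None]
    return (1.0 - ts) * src + ts * dst

path = camera_trajectory(src=(0.0, 0.0), dst=(90.0, 30.0), n_frames=5)
print(path)
```

Each row of `path` would then condition one frame of the denoising process, so consecutive frames see only small viewpoint changes.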

Novel View Synthesis · Object

Unsupervised Keypoints from Pretrained Diffusion Models

1 code implementation · 29 Nov 2023 · Eric Hedlin, Gopal Sharma, Shweta Mahajan, Xingzhe He, Hossam Isack, Abhishek Kar, Helge Rhodin, Andrea Tagliasacchi, Kwang Moo Yi

Unsupervised learning of keypoints and landmarks has seen significant progress with the help of modern neural network architectures, but performance is yet to match the supervised counterpart, making their practicability questionable.

Denoising · Unsupervised Human Pose Estimation +1

Make-A-Story: Visual Memory Conditioned Consistent Story Generation

1 code implementation · CVPR 2023 · Tanzila Rahman, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Shweta Mahajan, Leonid Sigal

Our experiments for story generation on the MUGEN, PororoSV, and FlintstonesSV datasets show that our method not only outperforms the prior state of the art in generating high-visual-quality frames that are consistent with the story, but also models appropriate correspondences between the characters and the background.

Sentence · Story Generation +1

Diverse Image Captioning with Grounded Style

1 code implementation · 3 May 2022 · Franz Klein, Shweta Mahajan, Stefan Roth

Stylized image captioning as presented in prior work aims to generate captions that reflect characteristics beyond a factual description of the scene composition, such as sentiments.

Attribute · Image Captioning

PixelPyramids: Exact Inference Models from Lossless Image Pyramids

1 code implementation · ICCV 2021 · Shweta Mahajan, Stefan Roth

Autoregressive models are a class of exact inference approaches with highly flexible functional forms, yielding state-of-the-art density estimates for natural images.
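The exact-inference property mentioned above rests on the autoregressive chain-rule factorization p(x) = ∏ᵢ p(xᵢ | x<ᵢ). A toy sketch (generic chain rule over a 3-pixel binary "image", not the PixelPyramids model itself, with a made-up conditional) shows why such densities are exact: every factor is a proper conditional, so the joint sums to exactly 1.

```python
import itertools
import numpy as np

# Hypothetical conditional: pixel i is 1 with a probability that depends
# on how many previous pixels were 1 (a Laplace-smoothed count).
def cond_prob(xi, prefix):
    p1 = (1 + sum(prefix)) / (2 + len(prefix))
    return p1 if xi == 1 else 1.0 - p1

# Exact joint density via the chain rule: p(x) = prod_i p(x_i | x_<i).
def joint_prob(x):
    return np.prod([cond_prob(x[i], x[:i]) for i in range(len(x))])

# Summing over all 2^3 binary images gives exactly 1.
total = sum(joint_prob(x) for x in itertools.product([0, 1], repeat=3))
print(total)
```

Exactness of the per-example log-likelihood, log p(x) = Σᵢ log p(xᵢ | x<ᵢ), is what yields the state-of-the-art density estimates the excerpt refers to.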

Density Estimation

Diverse Image Captioning with Context-Object Split Latent Spaces

1 code implementation · NeurIPS 2020 · Shweta Mahajan, Stefan Roth

Our framework not only enables diverse captioning through context-based pseudo supervision, but extends this to images with novel objects and without paired captions in the training data.

Image Captioning · Object

Normalizing Flows with Multi-Scale Autoregressive Priors

1 code implementation · CVPR 2020 · Shweta Mahajan, Apratim Bhattacharyya, Mario Fritz, Bernt Schiele, Stefan Roth

Flow-based generative models are an important class of exact inference models that admit efficient inference and sampling for image synthesis.
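The exact inference and sampling properties mentioned above follow from the change-of-variables formula. A minimal sketch (standard normalizing-flow math with a single affine transform, not the paper's multi-scale model): the invertible map z = (x − μ)/σ gives an exact log-density, and sampling is just the inverse map applied to base-distribution noise.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 0.5  # parameters of an illustrative affine flow

def log_prob(x):
    z = (x - mu) / sigma                      # forward pass to base space
    log_det = -np.log(sigma)                  # log |dz/dx| of the transform
    base = -0.5 * (z**2 + np.log(2 * np.pi))  # standard-normal log-density
    return base + log_det                     # exact change of variables

def sample(n):
    z = rng.standard_normal(n)                # draw from the base density
    return mu + sigma * z                     # invert the flow

xs = sample(10_000)
print(log_prob(2.0), xs.mean())
```

Both directions are cheap here; the multi-scale autoregressive priors in the paper aim to keep this efficiency while making the prior over latents more expressive.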

Density Estimation · Image Generation

Latent Normalizing Flows for Many-to-Many Cross-Domain Mappings

1 code implementation · ICLR 2020 · Shweta Mahajan, Iryna Gurevych, Stefan Roth

Therefore, we propose a novel semi-supervised framework, which models shared information between domains and domain-specific information separately.

Image Captioning · Image Generation

Joint Wasserstein Autoencoders for Aligning Multimodal Embeddings

no code implementations · 14 Sep 2019 · Shweta Mahajan, Teresa Botschen, Iryna Gurevych, Stefan Roth

One of the key challenges in learning joint embeddings of multiple modalities, e.g., of images and text, is to ensure coherent cross-modal semantics that generalize across datasets.

Cross-Modal Retrieval · Retrieval
