Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities

no code implementations 20 Jun 2024 Sachit Menon, Richard Zemel, Carl Vondrick

We introduce a simple method, whiteboard-of-thought prompting, to unlock the visual reasoning capabilities of multimodal large language models across modalities.

Visual Reasoning

Generating Illustrated Instructions

1 code implementation CVPR 2024 Sachit Menon, Ishan Misra, Rohit Girdhar

We introduce the new task of generating Illustrated Instructions, i.e., visual instructions customized to a user's needs.

Text-to-Image Generation

Affective Faces for Goal-Driven Dyadic Communication

1 code implementation 26 Jan 2023 Scott Geng, Revant Teotia, Purva Tendulkar, Sachit Menon, Carl Vondrick

We introduce a video framework for modeling the association between verbal and non-verbal communication during dyadic conversation.

What You Can Reconstruct From a Shadow

no code implementations CVPR 2023 Ruoshi Liu, Sachit Menon, Chengzhi Mao, Dennis Park, Simon Stent, Carl Vondrick

Experiments and visualizations show that the method is able to generate multiple possible solutions that are consistent with the observation of the shadow.

3D Reconstruction, Object

Doubly Right Object Recognition: A Why Prompt for Visual Rationales

1 code implementation CVPR 2023 Chengzhi Mao, Revant Teotia, Amrutha Sundar, Sachit Menon, Junfeng Yang, Xin Wang, Carl Vondrick

We propose a "doubly right" object recognition benchmark, where the metric requires the model to simultaneously produce both the right labels as well as the right rationales.

Object Recognition

Task Bias in Vision-Language Models

no code implementations 8 Dec 2022 Sachit Menon, Ishaan Preetam Chandratreya, Carl Vondrick

Incidental supervision from language has become a popular approach for learning generic visual representations that can be prompted to perform many recognition tasks in computer vision.

Visual Classification via Description from Large Language Models

3 code implementations 13 Oct 2022 Sachit Menon, Carl Vondrick

By basing decisions on these descriptors, we can provide additional cues that encourage using the features we want to be used.

Classification, Descriptive

Forget-me-not! Contrastive Critics for Mitigating Posterior Collapse

no code implementations 19 Jul 2022 Sachit Menon, David Blei, Carl Vondrick

Variational autoencoders (VAEs) suffer from posterior collapse, where the powerful neural networks used for modeling and inference optimize the objective without meaningfully using the latent representation.

Contrastive Learning, Representation Learning

Shadows Shed Light on 3D Objects

no code implementations 17 Jun 2022 Ruoshi Liu, Sachit Menon, Chengzhi Mao, Dennis Park, Simon Stent, Carl Vondrick

Experiments and visualizations show that the method is able to generate multiple possible solutions that are consistent with the observation of the shadow.

3D Reconstruction, Object

PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models

16 code implementations CVPR 2020 Sachit Menon, Alexandru Damian, Shijia Hu, Nikhil Ravi, Cynthia Rudin

We present an algorithm addressing this problem, PULSE (Photo Upsampling via Latent Space Exploration), which generates high-resolution, realistic images at resolutions previously unseen in the literature.

Face Hallucination, Hallucination

New Techniques for Preserving Global Structure and Denoising with Low Information Loss in Single-Image Super-Resolution

1 code implementation 9 May 2018 Yijie Bei, Alex Damian, Shijia Hu, Sachit Menon, Nikhil Ravi, Cynthia Rudin

This work identifies and addresses two important technical challenges in single-image super-resolution: (1) how to upsample an image without magnifying noise and (2) how to preserve large scale structure when upsampling.

Denoising, Image Super-Resolution

