no code implementations • 20 Jun 2024 • Sachit Menon, Richard Zemel, Carl Vondrick
We introduce a simple method, whiteboard-of-thought prompting, to unlock the visual reasoning capabilities of multimodal large language models across modalities.
1 code implementation • CVPR 2024 • Sachit Menon, Ishan Misra, Rohit Girdhar
We introduce the new task of generating Illustrated Instructions, i.e., visual instructions customized to a user's needs.
1 code implementation • ICCV 2023 • Dídac Surís, Sachit Menon, Carl Vondrick
Answering visual queries is a complex task that requires both visual processing and reasoning.
Ranked #15 on Zero-Shot Video Question Answer on NExT-QA
1 code implementation • 26 Jan 2023 • Scott Geng, Revant Teotia, Purva Tendulkar, Sachit Menon, Carl Vondrick
We introduce a video framework for modeling the association between verbal and non-verbal communication during dyadic conversation.
no code implementations • CVPR 2023 • Ruoshi Liu, Sachit Menon, Chengzhi Mao, Dennis Park, Simon Stent, Carl Vondrick
Experiments and visualizations show that the method is able to generate multiple possible solutions that are consistent with the observation of the shadow.
1 code implementation • CVPR 2023 • Chengzhi Mao, Revant Teotia, Amrutha Sundar, Sachit Menon, Junfeng Yang, Xin Wang, Carl Vondrick
We propose a "doubly right" object recognition benchmark, where the metric requires the model to simultaneously produce both the right labels and the right rationales.
no code implementations • 8 Dec 2022 • Sachit Menon, Ishaan Preetam Chandratreya, Carl Vondrick
Incidental supervision from language has become a popular approach for learning generic visual representations that can be prompted to perform many recognition tasks in computer vision.
3 code implementations • 13 Oct 2022 • Sachit Menon, Carl Vondrick
By basing decisions on these descriptors, we can provide additional cues that encourage using the features we want to be used.
no code implementations • 19 Jul 2022 • Sachit Menon, David Blei, Carl Vondrick
Variational autoencoders (VAEs) suffer from posterior collapse, where the powerful neural networks used for modeling and inference optimize the objective without meaningfully using the latent representation.
no code implementations • 17 Jun 2022 • Ruoshi Liu, Sachit Menon, Chengzhi Mao, Dennis Park, Simon Stent, Carl Vondrick
Experiments and visualizations show that the method is able to generate multiple possible solutions that are consistent with the observation of the shadow.
16 code implementations • CVPR 2020 • Sachit Menon, Alexandru Damian, Shijia Hu, Nikhil Ravi, Cynthia Rudin
We present an algorithm addressing this problem, PULSE (Photo Upsampling via Latent Space Exploration), which generates high-resolution, realistic images at resolutions previously unseen in the literature.
Ranked #10 on Image Super-Resolution on FFHQ 256 x 256 - 4x upscaling (PSNR metric)
1 code implementation • 9 May 2018 • Yijie Bei, Alex Damian, Shijia Hu, Sachit Menon, Nikhil Ravi, Cynthia Rudin
This work identifies and addresses two important technical challenges in single-image super-resolution: (1) how to upsample an image without magnifying noise and (2) how to preserve large-scale structure when upsampling.