Search Results for author: Carl Vondrick

Found 84 papers, 45 papers with code

There’s a Time and Place for Reasoning Beyond the Image

1 code implementation ACL 2022 Xingyu Fu, Ben Zhou, Ishaan Chandratreya, Carl Vondrick, Dan Roth

Images are often more significant than only the pixels to human eyes, as we can infer, associate, and reason with contextual information from other sources to establish a more complete picture.

Image Clustering

GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering

1 code implementation 15 Feb 2024 Abdullah Hamdi, Luke Melas-Kyriazi, Guocheng Qian, Jinjie Mai, Ruoshi Liu, Carl Vondrick, Bernard Ghanem, Andrea Vedaldi

With the aid of a frequency-modulated loss, GES achieves competitive performance in novel-view synthesis benchmarks while requiring less than half the memory storage of Gaussian Splatting and increasing the rendering speed by up to 39%.

3D Reconstruction Novel View Synthesis

pix2gestalt: Amodal Segmentation by Synthesizing Wholes

1 code implementation 25 Jan 2024 Ege Ozguroglu, Ruoshi Liu, Dídac Surís, Dian Chen, Achal Dave, Pavel Tokmakov, Carl Vondrick

We introduce pix2gestalt, a framework for zero-shot amodal segmentation, which learns to estimate the shape and appearance of whole objects that are only partially visible behind occlusions.

3D Reconstruction Object Recognition +1

Raidar: geneRative AI Detection viA Rewriting

1 code implementation 23 Jan 2024 Chengzhi Mao, Carl Vondrick, Hao Wang, Junfeng Yang

We find that large language models (LLMs) are more likely to modify human-written text than AI-generated text when tasked with rewriting.

Interpreting and Controlling Vision Foundation Models via Text Explanations

1 code implementation 16 Oct 2023 Haozhe Chen, Junfeng Yang, Carl Vondrick, Chengzhi Mao

Large-scale pre-trained vision foundation models, such as CLIP, have become de facto backbones for various vision tasks.

Model Editing Visual Reasoning

SHIFT3D: Synthesizing Hard Inputs For Tricking 3D Detectors

no code implementations ICCV 2023 Hongge Chen, Zhao Chen, Gregory P. Meyer, Dennis Park, Carl Vondrick, Ashish Shrivastava, Yuning Chai

We present SHIFT3D, a differentiable pipeline for generating 3D shapes that are structurally plausible yet challenging to 3D object detectors.

Autonomous Driving Object

Sin3DM: Learning a Diffusion Model from a Single 3D Textured Shape

1 code implementation 24 May 2023 Rundi Wu, Ruoshi Liu, Carl Vondrick, Changxi Zheng

Specifically, we encode the input 3D textured shape into triplane feature maps that represent the signed distance and texture fields of the input.

Denoising

Tracking through Containers and Occluders in the Wild

1 code implementation CVPR 2023 Basile Van Hoorick, Pavel Tokmakov, Simon Stent, Jie Li, Carl Vondrick

Tracking objects with persistence in cluttered and dynamic environments remains a difficult challenge for computer vision systems.

Visual Tracking

Humans as Light Bulbs: 3D Human Reconstruction from Thermal Reflection

no code implementations CVPR 2023 Ruoshi Liu, Carl Vondrick

The relatively hot temperature of the human body causes people to turn into long-wave infrared light sources.

3D Human Reconstruction Position

SURFSUP: Learning Fluid Simulation for Novel Surfaces

no code implementations ICCV 2023 Arjun Mani, Ishaan Preetam Chandratreya, Elliot Creager, Carl Vondrick, Richard Zemel

Modeling the mechanics of fluid in complex scenes is vital to applications in design, graphics, and robotics.

Zero-1-to-3: Zero-shot One Image to 3D Object

1 code implementation ICCV 2023 Ruoshi Liu, Rundi Wu, Basile Van Hoorick, Pavel Tokmakov, Sergey Zakharov, Carl Vondrick

We introduce Zero-1-to-3, a framework for changing the camera viewpoint of an object given just a single RGB image.

3D Reconstruction Image to 3D +3

Affective Faces for Goal-Driven Dyadic Communication

1 code implementation 26 Jan 2023 Scott Geng, Revant Teotia, Purva Tendulkar, Sachit Menon, Carl Vondrick

We introduce a video framework for modeling the association between verbal and non-verbal communication during dyadic conversation.

What You Can Reconstruct From a Shadow

no code implementations CVPR 2023 Ruoshi Liu, Sachit Menon, Chengzhi Mao, Dennis Park, Simon Stent, Carl Vondrick

Experiments and visualizations show that the method is able to generate multiple possible solutions that are consistent with the observation of the shadow.

3D Reconstruction Object +1

Adversarially Robust Video Perception by Seeing Motion

no code implementations 13 Dec 2022 Lingyu Zhang, Chengzhi Mao, Junfeng Yang, Carl Vondrick

Even under adaptive attacks where the adversary knows our defense, our algorithm is still effective.

Adversarial Robustness

Doubly Right Object Recognition: A Why Prompt for Visual Rationales

1 code implementation CVPR 2023 Chengzhi Mao, Revant Teotia, Amrutha Sundar, Sachit Menon, Junfeng Yang, Xin Wang, Carl Vondrick

We propose a "doubly right" object recognition benchmark, where the metric requires the model to simultaneously produce both the right labels and the right rationales.

Object Recognition

Robust Perception through Equivariance

1 code implementation 12 Dec 2022 Chengzhi Mao, Lingyu Zhang, Abhishek Joshi, Junfeng Yang, Hao Wang, Carl Vondrick

In this paper, we introduce a framework that uses the dense intrinsic constraints in natural images to robustify inference.

Adversarial Robustness Instance Segmentation +2

Task Bias in Vision-Language Models

no code implementations 8 Dec 2022 Sachit Menon, Ishaan Preetam Chandratreya, Carl Vondrick

Incidental supervision from language has become a popular approach for learning generic visual representations that can be prompted to perform many recognition tasks in computer vision.

Muscles in Action

no code implementations ICCV 2023 Mia Chiquier, Carl Vondrick

The dataset consists of 12.5 hours of synchronized video and surface electromyography (sEMG) data of 10 subjects performing various exercises.

Private Multiparty Perception for Navigation

no code implementations 2 Dec 2022 Hui Lu, Mia Chiquier, Carl Vondrick

We introduce a framework for navigating through cluttered environments by connecting multiple cameras together while simultaneously preserving privacy.

FLEX: Full-Body Grasping Without Full-Body Grasps

no code implementations CVPR 2023 Purva Tendulkar, Dídac Surís, Carl Vondrick

Towards this goal, we address the task of generating a virtual human -- hands and full body -- grasping everyday objects.

Visual Classification via Description from Large Language Models

2 code implementations 13 Oct 2022 Sachit Menon, Carl Vondrick

By basing decisions on these descriptors, we can provide additional cues that encourage using the features we want to be used.

Classification Descriptive +1

Representing Spatial Trajectories as Distributions

no code implementations 4 Oct 2022 Dídac Surís, Carl Vondrick

We introduce a representation learning framework for spatial trajectories.

Representation Learning

Forget-me-not! Contrastive Critics for Mitigating Posterior Collapse

no code implementations 19 Jul 2022 Sachit Menon, David Blei, Carl Vondrick

Variational autoencoders (VAEs) suffer from posterior collapse, where the powerful neural networks used for modeling and inference optimize the objective without meaningfully using the latent representation.

Contrastive Learning Representation Learning

Landscape Learning for Neural Network Inversion

no code implementations ICCV 2023 Ruoshi Liu, Chengzhi Mao, Purva Tendulkar, Hao Wang, Carl Vondrick

Many machine learning methods operate by inverting a neural network at inference time, which has become a popular technique for solving inverse problems in computer vision, robotics, and graphics.

Adversarial Defense

Shadows Shed Light on 3D Objects

no code implementations 17 Jun 2022 Ruoshi Liu, Sachit Menon, Chengzhi Mao, Dennis Park, Simon Stent, Carl Vondrick

Experiments and visualizations show that the method is able to generate multiple possible solutions that are consistent with the observation of the shadow.

3D Reconstruction Object +1

It's Time for Artistic Correspondence in Music and Video

no code implementations CVPR 2022 Dídac Surís, Carl Vondrick, Bryan Russell, Justin Salamon

In order to capture the high-level concepts that are required to solve the task, we propose modeling the long-term temporal context of both the video and the music signals, using Transformer networks for each modality.

Retrieval

Revealing Occlusions with 4D Neural Fields

no code implementations CVPR 2022 Basile Van Hoorick, Purva Tendulkar, Dídac Surís, Dennis Park, Simon Stent, Carl Vondrick

For computer vision systems to operate in dynamic situations, they need to be able to represent and reason about object permanence.

Video Understanding

There is a Time and Place for Reasoning Beyond the Image

1 code implementation 1 Mar 2022 Xingyu Fu, Ben Zhou, Ishaan Preetam Chandratreya, Carl Vondrick, Dan Roth

For example, in Figure 1, we can find a way to identify the news articles related to the picture through segment-wise understandings of the signs, the buildings, the crowds, and more.

Image Clustering World Knowledge

UnweaveNet: Unweaving Activity Stories

1 code implementation CVPR 2022 Will Price, Carl Vondrick, Dima Damen

Our lives can be seen as a complex weaving of activities; we switch from one activity to another, to maximise our achievements or in reaction to demands placed upon us.

Real-Time Neural Voice Camouflage

no code implementations ICLR 2022 Mia Chiquier, Chengzhi Mao, Carl Vondrick

Automatic speech recognition systems have created exciting possibilities for applications; however, they also enable opportunities for systematic eavesdropping.

Automatic Speech Recognition (ASR) +1

Full-Body Visual Self-Modeling of Robot Morphologies

1 code implementation 11 Nov 2021 Boyuan Chen, Robert Kwiatkowski, Carl Vondrick, Hod Lipson

Internal computational models of physical bodies are fundamental to the ability of robots and animals alike to plan and control their actions.

Motion Planning

The Boombox: Visual Reconstruction from Acoustic Vibrations

1 code implementation 17 May 2021 Boyuan Chen, Mia Chiquier, Hod Lipson, Carl Vondrick

Due to the many ways that robots use containers, we believe the box will have a number of applications in robotics.

Adversarial Attacks are Reversible with Natural Supervision

1 code implementation ICCV 2021 Chengzhi Mao, Mia Chiquier, Hao Wang, Junfeng Yang, Carl Vondrick

We find that images contain intrinsic structure that enables the reversal of many adversarial attacks.

Analogical Reasoning for Visually Grounded Compositional Generalization

no code implementations 1 Jan 2021 Bo Wu, Haoyu Qin, Alireza Zareian, Carl Vondrick, Shih-Fu Chang

Children acquire language subconsciously by observing the surrounding world and listening to descriptions.

Language Acquisition

Generative Interventions for Causal Learning

1 code implementation CVPR 2021 Chengzhi Mao, Augustine Cha, Amogh Gupta, Hao Wang, Junfeng Yang, Carl Vondrick

We introduce a framework for learning robust visual representations that generalize to new viewpoints, backgrounds, and scene contexts.

Ranked #44 on Image Classification on ObjectNet (using extra training data)

Image Classification Out-of-Distribution Generalization

Globetrotter: Connecting Languages by Connecting Images

1 code implementation CVPR 2022 Dídac Surís, Dave Epstein, Carl Vondrick

Machine translation between many languages at once is highly challenging, since training with ground truth requires supervision between all language pairs, which is difficult to obtain.

Machine Translation Retrieval +2

Dissecting Image Crops

1 code implementation ICCV 2021 Basile Van Hoorick, Carl Vondrick

The elementary operation of cropping underpins nearly every computer vision system, ranging from data augmentation and translation invariance to computational photography and representation learning.

Data Augmentation Image Cropping +4

Listening to Sounds of Silence for Speech Denoising

1 code implementation NeurIPS 2020 Ruilin Xu, Rundi Wu, Yuko Ishiwaka, Carl Vondrick, Changxi Zheng

We introduce a deep learning model for speech denoising, a long-standing challenge in audio analysis arising in numerous applications.

Denoising Sentence +1

Analogical Reasoning for Visually Grounded Language Acquisition

no code implementations 22 Jul 2020 Bo Wu, Haoyu Qin, Alireza Zareian, Carl Vondrick, Shih-Fu Chang

Children acquire language subconsciously by observing the surrounding world and listening to descriptions.

Language Acquisition

Multitask Learning Strengthens Adversarial Robustness

1 code implementation ECCV 2020 Chengzhi Mao, Amogh Gupta, Vikram Nitin, Baishakhi Ray, Shuran Song, Junfeng Yang, Carl Vondrick

Although deep networks achieve strong accuracy on a range of computer vision benchmarks, they remain vulnerable to adversarial attacks, where imperceptible input perturbations fool the network.

Adversarial Defense Adversarial Robustness

Learning Goals from Failure

no code implementations CVPR 2021 Dave Epstein, Carl Vondrick

We introduce a framework that predicts the goals behind observable human action in video.

Representation Learning

Oops! Predicting Unintentional Action in Video

1 code implementation CVPR 2020 Dave Epstein, Boyuan Chen, Carl Vondrick

We train a supervised neural network as a baseline and analyze its performance compared to human consistency on the tasks.

Visual Hide and Seek

no code implementations 15 Oct 2019 Boyuan Chen, Shuran Song, Hod Lipson, Carl Vondrick

We train embodied agents to play Visual Hide and Seek where a prey must navigate in a simulated environment in order to avoid capture from a predator.

Navigate

Multi-level Multimodal Common Semantic Space for Image-Phrase Grounding

1 code implementation CVPR 2019 Hassan Akbari, Svebor Karaman, Surabhi Bhargava, Brian Chen, Carl Vondrick, Shih-Fu Chang

Following dedicated non-linear mappings for visual features at each level, word, and sentence embeddings, we obtain multiple instantiations of our common semantic space in which comparisons between any target text and the visual content are performed with cosine similarity.

Language Modelling Phrase Grounding +2

Actor-Centric Relation Network

1 code implementation ECCV 2018 Chen Sun, Abhinav Shrivastava, Carl Vondrick, Kevin Murphy, Rahul Sukthankar, Cordelia Schmid

A visualization of the learned relation features confirms that our approach is able to attend to the relevant relations for each action.

Action Classification Action Detection +5

The Sound of Pixels

2 code implementations ECCV 2018 Hang Zhao, Chuang Gan, Andrew Rouditchenko, Carl Vondrick, Josh McDermott, Antonio Torralba

We introduce PixelPlayer, a system that, by leveraging large amounts of unlabeled videos, learns to locate image regions which produce sounds and separate the input sounds into a set of components that represents the sound from each pixel.

Counterfactual Image Networks

no code implementations ICLR 2018 Deniz Oktay, Carl Vondrick, Antonio Torralba

However, when a layer is removed, the model learns to produce a different image that still looks natural to an adversary, which is possible by removing objects.

counterfactual Object +2

Towards Practical Verification of Machine Learning: The Case of Computer Vision Systems

no code implementations 5 Dec 2017 Kexin Pei, Linjie Zhu, Yinzhi Cao, Junfeng Yang, Carl Vondrick, Suman Jana

Finally, we show that retraining using the safety violations detected by VeriVis can reduce the average number of violations by up to 60.2%.

BIG-bench Machine Learning Medical Diagnosis

Following Gaze in Video

no code implementations ICCV 2017 Adrià Recasens, Carl Vondrick, Aditya Khosla, Antonio Torralba

In this paper, we present an approach for following gaze in video by predicting where a person (in the video) is looking even when the object is in a different frame.

See, Hear, and Read: Deep Aligned Representations

1 code implementation 3 Jun 2017 Yusuf Aytar, Carl Vondrick, Antonio Torralba

We capitalize on large amounts of readily available, synchronous data to learn deep discriminative representations shared across three major natural modalities: vision, sound, and language.

Cross-Modal Retrieval Representation Learning +1

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

8 code implementations CVPR 2018 Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik

The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently.

Action Detection +3

Following Gaze Across Views

no code implementations 9 Dec 2016 Adrià Recasens, Carl Vondrick, Aditya Khosla, Antonio Torralba

In this paper, we present an approach for following gaze across views by predicting where a particular person is looking throughout a scene.

Who is Mistaken?

no code implementations 4 Dec 2016 Benjamin Eysenbach, Carl Vondrick, Antonio Torralba

We then create a representation of characters' beliefs for two tasks in human action understanding: predicting who is mistaken, and when they are mistaken.

Action Understanding

Cross-Modal Scene Networks

no code implementations 27 Oct 2016 Yusuf Aytar, Lluis Castrejon, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba

Our experiments suggest that our scene representation can help transfer representations across modalities for retrieval.

Retrieval

SoundNet: Learning Sound Representations from Unlabeled Video

6 code implementations NeurIPS 2016 Yusuf Aytar, Carl Vondrick, Antonio Torralba

We learn rich natural sound representations by capitalizing on large amounts of unlabeled sound data collected in the wild.

General Classification

Generating Videos with Scene Dynamics

no code implementations NeurIPS 2016 Carl Vondrick, Hamed Pirsiavash, Antonio Torralba

We capitalize on large amounts of unlabeled video in order to learn a model of scene dynamics for both video recognition tasks (e.g. action classification) and video generation tasks (e.g. future prediction).

Action Classification Future prediction +7

Learning Aligned Cross-Modal Representations from Weakly Aligned Data

no code implementations CVPR 2016 Lluis Castrejon, Yusuf Aytar, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba

Our experiments suggest that our scene representation can help transfer representations across modalities for retrieval.

Retrieval

Where are they looking?

no code implementations NeurIPS 2015 Adrià Recasens, Aditya Khosla, Carl Vondrick, Antonio Torralba

Humans have the remarkable ability to follow the gaze of other people to identify what they are looking at.

Anticipating Visual Representations from Unlabeled Video

no code implementations CVPR 2016 Carl Vondrick, Hamed Pirsiavash, Antonio Torralba

The key idea behind our approach is that we can train deep networks to predict the visual representation of images in the future.

Do We Need More Training Data?

no code implementations 5 Mar 2015 Xiangxin Zhu, Carl Vondrick, Charless Fowlkes, Deva Ramanan

Datasets for training object recognition systems are steadily increasing in size.

Object Recognition

Learning visual biases from human imagination

no code implementations NeurIPS 2015 Carl Vondrick, Hamed Pirsiavash, Aude Oliva, Antonio Torralba

Although the human visual system can recognize many concepts under challenging conditions, it still has some biases.

Object Recognition

Predicting Motivations of Actions by Leveraging Text

no code implementations CVPR 2016 Carl Vondrick, Deniz Oktay, Hamed Pirsiavash, Antonio Torralba

In this paper, we introduce the problem of predicting why a person has performed an action in images.

Inverting and Visualizing Features for Object Detection

no code implementations 11 Dec 2012 Carl Vondrick, Aditya Khosla, Tomasz Malisiewicz, Antonio Torralba

By visualizing feature spaces, we can gain a more intuitive understanding of our detection systems.

Object object-detection +1
