Search Results for author: Andrew Owens

Found 41 papers, 19 papers with code

Images that Sound: Composing Images and Sounds on a Single Canvas

no code implementations 20 May 2024 Ziyang Chen, Daniel Geng, Andrew Owens

During the reverse process, we denoise noisy latents with both the audio and image diffusion models in parallel, resulting in a sample that is likely under both models.
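The parallel denoising described in the excerpt can be sketched roughly as follows. This is a toy illustration only: `audio_model_noise` and `image_model_noise` are hypothetical stand-ins for the paper's pretrained audio and image latent diffusion models, and the simple average-and-step update stands in for a real diffusion sampler.

```python
import numpy as np

rng = np.random.default_rng(0)

def audio_model_noise(z, t):
    # Hypothetical stand-in for the audio diffusion model's noise prediction.
    return 0.5 * z

def image_model_noise(z, t):
    # Hypothetical stand-in for the image diffusion model's noise prediction.
    return 0.4 * z

def parallel_denoise(z, steps=50, step_size=0.1):
    """Run the reverse process with both models in parallel: at every step,
    average the two noise estimates and take one denoising step, so the
    final sample is likely under both models."""
    for t in range(steps, 0, -1):
        eps = 0.5 * (audio_model_noise(z, t) + image_model_noise(z, t))
        z = z - step_size * eps  # move the shared latent toward both models
    return z

z0 = rng.standard_normal(16)
z_final = parallel_denoise(z0)
```

Because both estimates act on a single shared latent, each step pulls the sample toward regions that both models assign high likelihood.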

Efficient Vision-Language Pre-training by Cluster Masking

1 code implementation CVPR 2024 Zihao Wei, Zixuan Pan, Andrew Owens

We propose a simple strategy for masking image patches during visual-language contrastive learning that improves the quality of the learned representations and the training speed.

Contrastive Learning

Tactile-Augmented Radiance Fields

1 code implementation CVPR 2024 Yiming Dou, Fengyu Yang, Yi Liu, Antonio Loquercio, Andrew Owens

Our approach makes use of two insights: (i) common vision-based touch sensors are built on ordinary cameras and thus can be registered to images using methods from multi-view geometry, and (ii) visually and structurally similar regions of a scene share the same tactile features.

Factorized Diffusion: Perceptual Illusions by Noise Decomposition

no code implementations 17 Apr 2024 Daniel Geng, Inbum Park, Andrew Owens

And we explore a decomposition by a motion blur kernel, which produces images that change appearance under motion blurring.


Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark

no code implementations CVPR 2024 Ziyang Chen, Israel D. Gebru, Christian Richardt, Anurag Kumar, William Laney, Andrew Owens, Alexander Richard

The dataset includes high-quality and densely captured room impulse response data paired with multi-view images, and precise 6DoF pose tracking data for sound emitters and listeners in the rooms.

Few-Shot Learning, Pose Tracking +1

Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators

no code implementations 31 Jan 2024 Daniel Geng, Andrew Owens

Diffusion models are capable of generating impressive images conditioned on text descriptions, and extensions of these models allow users to edit images at a relatively coarse scale.

Optical Flow Estimation

Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models

no code implementations CVPR 2024 Daniel Geng, Inbum Park, Andrew Owens

During the reverse diffusion process, we estimate the noise from different views of a noisy image, and then combine these noise estimates together and denoise the image.
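The noise-combination step the excerpt describes can be sketched as below. This is an illustrative toy, not the paper's implementation: `predict_noise` is a hypothetical stand-in for a text-conditioned diffusion model, and only two views (identity and a 180-degree rotation) are used.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_noise(x):
    # Hypothetical stand-in for a text-conditioned diffusion model's
    # noise estimate on a noisy image.
    return 0.3 * x

# Two views of the same image: identity and a 180-degree rotation.
views = [lambda x: x, lambda x: np.rot90(x, 2)]
inverses = [lambda x: x, lambda x: np.rot90(x, -2)]

def combined_noise(x):
    """Estimate the noise from each view of the noisy image, map every
    estimate back to the canonical orientation, and average them."""
    estimates = [inv(predict_noise(view(x))) for view, inv in zip(views, inverses)]
    return np.mean(estimates, axis=0)

x = rng.standard_normal((8, 8))
eps = combined_noise(x)
x_less_noisy = x - 0.1 * eps  # one step of the reverse diffusion process
```

Averaging the back-transformed estimates denoises the image toward content that reads correctly under every view at once.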

Generating Visual Scenes from Touch

no code implementations ICCV 2023 Fengyu Yang, Jiacheng Zhang, Andrew Owens

An emerging line of work has sought to generate plausible imagery from touch.

Conditional Generation of Audio from Video via Foley Analogies

1 code implementation CVPR 2023 Yuexi Du, Ziyang Chen, Justin Salamon, Bryan Russell, Andrew Owens

Second, we propose a model for generating a soundtrack for a silent input video, given a user-supplied example that specifies what the video should "sound like".

Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models

1 code implementation ICCV 2023 Lukas Höllein, Ang Cao, Andrew Owens, Justin Johnson, Matthias Nießner

We present Text2Room, a method for generating room-scale textured 3D meshes from a given text prompt as input.

Text to 3D

Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation

2 code implementations ICCV 2023 Ziyang Chen, Shengyi Qian, Andrew Owens

In this paper, we use these cues to solve a problem we call Sound Localization from Motion (SLfM): jointly estimating camera rotation and localizing sound sources.

Mix and Localize: Localizing Sound Sources in Mixtures

no code implementations CVPR 2022 Xixi Hu, Ziyang Chen, Andrew Owens

This task requires a model to both group a sound mixture into individual sources, and to associate them with a visual signal.

Touch and Go: Learning from Human-Collected Vision and Touch

no code implementations 22 Nov 2022 Fengyu Yang, Chenyang Ma, Jiacheng Zhang, Jing Zhu, Wenzhen Yuan, Andrew Owens

The ability to associate touch with sight is essential for tasks that require physically interacting with objects in the world.

Image Stylization

Learning Visual Styles from Audio-Visual Associations

no code implementations 10 May 2022 Tingle Li, Yichen Liu, Andrew Owens, Hang Zhao

Our model learns to manipulate the texture of a scene to match a sound, a problem we term audio-driven image stylization.

Image Stylization

Sound Localization by Self-Supervised Time Delay Estimation

1 code implementation 26 Apr 2022 Ziyang Chen, David F. Fouhey, Andrew Owens

We adapt the contrastive random walk of Jabri et al. to learn a cycle-consistent representation from unlabeled stereo sounds, resulting in a model that performs on par with supervised methods on "in the wild" internet recordings.

Contrastive Learning, Visual Tracking
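For context on the time-delay task this paper tackles, the classical baseline is cross-correlation: shift one stereo channel against the other and keep the lag that maximizes their correlation. The sketch below shows that baseline only; it is not the paper's self-supervised contrastive-random-walk method, and the synthetic stereo pair is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def cross_correlation_delay(reference, delayed, max_lag=50):
    """Return the integer lag (in samples) that best aligns `delayed`
    with `reference`, by maximizing their cross-correlation."""
    lags = range(-max_lag, max_lag + 1)
    scores = [np.dot(delayed[max_lag:-max_lag],
                     np.roll(reference, lag)[max_lag:-max_lag])
              for lag in lags]
    return list(lags)[int(np.argmax(scores))]

# Synthetic stereo pair: the second channel lags the first by 7 samples.
signal = rng.standard_normal(1000)
delayed = np.roll(signal, 7)
delay = cross_correlation_delay(signal, delayed)
```

On clean synthetic signals this recovers the true lag; the appeal of a learned representation is robustness on noisy "in the wild" recordings where raw correlation degrades.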

Learning Pixel Trajectories with Multiscale Contrastive Random Walks

no code implementations CVPR 2022 Zhangxing Bian, Allan Jabri, Alexei A. Efros, Andrew Owens

A range of video modeling tasks, from optical flow to multiple object tracking, share the same fundamental challenge: establishing space-time correspondence.

Multiple Object Tracking, Object +5

GANmouflage: 3D Object Nondetection with Texture Fields

no code implementations CVPR 2023 Rui Guo, Jasmine Collins, Oscar de Lima, Andrew Owens

Our model learns to camouflage a variety of object shapes from randomly sampled locations and viewpoints within the input scene, and is the first to address the problem of hiding complex object shapes.


Structure from Silence: Learning Scene Structure from Ambient Sound

1 code implementation 10 Nov 2021 Ziyang Chen, Xixi Hu, Andrew Owens

From whirling ceiling fans to ticking clocks, the sounds that we hear subtly vary as we move through a scene.

Comparing Correspondences: Video Prediction with Correspondence-wise Losses

1 code implementation CVPR 2022 Daniel Geng, Max Hamilton, Andrew Owens

Image prediction methods often struggle on tasks that require changing the positions of objects, such as video prediction, producing blurry images that average over the many positions that objects might occupy.

Optical Flow Estimation, Video Prediction
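The idea behind a correspondence-wise loss can be sketched as follows: compare each target pixel against the predicted pixel it corresponds to under a flow field, rather than the predicted pixel at the same location. This toy uses integer displacements and wrap-around indexing for simplicity; it is an assumption-laden illustration, not the paper's loss.

```python
import numpy as np

def correspondence_wise_l1(pred, target, flow):
    """L1 loss between corresponding pixels: target pixel (i, j) is compared
    against the predicted pixel it maps to under the flow field (with
    wrap-around indexing for this toy example)."""
    h, w = target.shape
    loss = 0.0
    for i in range(h):
        for j in range(w):
            di, dj = flow[i, j]  # integer displacement for simplicity
            loss += abs(pred[(i + di) % h, (j + dj) % w] - target[i, j])
    return loss / (h * w)

# A prediction that is the target shifted down by one row: the pixelwise L1
# loss is large, but with the matching flow the correspondence-wise loss
# vanishes, so correct-but-displaced content is not penalized.
target = np.arange(16.0).reshape(4, 4)
pred = np.roll(target, 1, axis=0)
flow = np.zeros((4, 4, 2), dtype=int)
flow[..., 0] = 1  # each target pixel corresponds to the pixel one row down
pixelwise = np.abs(pred - target).mean()
cw = correspondence_wise_l1(pred, target, flow)
```

This is why such losses discourage the blurry averages that pixelwise losses produce when object positions are uncertain.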

Strumming to the Beat: Audio-Conditioned Contrastive Video Textures

no code implementations 6 Apr 2021 Medhini Narasimhan, Shiry Ginosar, Andrew Owens, Alexei A. Efros, Trevor Darrell

We learn representations for video frames and frame-to-frame transition probabilities by fitting a video-specific model trained using contrastive learning.

Contrastive Learning, Self-Supervised Learning +1

Planar Surface Reconstruction from Sparse Views

1 code implementation ICCV 2021 Linyi Jin, Shengyi Qian, Andrew Owens, David F. Fouhey

The paper studies planar surface reconstruction of indoor scenes from two views with unknown camera poses.

Surface Reconstruction

Contrastive Video Textures

no code implementations1 Jan 2021 Medhini Narasimhan, Shiry Ginosar, Andrew Owens, Alexei A Efros, Trevor Darrell

By randomly traversing edges with high transition probabilities, we generate diverse temporally smooth videos with novel sequences and transitions.

Contrastive Learning, Video Generation
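The random traversal described in the excerpt amounts to a weighted walk over a frame-transition graph. The sketch below is a minimal illustration under stated assumptions: the row-stochastic `probs` matrix stands in for the paper's learned frame-to-frame transition probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned frame-to-frame transition probabilities
# (row-stochastic: each row sums to 1).
n_frames = 5
logits = rng.standard_normal((n_frames, n_frames))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

def sample_texture(probs, start=0, length=20):
    """Generate a novel frame sequence by randomly traversing edges,
    favoring transitions with high probability."""
    seq = [start]
    for _ in range(length - 1):
        seq.append(int(rng.choice(len(probs), p=probs[seq[-1]])))
    return seq

sequence = sample_texture(probs)
```

Favoring high-probability edges keeps consecutive frames perceptually smooth, while the randomness yields sequences and transitions absent from the source video.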

CNN-generated images are surprisingly easy to spot... for now

4 code implementations CVPR 2020 Sheng-Yu Wang, Oliver Wang, Richard Zhang, Andrew Owens, Alexei A. Efros

In this work we ask whether it is possible to create a "universal" detector for telling apart real images from those generated by a CNN, regardless of architecture or dataset used.

Data Augmentation, Image Generation +1

Detecting Photoshopped Faces by Scripting Photoshop

2 code implementations ICCV 2019 Sheng-Yu Wang, Oliver Wang, Andrew Owens, Richard Zhang, Alexei A. Efros

Most malicious photo manipulations are created using standard image editing tools, such as Adobe Photoshop.

Image Manipulation Detection

MoSculp: Interactive Visualization of Shape and Time

no code implementations 14 Sep 2018 Xiuming Zhang, Tali Dekel, Tianfan Xue, Andrew Owens, Qiurui He, Jiajun Wu, Stefanie Mueller, William T. Freeman

We present a system that allows users to visualize complex human motion via 3D motion sculptures: a representation that conveys the 3D structure swept by a human body as it moves through space.

More Than a Feeling: Learning to Grasp and Regrasp using Vision and Touch

no code implementations 28 May 2018 Roberto Calandra, Andrew Owens, Dinesh Jayaraman, Justin Lin, Wenzhen Yuan, Jitendra Malik, Edward H. Adelson, Sergey Levine

This model -- a deep, multimodal convolutional network -- predicts the outcome of a candidate grasp adjustment, and then executes a grasp by iteratively selecting the most promising actions.

Robotic Grasping
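The select-and-execute loop described in the excerpt can be sketched as below. This is an illustrative toy: `predict_success` is a hypothetical stand-in for the paper's deep multimodal outcome-prediction network, and the uniform candidate sampling is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_success(state, action):
    # Hypothetical stand-in for the grasp-outcome predictor: here, actions
    # that move the gripper toward the origin score higher.
    return -np.linalg.norm(state + action)

def regrasp(state, n_candidates=32, n_steps=5):
    """Iteratively sample candidate grasp adjustments, score each with the
    outcome predictor, and execute the most promising one."""
    for _ in range(n_steps):
        candidates = rng.uniform(-0.1, 0.1, size=(n_candidates, state.shape[0]))
        scores = [predict_success(state, a) for a in candidates]
        state = state + candidates[int(np.argmax(scores))]
    return state

start = np.array([0.5, -0.3])
final = regrasp(start)
```

Scoring many candidate adjustments per step and executing only the best turns a one-shot outcome predictor into a simple closed-loop regrasping policy.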

Fighting Fake News: Image Splice Detection via Learned Self-Consistency

3 code implementations ECCV 2018 Minyoung Huh, Andrew Liu, Andrew Owens, Alexei A. Efros

In this paper, we propose a learning algorithm for detecting visual image manipulations that is trained only using a large dataset of real photographs.

Image Forensics

Audio-Visual Scene Analysis with Self-Supervised Multisensory Features

1 code implementation ECCV 2018 Andrew Owens, Alexei A. Efros

The thud of a bouncing ball, the onset of speech as lips open -- when visual and audio events occur together, it suggests that there might be a common, underlying event that produced both signals.

Action Recognition Audio Source Separation +1

Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning

no code implementations 20 Dec 2017 Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba

The sound of crashing waves, the roar of fast-moving cars -- sound conveys important information about the objects in our surroundings.

The Feeling of Success: Does Touch Sensing Help Predict Grasp Outcomes?

1 code implementation 16 Oct 2017 Roberto Calandra, Andrew Owens, Manu Upadhyaya, Wenzhen Yuan, Justin Lin, Edward H. Adelson, Sergey Levine

In this work, we investigate the question of whether touch sensing aids in predicting grasp outcomes within a multimodal sensing framework that combines vision and touch.

Industrial Robots, Robotic Grasping

Ambient Sound Provides Supervision for Visual Learning

1 code implementation 25 Aug 2016 Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba

We show that, through this process, the network learns a representation that conveys information about objects and scenes.

Object Recognition

Camouflaging an Object from Many Viewpoints

no code implementations CVPR 2014 Andrew Owens, Connelly Barnes, Alex Flint, Hanumant Singh, William Freeman

We address the problem of camouflaging a 3D object from the many viewpoints that one might see it from.

