Search Results for author: Andrew Owens

Found 38 papers, 16 papers with code

Towards Understanding the Relation between Gestures and Language

no code implementations • COLING 2022 • Artem Abzaliev, Andrew Owens, Rada Mihalcea

In this paper, we explore the relation between gestures and language.

Factorized Diffusion: Perceptual Illusions by Noise Decomposition

no code implementations • 17 Apr 2024 • Daniel Geng, Inbum Park, Andrew Owens

And we explore a decomposition by a motion blur kernel, which produces images that change appearance under motion blurring.

Denoising

Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark

no code implementations • 27 Mar 2024 • Ziyang Chen, Israel D. Gebru, Christian Richardt, Anurag Kumar, William Laney, Andrew Owens, Alexander Richard

The dataset includes high-quality and densely captured room impulse response data paired with multi-view images, and precise 6DoF pose tracking data for sound emitters and listeners in the rooms.

Few-Shot Learning • Pose Tracking • +1

Binding Touch to Everything: Learning Unified Multimodal Tactile Representations

no code implementations • 31 Jan 2024 • Fengyu Yang, Chao Feng, Ziyang Chen, Hyoungseob Park, Daniel Wang, Yiming Dou, Ziyao Zeng, Xien Chen, Rit Gangopadhyay, Andrew Owens, Alex Wong

We introduce UniTouch, a unified tactile model for vision-based touch sensors connected to multiple modalities, including vision, language, and sound.

Question Answering • Visual Question Answering (VQA)

Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators

no code implementations • 31 Jan 2024 • Daniel Geng, Andrew Owens

Diffusion models are capable of generating impressive images conditioned on text descriptions, and extensions of these models allow users to edit images at a relatively coarse scale.

Optical Flow Estimation

Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models

no code implementations • 29 Nov 2023 • Daniel Geng, Inbum Park, Andrew Owens

During the reverse diffusion process, we estimate the noise from different views of a noisy image, and then combine these noise estimates together and denoise the image.

Generating Visual Scenes from Touch

no code implementations • ICCV 2023 • Fengyu Yang, Jiacheng Zhang, Andrew Owens

An emerging line of work has sought to generate plausible imagery from touch.

Conditional Generation of Audio from Video via Foley Analogies

1 code implementation • CVPR 2023 • Yuexi Du, Ziyang Chen, Justin Salamon, Bryan Russell, Andrew Owens

Second, we propose a model for generating a soundtrack for a silent input video, given a user-supplied example that specifies what the video should "sound like".

Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment

no code implementations • CVPR 2023 • Kim Sung-Bin, Arda Senocak, Hyunwoo Ha, Andrew Owens, Tae-Hyun Oh

The key idea is to enrich the audio features with visual information by learning to align audio to visual latent space.

Scene Generation • Scheduling

Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models

1 code implementation • ICCV 2023 • Lukas Höllein, Ang Cao, Andrew Owens, Justin Johnson, Matthias Nießner

We present Text2Room, a method for generating room-scale textured 3D meshes from a given text prompt as input.

Text to 3D

970

Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation

2 code implementations • ICCV 2023 • Ziyang Chen, Shengyi Qian, Andrew Owens

In this paper, we use these cues to solve a problem we call Sound Localization from Motion (SLfM): jointly estimating camera rotation and localizing sound sources.

EXIF as Language: Learning Cross-Modal Associations Between Images and Camera Metadata

no code implementations • CVPR 2023 • Chenhao Zheng, Ayush Shrivastava, Andrew Owens

We learn a visual representation that captures information about the camera that recorded a given photo.

Clustering • Image Forensics

Self-Supervised Video Forensics by Audio-Visual Anomaly Detection

no code implementations • CVPR 2023 • Chao Feng, Ziyang Chen, Andrew Owens

Manipulated videos often contain subtle inconsistencies between their visual and audio signals.

Ranked #3 on DeepFake Detection on FakeAVCeleb

Anomaly Detection • DeepFake Detection • +1

Mix and Localize: Localizing Sound Sources in Mixtures

no code implementations • CVPR 2022 • Xixi Hu, Ziyang Chen, Andrew Owens

This task requires a model to both group a sound mixture into individual sources, and to associate them with a visual signal.

Touch and Go: Learning from Human-Collected Vision and Touch

no code implementations • 22 Nov 2022 • Fengyu Yang, Chenyang Ma, Jiacheng Zhang, Jing Zhu, Wenzhen Yuan, Andrew Owens

The ability to associate touch with sight is essential for tasks that require physically interacting with objects in the world.

Image Stylization

Learning Visual Styles from Audio-Visual Associations

no code implementations • 10 May 2022 • Tingle Li, Yichen Liu, Andrew Owens, Hang Zhao

Our model learns to manipulate the texture of a scene to match a sound, a problem we term audio-driven image stylization.

Image Stylization

Sound Localization by Self-Supervised Time Delay Estimation

1 code implementation • 26 Apr 2022 • Ziyang Chen, David F. Fouhey, Andrew Owens

We adapt the contrastive random walk of Jabri et al. to learn a cycle-consistent representation from unlabeled stereo sounds, resulting in a model that performs on par with supervised methods on "in the wild" internet recordings.

Contrastive Learning • Visual Tracking

Learning Pixel Trajectories with Multiscale Contrastive Random Walks

no code implementations • CVPR 2022 • Zhangxing Bian, Allan Jabri, Alexei A. Efros, Andrew Owens

A range of video modeling tasks, from optical flow to multiple object tracking, share the same fundamental challenge: establishing space-time correspondence.

Multiple Object Tracking • Object • +5

GANmouflage: 3D Object Nondetection with Texture Fields

no code implementations • CVPR 2023 • Rui Guo, Jasmine Collins, Oscar de Lima, Andrew Owens

Our model learns to camouflage a variety of object shapes from randomly sampled locations and viewpoints within the input scene, and is the first to address the problem of hiding complex object shapes.

Object

Structure from Silence: Learning Scene Structure from Ambient Sound

1 code implementation • 10 Nov 2021 • Ziyang Chen, Xixi Hu, Andrew Owens

From whirling ceiling fans to ticking clocks, the sounds that we hear subtly vary as we move through a scene.

Comparing Correspondences: Video Prediction with Correspondence-wise Losses

1 code implementation • CVPR 2022 • Daniel Geng, Max Hamilton, Andrew Owens

Image prediction methods often struggle on tasks that require changing the positions of objects, such as video prediction, producing blurry images that average over the many positions that objects might occupy.

Optical Flow Estimation • Video Prediction

Strumming to the Beat: Audio-Conditioned Contrastive Video Textures

no code implementations • 6 Apr 2021 • Medhini Narasimhan, Shiry Ginosar, Andrew Owens, Alexei A. Efros, Trevor Darrell

We learn representations for video frames and frame-to-frame transition probabilities by fitting a video-specific model trained using contrastive learning.

Contrastive Learning • Self-Supervised Learning • +1

Planar Surface Reconstruction from Sparse Views

1 code implementation • ICCV 2021 • Linyi Jin, Shengyi Qian, Andrew Owens, David F. Fouhey

The paper studies planar surface reconstruction of indoor scenes from two views with unknown camera poses.

Surface Reconstruction

106

Contrastive Video Textures

no code implementations • 1 Jan 2021 • Medhini Narasimhan, Shiry Ginosar, Andrew Owens, Alexei A. Efros, Trevor Darrell

By randomly traversing edges with high transition probabilities, we generate diverse temporally smooth videos with novel sequences and transitions.

Contrastive Learning • Video Generation

Self-Supervised Learning of Audio-Visual Objects from Video

1 code implementation • ECCV 2020 • Triantafyllos Afouras, Andrew Owens, Joon Son Chung, Andrew Zisserman

Our objective is to transform a video into a set of discrete audio-visual objects using self-supervised learning.

Face Detection • Optical Flow Estimation • +1

110

Space-Time Correspondence as a Contrastive Random Walk

1 code implementation • NeurIPS 2020 • Allan Jabri, Andrew Owens, Alexei A. Efros

We cast correspondence as prediction of links in a space-time graph constructed from video.

Dense Pixel Correspondence Estimation • Link Prediction • +2

263

CNN-generated images are surprisingly easy to spot... for now

4 code implementations • CVPR 2020 • Sheng-Yu Wang, Oliver Wang, Richard Zhang, Andrew Owens, Alexei A. Efros

In this work, we ask whether it is possible to create a "universal" detector for telling apart real images from those generated by a CNN, regardless of the architecture or dataset used.

Data Augmentation • Image Generation • +1

767

Detecting Photoshopped Faces by Scripting Photoshop

2 code implementations • ICCV 2019 • Sheng-Yu Wang, Oliver Wang, Andrew Owens, Richard Zhang, Alexei A. Efros

Most malicious photo manipulations are created using standard image editing tools, such as Adobe Photoshop.

Image Manipulation Detection

1,563

Learning Individual Styles of Conversational Gesture

2 code implementations • CVPR 2019 • Shiry Ginosar, Amir Bar, Gefen Kohavi, Caroline Chan, Andrew Owens, Jitendra Malik

Specifically, we perform cross-modal translation from "in-the-wild" monologue speech of a single speaker to their hand and arm motion.

Ranked #4 on Gesture Generation on BEAT

Gesture Generation • Speech-to-Gesture Translation • +1

356

MoSculp: Interactive Visualization of Shape and Time

no code implementations • 14 Sep 2018 • Xiuming Zhang, Tali Dekel, Tianfan Xue, Andrew Owens, Qiurui He, Jiajun Wu, Stefanie Mueller, William T. Freeman

We present a system that allows users to visualize complex human motion via 3D motion sculptures, a representation that conveys the 3D structure swept by a human body as it moves through space.

More Than a Feeling: Learning to Grasp and Regrasp using Vision and Touch

no code implementations • 28 May 2018 • Roberto Calandra, Andrew Owens, Dinesh Jayaraman, Justin Lin, Wenzhen Yuan, Jitendra Malik, Edward H. Adelson, Sergey Levine

This model -- a deep, multimodal convolutional network -- predicts the outcome of a candidate grasp adjustment, and then executes a grasp by iteratively selecting the most promising actions.

Robotic Grasping

Fighting Fake News: Image Splice Detection via Learned Self-Consistency

3 code implementations • ECCV 2018 • Minyoung Huh, Andrew Liu, Andrew Owens, Alexei A. Efros

In this paper, we propose a learning algorithm for detecting visual image manipulations that is trained only using a large dataset of real photographs.

Image Forensics

185

Audio-Visual Scene Analysis with Self-Supervised Multisensory Features

1 code implementation • ECCV 2018 • Andrew Owens, Alexei A. Efros

The thud of a bouncing ball, the onset of speech as lips open -- when visual and audio events occur together, it suggests that there might be a common, underlying event that produced both signals.

Action Recognition • Audio Source Separation • +1

217

Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning

no code implementations • 20 Dec 2017 • Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba

The sound of crashing waves, the roar of fast-moving cars -- sound conveys important information about the objects in our surroundings.

The Feeling of Success: Does Touch Sensing Help Predict Grasp Outcomes?

1 code implementation • 16 Oct 2017 • Roberto Calandra, Andrew Owens, Manu Upadhyaya, Wenzhen Yuan, Justin Lin, Edward H. Adelson, Sergey Levine

In this work, we investigate the question of whether touch sensing aids in predicting grasp outcomes within a multimodal sensing framework that combines vision and touch.

Industrial Robots • Robotic Grasping