Search Results for author: Shiry Ginosar

Found 16 papers, 5 papers with code

Gaussian Masked Autoencoders

no code implementations 6 Jan 2025 Jathushan Rajasegaran, Xinlei Chen, Rulilong Li, Christoph Feichtenhofer, Jitendra Malik, Shiry Ginosar

Our approach, named Gaussian Masked Autoencoder, or GMAE, aims to learn semantic abstractions and spatial understanding jointly.

Edge Detection Representation Learning +2

Synergy and Synchrony in Couple Dances

no code implementations 6 Sep 2024 Vongani Maluleke, Lea Müller, Jathushan Rajasegaran, Georgios Pavlakos, Shiry Ginosar, Angjoo Kanazawa, Jitendra Malik

Our contributions are a demonstration of the advantages of socially conditioned future motion prediction and an in-the-wild, couple dance video dataset to enable future research in this direction.

motion prediction

KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models

1 code implementation 25 Jul 2024 Eunice Yiu, Maan Qraitem, Charlie Wong, Anisa Noor Majhi, Yutong Bai, Shiry Ginosar, Alison Gopnik, Kate Saenko

This paper investigates visual analogical reasoning in large multimodal models (LMMs) compared to human adults and children.

Visual Analogies Visual Reasoning

Diffusion Models as Data Mining Tools

no code implementations 20 Jul 2024 Ioannis Siglidis, Aleksander Holynski, Alexei A. Efros, Mathieu Aubry, Shiry Ginosar

Concretely, we show that after finetuning conditional diffusion models to synthesize images from a specific dataset, we can use these models to define a typicality measure on that dataset.

Image Generation

Pose Priors from Language Models

no code implementations 6 May 2024 Sanjay Subramanian, Evonne Ng, Lea Müller, Dan Klein, Shiry Ginosar, Trevor Darrell

We present a zero-shot pose optimization method that enforces accurate physical contact constraints when estimating the 3D pose of humans.

Pose Estimation

Can Language Models Learn to Listen?

no code implementations ICCV 2023 Evonne Ng, Sanjay Subramanian, Dan Klein, Angjoo Kanazawa, Trevor Darrell, Shiry Ginosar

We present a framework for generating appropriate facial responses from a listener in dyadic social interactions based on the speaker's words.

Language Modelling +1

Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion

no code implementations CVPR 2022 Evonne Ng, Hanbyul Joo, Liwen Hu, Hao Li, Trevor Darrell, Angjoo Kanazawa, Shiry Ginosar

We present a framework for modeling interactional communication in dyadic conversations: given multimodal inputs of a speaker, we autoregressively output multiple possibilities of corresponding listener motion.

Strumming to the Beat: Audio-Conditioned Contrastive Video Textures

no code implementations 6 Apr 2021 Medhini Narasimhan, Shiry Ginosar, Andrew Owens, Alexei A. Efros, Trevor Darrell

We learn representations for video frames and frame-to-frame transition probabilities by fitting a video-specific model trained using contrastive learning.

Contrastive Learning Self-Supervised Learning +1

Contrastive Video Textures

no code implementations 1 Jan 2021 Medhini Narasimhan, Shiry Ginosar, Andrew Owens, Alexei A. Efros, Trevor Darrell

By randomly traversing edges with high transition probabilities, we generate diverse temporally smooth videos with novel sequences and transitions.

Contrastive Learning Video Generation

Learning to Factorize and Relight a City

no code implementations ECCV 2020 Andrew Liu, Shiry Ginosar, Tinghui Zhou, Alexei A. Efros, Noah Snavely

We propose a learning-based framework for disentangling outdoor scenes into temporally-varying illumination and permanent scene factors.

Intrinsic Image Decomposition

Body2Hands: Learning to Infer 3D Hands from Conversational Gesture Body Dynamics

1 code implementation CVPR 2021 Evonne Ng, Shiry Ginosar, Trevor Darrell, Hanbyul Joo

We demonstrate the efficacy of our method on hand gesture synthesis from body motion input, and as a strong body prior for single-view image-based 3D hand pose estimation.

3D Hand Pose Estimation

Everybody Dance Now

13 code implementations ICCV 2019 Caroline Chan, Shiry Ginosar, Tinghui Zhou, Alexei A. Efros

This paper presents a simple method for "do as I do" motion transfer: given a source video of a person dancing, we can transfer that performance to a novel (amateur) target after only a few minutes of the target subject performing standard moves.

Face Generation Image-to-Image Translation +1

Photographic home styles in Congress: a computer vision approach

no code implementations 29 Nov 2016 L. Jason Anastasopoulos, Dhruvil Badani, Crystal Lee, Shiry Ginosar, Jake Williams

While members of Congress now routinely communicate with constituents using images on a variety of internet platforms, little is known about how images are used as a means of strategic political communication.

Image Manipulation

A Century of Portraits: A Visual Historical Record of American High School Yearbooks

2 code implementations 9 Nov 2015 Shiry Ginosar, Kate Rakelly, Sarah Sachs, Brian Yin, Crystal Lee, Philipp Krahenbuhl, Alexei A. Efros

4) A new method for discovering and displaying the visual elements used by the CNN-based date-prediction model to date portraits, finding that they correspond to the tell-tale fashions of each era.

Cultural Vocal Bursts Intensity Prediction

Detecting People in Cubist Art

no code implementations 22 Sep 2014 Shiry Ginosar, Daniel Haas, Timothy Brown, Jitendra Malik

Although the human visual system is surprisingly robust to extreme distortion when recognizing objects, most evaluations of computer object detection methods focus only on robustness to natural form deformations such as people's pose changes.

Object object-detection +1
