no code implementations • 6 Jan 2025 • Jathushan Rajasegaran, Xinlei Chen, Rulilong Li, Christoph Feichtenhofer, Jitendra Malik, Shiry Ginosar
Our approach, named Gaussian Masked Autoencoder, or GMAE, aims to learn semantic abstractions and spatial understanding jointly.
no code implementations • 6 Sep 2024 • Vongani Maluleke, Lea Müller, Jathushan Rajasegaran, Georgios Pavlakos, Shiry Ginosar, Angjoo Kanazawa, Jitendra Malik
Our contributions are a demonstration of the advantages of socially conditioned future motion prediction and an in-the-wild, couple dance video dataset to enable future research in this direction.
1 code implementation • 25 Jul 2024 • Eunice Yiu, Maan Qraitem, Charlie Wong, Anisa Noor Majhi, Yutong Bai, Shiry Ginosar, Alison Gopnik, Kate Saenko
This paper investigates visual analogical reasoning in large multimodal models (LMMs) compared to human adults and children.
no code implementations • 20 Jul 2024 • Ioannis Siglidis, Aleksander Holynski, Alexei A. Efros, Mathieu Aubry, Shiry Ginosar
Concretely, we show that after finetuning conditional diffusion models to synthesize images from a specific dataset, we can use these models to define a typicality measure on that dataset.
no code implementations • 6 May 2024 • Sanjay Subramanian, Evonne Ng, Lea Müller, Dan Klein, Shiry Ginosar, Trevor Darrell
We present a zero-shot pose optimization method that enforces accurate physical contact constraints when estimating the 3D pose of humans.
no code implementations • ICCV 2023 • Evonne Ng, Sanjay Subramanian, Dan Klein, Angjoo Kanazawa, Trevor Darrell, Shiry Ginosar
We present a framework for generating appropriate facial responses from a listener in dyadic social interactions based on the speaker's words.
no code implementations • CVPR 2022 • Evonne Ng, Hanbyul Joo, Liwen Hu, Hao Li, Trevor Darrell, Angjoo Kanazawa, Shiry Ginosar
We present a framework for modeling interactional communication in dyadic conversations: given multimodal inputs of a speaker, we autoregressively output multiple possibilities of corresponding listener motion.
no code implementations • 6 Apr 2021 • Medhini Narasimhan, Shiry Ginosar, Andrew Owens, Alexei A. Efros, Trevor Darrell
We learn representations for video frames and frame-to-frame transition probabilities by fitting a video-specific model trained using contrastive learning.
no code implementations • 1 Jan 2021 • Medhini Narasimhan, Shiry Ginosar, Andrew Owens, Alexei A Efros, Trevor Darrell
By randomly traversing edges with high transition probabilities, we generate diverse temporally smooth videos with novel sequences and transitions.
no code implementations • ECCV 2020 • Andrew Liu, Shiry Ginosar, Tinghui Zhou, Alexei A. Efros, Noah Snavely
We propose a learning-based framework for disentangling outdoor scenes into temporally-varying illumination and permanent scene factors.
1 code implementation • CVPR 2021 • Evonne Ng, Shiry Ginosar, Trevor Darrell, Hanbyul Joo
We demonstrate the efficacy of our method on hand gesture synthesis from body motion input, and as a strong body prior for single-view image-based 3D hand pose estimation.
2 code implementations • CVPR 2019 • Shiry Ginosar, Amir Bar, Gefen Kohavi, Caroline Chan, Andrew Owens, Jitendra Malik
Specifically, we perform cross-modal translation from "in-the-wild'' monologue speech of a single speaker to their hand and arm motion.
Ranked #4 on Gesture Generation on BEAT
13 code implementations • ICCV 2019 • Caroline Chan, Shiry Ginosar, Tinghui Zhou, Alexei A. Efros
This paper presents a simple method for "do as I do" motion transfer: given a source video of a person dancing, we can transfer that performance to a novel (amateur) target after only a few minutes of the target subject performing standard moves.
no code implementations • 29 Nov 2016 • L. Jason Anastasopoulos, Dhruvil Badani, Crystal Lee, Shiry Ginosar, Jake Williams
While members of Congress now routinely communicate with constituents using images on a variety of internet platforms, little is known about how images are used as a means of strategic political communication.
2 code implementations • 9 Nov 2015 • Shiry Ginosar, Kate Rakelly, Sarah Sachs, Brian Yin, Crystal Lee, Philipp Krahenbuhl, Alexei A. Efros
4) A new method for discovering and displaying the visual elements used by the CNN-based date-prediction model to date portraits, finding that they correspond to the tell-tale fashions of each era.
no code implementations • 22 Sep 2014 • Shiry Ginosar, Daniel Haas, Timothy Brown, Jitendra Malik
Although the human visual system is surprisingly robust to extreme distortion when recognizing objects, most evaluations of computer object detection methods focus only on robustness to natural form deformations such as people's pose changes.