1 code implementation • 10 Mar 2025 • Raphi Kang, Yue Song, Georgia Gkioxari, Pietro Perona
Contrastive Language-Image Pre-Training (CLIP) is a popular method for learning multimodal latent spaces with well-organized semantics.
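As a quick illustration of what such a multimodal latent space supports, here is a minimal sketch of zero-shot image-text matching using the public Hugging Face CLIP API; the checkpoint name, image path, and prompts are placeholders and this is not code from the paper.

```python
# Minimal sketch: scoring image-text similarity in CLIP's joint latent space.
# Uses the public Hugging Face CLIP API; checkpoint, image path, and prompts
# are illustrative placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")                 # any RGB image (placeholder path)
texts = ["a photo of a dog", "a photo of a cat"]  # candidate captions

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Similarity logits between the image and each caption, softmaxed into probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
print(probs)
```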
no code implementations • 10 Feb 2025 • Damiano Marsili, Rohun Agrawal, Yisong Yue, Georgia Gkioxari
We show that our method outperforms prior zero-shot models for visual reasoning in 3D and empirically validate the effectiveness of our agentic framework for 3D spatial reasoning tasks.
no code implementations • 20 Nov 2024 • Ziqi Ma, Yisong Yue, Georgia Gkioxari
We study open-world part segmentation in 3D: segmenting any part in any object based on any text query.
no code implementations • 9 Apr 2024 • Jane Wu, Georgios Pavlakos, Georgia Gkioxari, Jitendra Malik
To obtain the best-performing single-frame model, we first present MCC-Hand-Object (MCC-HO), which jointly reconstructs hand and object geometry given a single RGB image and an inferred 3D hand as inputs.
1 code implementation • 13 Mar 2024 • Connor Lee, Matthew Anderson, Nikhil Raganathan, Xingxing Zuo, Kevin Do, Georgia Gkioxari, Soon-Jo Chung
We present the first publicly-available RGB-thermal dataset designed for aerial robotics operating in natural environments.
1 code implementation • 26 Feb 2024 • Sabera Talukder, Yisong Yue, Georgia Gkioxari
We conclude that TOTEM matches or outperforms existing state-of-the-art models in both the canonical specialist setting (i.e., training one model on one domain) and the generalist setting (i.e., training a single model on many domains), which demonstrates the efficacy of tokenization for general time series analysis.
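The tokenization idea can be sketched as a nearest-codebook lookup over patches of a series. The snippet below is an illustrative simplification (random, untrained codebook; quantization of raw patches), not the TOTEM implementation; VQ-style tokenizers typically quantize learned encoder features rather than raw patches.

```python
# Illustrative sketch of discrete time-series tokenization via a codebook lookup.
# NOT the TOTEM implementation: the codebook is random and raw patches are
# quantized directly; patch length and codebook size are arbitrary choices.
import torch

def tokenize(series: torch.Tensor, codebook: torch.Tensor, patch_len: int = 16):
    """Map a univariate series of shape (T,) to a sequence of discrete token ids."""
    T = series.shape[0] - series.shape[0] % patch_len       # drop the ragged tail
    patches = series[:T].reshape(-1, patch_len)             # (num_patches, patch_len)
    dists = torch.cdist(patches, codebook)                  # distance to each code vector
    return dists.argmin(dim=-1)                             # (num_patches,) token ids

codebook = torch.randn(256, 16)      # K=256 code vectors of length 16 (untrained stand-in)
series = torch.randn(1024)           # a toy univariate time series
tokens = tokenize(series, codebook)
print(tokens.shape, tokens[:8])
```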
no code implementations • 18 Nov 2023 • Uriah Israel, Markus Marks, Rohit Dilip, Qilin Li, Morgan Schwartz, Elora Pradhan, Edward Pao, Shenyi Li, Alexander Pearson-Goulart, Pietro Perona, Georgia Gkioxari, Ross Barnowski, Yisong Yue, David Van Valen
Methods that learn the general notion of "what is a cell" and can identify cells across different domains of cellular imaging data have proven elusive.
no code implementations • ICCV 2023 • Yiming Xie, Huaizu Jiang, Georgia Gkioxari, Julian Straub
We present PARQ, a multi-view 3D object detector with a transformer and pixel-aligned recurrent queries.
1 code implementation • CVPR 2023 • Chao-yuan Wu, Justin Johnson, Jitendra Malik, Christoph Feichtenhofer, Georgia Gkioxari
We introduce a simple framework that operates on 3D points of single objects or whole scenes coupled with category-agnostic large-scale training from diverse RGB-D videos.
1 code implementation • CVPR 2023 • Jennifer J. Sun, Lili Karashchuk, Amil Dravid, Serim Ryou, Sonia Fereidooni, John Tuthill, Aggelos Katsaggelos, Bingni W. Brunton, Georgia Gkioxari, Ann Kennedy, Yisong Yue, Pietro Perona
In this way, we discover keypoints without requiring manual supervision in videos of humans and rats, demonstrating the potential of 3D keypoint discovery for studying behavior.
1 code implementation • CVPR 2023 • Garrick Brazil, Abhinav Kumar, Julian Straub, Nikhila Ravi, Justin Johnson, Georgia Gkioxari
In 3D, existing benchmarks are small, and approaches specialize in a few object categories and specific domains, e.g., urban driving scenes.
3D Object Detection • 3D Object Detection From Monocular Images • +3
no code implementations • CVPR 2022 • Georgia Gkioxari, Nikhila Ravi, Justin Johnson
A 3D scene consists of a set of objects, each with a shape and a layout giving its position in space.
no code implementations • 2 Dec 2021 • Shengyi Qian, Alexander Kirillov, Nikhila Ravi, Devendra Singh Chaplot, Justin Johnson, David F. Fouhey, Georgia Gkioxari
Humans can perceive scenes in 3D from a handful of 2D views.
1 code implementation • CVPR 2022 • Shubham Goel, Georgia Gkioxari, Jitendra Malik
We propose Differentiable Stereopsis, a multi-view stereo approach that reconstructs shape and texture from few input views and noisy cameras.
1 code implementation • 4 Feb 2021 • Gedeon Muhawenayo, Georgia Gkioxari
With our approach, we are able to compress a state-of-the-art object detection model by 30.0% without a loss in performance.
3 code implementations • 16 Jul 2020 • Nikhila Ravi, Jeremy Reizenstein, David Novotny, Taylor Gordon, Wan-Yen Lo, Justin Johnson, Georgia Gkioxari
We address these challenges by introducing PyTorch3D, a library of modular, efficient, and differentiable operators for 3D deep learning.
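A minimal sketch of the kind of differentiable operator PyTorch3D exposes, using its public mesh I/O, point-sampling, and chamfer-distance APIs; the .obj path and the target point cloud are placeholders.

```python
# Minimal PyTorch3D sketch: load a mesh, deform it with a learnable per-vertex
# offset, sample points on its surface, and backpropagate a chamfer loss.
# The .obj path and target point cloud are placeholders.
import torch
from pytorch3d.io import load_objs_as_meshes
from pytorch3d.ops import sample_points_from_meshes
from pytorch3d.loss import chamfer_distance

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
mesh = load_objs_as_meshes(["model.obj"], device=device)     # a Meshes object

# Learnable per-vertex offsets so the loss has parameters to flow into.
deform = torch.zeros_like(mesh.verts_packed(), requires_grad=True)
deformed = mesh.offset_verts(deform)

pred_points = sample_points_from_meshes(deformed, num_samples=5000)  # (1, 5000, 3)
target_points = torch.rand(1, 5000, 3, device=device)                # stand-in target cloud

loss, _ = chamfer_distance(pred_points, target_points)
loss.backward()   # gradients reach `deform` through sampling and the loss
```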
1 code implementation • NeurIPS 2020 • Edward J. Smith, Roberto Calandra, Adriana Romero, Georgia Gkioxari, David Meger, Jitendra Malik, Michal Drozdzal
When a toddler is presented with a new toy, their instinctual behaviour is to pick it up and inspect it with their hand and eyes in tandem, clearly searching over its surface to properly understand what they are playing with.
3 code implementations • CVPR 2020 • Olivia Wiles, Georgia Gkioxari, Richard Szeliski, Justin Johnson
Single image view synthesis allows for the generation of new views of a scene given a single input image.
1 code implementation • ICCV 2019 • Yi Wu, Yuxin Wu, Aviv Tamar, Stuart Russell, Georgia Gkioxari, Yuandong Tian
We introduce a new memory architecture, Bayesian Relational Memory (BRM), to improve the generalization ability for semantic visual navigation agents in unseen environments, where an agent is given a semantic target to navigate towards.
7 code implementations • ICCV 2019 • Georgia Gkioxari, Jitendra Malik, Justin Johnson
We propose a system that detects objects in real-world images and produces a triangle mesh giving the full 3D shape of each detected object.
Ranked #1 on 3D Shape Modeling on Pix3D S2
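Mesh R-CNN converts a predicted voxel occupancy grid into an initial mesh before refining it; that conversion is available as the cubify operator in PyTorch3D. The snippet below is a rough sketch with a random occupancy grid standing in for a real prediction, not the full detection-plus-refinement pipeline.

```python
# Rough sketch of the voxel-to-mesh step: cubify an occupancy grid into a
# triangle mesh. The grid here is a random stand-in for a network prediction.
import torch
from pytorch3d.ops import cubify

voxel_probs = torch.rand(1, 24, 24, 24)        # predicted occupancy in [0, 1] (stand-in)
init_mesh = cubify(voxel_probs, thresh=0.5)    # Meshes with one cube per occupied voxel
print(init_mesh.verts_packed().shape, init_mesh.faces_packed().shape)
```

In the full pipeline, subsequent refinement stages adjust the vertex positions of this initial mesh using image-aligned features; the sketch stops at the initial mesh.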
1 code implementation • CVPR 2019 • Licheng Yu, Xinlei Chen, Georgia Gkioxari, Mohit Bansal, Tamara L. Berg, Dhruv Batra
To address this, we propose a modular architecture composed of a program generator, a controller, a navigator, and a VQA module.
no code implementations • CVPR 2019 • Erik Wijmans, Samyak Datta, Oleksandr Maksymets, Abhishek Das, Georgia Gkioxari, Stefan Lee, Irfan Essa, Devi Parikh, Dhruv Batra
To help bridge the gap between internet vision-style problems and the goal of vision for embodied perception, we instantiate a large-scale navigation task -- Embodied Question Answering [1] -- in photo-realistic environments (Matterport 3D).
2 code implementations • 26 Oct 2018 • Abhishek Das, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra
We use imitation learning to warm-start policies at each level of the hierarchy, dramatically increasing sample efficiency, followed by reinforcement learning.
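The warm-start-then-fine-tune recipe can be sketched generically as behavior cloning followed by a REINFORCE-style update. The policy network, action space, and toy tensors below are hypothetical placeholders, not the hierarchical policies from this paper.

```python
# Generic sketch: (1) behavior-clone a policy on expert actions, then
# (2) fine-tune with a REINFORCE policy-gradient update on logged rollouts.
# Policy, action space, and data are hypothetical placeholders.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 6))  # 6 actions
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

def bc_step(obs, expert_actions):
    """Stage 1: imitation learning warm-start (cross-entropy on expert actions)."""
    loss = nn.functional.cross_entropy(policy(obs), expert_actions)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def pg_step(obs, actions, returns):
    """Stage 2: RL fine-tuning (REINFORCE on actions and returns from rollouts)."""
    dist = torch.distributions.Categorical(logits=policy(obs))
    loss = -(dist.log_prob(actions) * returns).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage with random tensors standing in for real trajectories.
print(bc_step(torch.randn(32, 128), torch.randint(0, 6, (32,))))
print(pg_step(torch.randn(32, 128), torch.randint(0, 6, (32,)), torch.randn(32)))
```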
no code implementations • ICLR 2019 • Yi Wu, Yuxin Wu, Aviv Tamar, Stuart Russell, Georgia Gkioxari, Yuandong Tian
Building deep reinforcement learning agents that can generalize and adapt to unseen environments remains a fundamental challenge for AI.
5 code implementations • ICLR 2018 • Yi Wu, Yuxin Wu, Georgia Gkioxari, Yuandong Tian
To generalize to unseen environments, an agent needs to be robust to low-level variations (e.g., color, texture, object changes), and also high-level variations (e.g., layout changes of the environment).
1 code implementation • CVPR 2018 • Rohit Girdhar, Georgia Gkioxari, Lorenzo Torresani, Manohar Paluri, Du Tran
This paper addresses the problem of estimating and tracking human body keypoints in complex, multi-person video.
Ranked #7 on Keypoint Detection on COCO test-challenge
4 code implementations • CVPR 2018 • Ilija Radosavovic, Piotr Dollár, Ross Girshick, Georgia Gkioxari, Kaiming He
We investigate omni-supervised learning, a special regime of semi-supervised learning in which the learner exploits all available labeled data plus internet-scale sources of unlabeled data.
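The core data-distillation loop is: run a single trained model on multiple transformed copies of unlabeled data, aggregate the predictions, and keep confident ones as extra training labels. The sketch below shows that loop for a generic classifier with a horizontal flip as the only transform; the paper applies the idea to keypoint detection with richer transforms, and the model and threshold here are placeholders.

```python
# Illustrative data-distillation sketch: ensemble one model's predictions over
# transformed copies of unlabeled images and keep confident pseudo-labels.
# The classifier, threshold, and data are placeholders, not the paper's setup.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   # placeholder classifier

def pseudo_label(model, images, score_thresh=0.9):
    """Average class scores over transforms and keep confident predictions."""
    flipped = torch.flip(images, dims=[-1])          # horizontal flip as the only transform
    with torch.no_grad():
        scores = torch.stack([
            model(images).softmax(dim=-1),
            model(flipped).softmax(dim=-1),          # class scores need no geometric un-flip
        ]).mean(dim=0)
    conf, labels = scores.max(dim=-1)
    keep = conf > score_thresh
    return labels[keep], conf[keep]

images = torch.rand(8, 3, 32, 32)                    # unlabeled batch (random stand-in)
labels, conf = pseudo_label(model, images, score_thresh=0.5)
print(labels, conf)
```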
4 code implementations • CVPR 2018 • Abhishek Das, Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra
We present a new AI task -- Embodied Question Answering (EmbodiedQA) -- where an agent is spawned at a random location in a 3D environment and asked a question ("What color is the car?").
2 code implementations • CVPR 2018 • Georgia Gkioxari, Ross Girshick, Piotr Dollár, Kaiming He
Our hypothesis is that the appearance of a person -- their pose, clothing, action -- is a powerful cue for localizing the objects they are interacting with.
Ranked #55 on Human-Object Interaction Detection on HICO-DET
175 code implementations • ICCV 2017 • Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick
Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance.
Ranked #1 on Keypoint Detection on COCO
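Mask R-CNN is available off the shelf in several libraries; the sketch below runs torchvision's implementation (one of many reimplementations, not necessarily among those linked above), with the image path as a placeholder.

```python
# Minimal inference sketch using torchvision's Mask R-CNN implementation.
# Requires torchvision >= 0.13 for weights="DEFAULT" (older versions use
# pretrained=True). The image path is a placeholder.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = to_tensor(Image.open("street.jpg").convert("RGB"))  # (3, H, W) in [0, 1]
with torch.no_grad():
    (pred,) = model([image])            # one prediction dict per input image

# Per-instance boxes, class labels, confidence scores, and soft instance masks.
print(pred["boxes"].shape, pred["labels"].shape,
      pred["scores"].shape, pred["masks"].shape)
```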
no code implementations • 8 May 2016 • Georgia Gkioxari, Alexander Toshev, Navdeep Jaitly
In this model, the output variables for a given input are predicted sequentially using neural networks.
2 code implementations • ICCV 2015 • Georgia Gkioxari, Ross Girshick, Jitendra Malik
In this work, we exploit the simple observation that actions are accompanied by contextual cues to build a strong action recognition system.
Ranked #4 on Weakly Supervised Object Detection on HICO-DET
no code implementations • ICCV 2015 • Georgia Gkioxari, Ross Girshick, Jitendra Malik
We investigate the importance of parts for the tasks of action and attribute classification.
1 code implementation • CVPR 2015 • Georgia Gkioxari, Jitendra Malik
We address the problem of action detection in videos.
Ranked #4 on Action Detection on UCF Sports
no code implementations • 19 Jun 2014 • Georgia Gkioxari, Bharath Hariharan, Ross Girshick, Jitendra Malik
We present convolutional neural networks for the tasks of keypoint (pose) prediction and action classification of people in unconstrained images.
no code implementations • CVPR 2014 • Georgia Gkioxari, Bharath Hariharan, Ross Girshick, Jitendra Malik
A k-poselet is a deformable part model (DPM) with k parts, where each part is a poselet aligned to a specific configuration of keypoints based on ground-truth annotations.
no code implementations • CVPR 2013 • Georgia Gkioxari, Pablo Arbelaez, Lubomir Bourdev, Jitendra Malik
We propose a novel approach for human pose estimation in real-world cluttered scenes, and focus on the challenging problem of predicting the pose of both arms for each person in the image.