no code implementations • 12 Mar 2024 • Adam Villaflor, Brian Yang, Huangyuan Su, Katerina Fragkiadaki, John Dolan, Jeff Schneider
Although these models have conventionally been evaluated for open-loop prediction, we show that they can be used to parameterize autoregressive closed-loop models without retraining.
1 code implementation • 16 Feb 2024 • Tsung-Wei Ke, Nikolaos Gkanatsios, Katerina Fragkiadaki
We marry diffusion policies and 3D scene representations for robot manipulation.
no code implementations • 9 Feb 2024 • Brian Yang, Huangyuan Su, Nikolaos Gkanatsios, Tsung-Wei Ke, Ayush Jain, Jeff Schneider, Katerina Fragkiadaki
Diffusion-ES samples trajectories during evolutionary search from a diffusion model and scores them using a black-box reward function.
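The sample-score-mutate loop described above can be illustrated with a minimal NumPy sketch. Everything here is a hypothetical stand-in, not the paper's implementation: the "diffusion model" is a Gaussian sampler, the diffuse-denoise mutation is replaced by noise-and-average, and the black-box reward is a toy goal-reaching score.

```python
import numpy as np

rng = np.random.default_rng(0)
GOAL = np.array([1.0, 1.0])

def reward(traj):
    # Hypothetical black-box reward: prefer trajectories ending near GOAL.
    return -np.linalg.norm(traj[-1] - GOAL)

def sample_from_model(n, horizon=8):
    # Stand-in for sampling trajectories from a trained diffusion model.
    return rng.normal(size=(n, horizon, 2))

def mutate(traj, noise_scale=0.3):
    # Stand-in for the truncated diffuse-denoise mutation: partially noise
    # an elite trajectory, then "denoise" (here, by averaging back).
    noised = traj + noise_scale * rng.normal(size=traj.shape)
    return 0.5 * (noised + traj)

population = sample_from_model(32)
initial_best = max(reward(t) for t in population)
for _ in range(10):
    scores = np.array([reward(t) for t in population])
    elites = population[np.argsort(scores)[-8:]]              # keep top 8
    children = np.stack([mutate(elites[i % 8]) for i in range(24)])
    population = np.concatenate([elites, children])

best = max(population, key=reward)
```

Because the elites are carried over unchanged each generation, the best reward in the population can only improve, which is the property that lets a black-box, non-differentiable reward steer the diffusion samples.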
1 code implementation • 4 Jan 2024 • Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios, Adam W. Harley, Gabriel Sarch, Kriti Aggarwal, Vishrav Chaudhary, Katerina Fragkiadaki
The gap in performance between methods that consume posed images versus post-processed 3D point clouds has fueled the belief that 2D and 3D perception require distinct model architectures.
Ranked #1 on 3D Instance Segmentation on ScanNet200
1 code implementation • 27 Nov 2023 • Mihir Prabhudesai, Tsung-Wei Ke, Alexander C. Li, Deepak Pathak, Katerina Fragkiadaki
Our method, Diffusion-TTA, adapts pre-trained discriminative models such as image classifiers, segmenters and depth predictors, to each unlabelled example in the test set using generative feedback from a diffusion model.
no code implementations • 27 Oct 2023 • Pushkal Katara, Zhou Xian, Katerina Fragkiadaki
We propose Generation to Simulation (Gen2Sim), a method for scaling up robot skill learning in simulation by automating generation of 3D assets, task descriptions, task decompositions and reward functions using large pre-trained generative models of language and vision.
no code implementations • 23 Oct 2023 • Gabriel Sarch, Yue Wu, Michael J. Tarr, Katerina Fragkiadaki
Pre-trained and frozen large language models (LLMs) can effectively map simple scene rearrangement instructions to programs over a robot's visuomotor functions through appropriate few-shot example prompting.
no code implementations • 10 Oct 2023 • Wen-Hsuan Chu, Adam W. Harley, Pavel Tokmakov, Achal Dave, Leonidas Guibas, Katerina Fragkiadaki
This raises the question: can we re-purpose these large-scale pre-trained static image models for open-vocabulary video tracking?
1 code implementation • 5 Oct 2023 • Mihir Prabhudesai, Anirudh Goyal, Deepak Pathak, Katerina Fragkiadaki
Because these models are trained without supervision, controlling their behavior in downstream tasks, such as maximizing human-perceived image quality, image-text alignment, or ethical image generation, is difficult.
2 code implementations • 30 Jun 2023 • Theophile Gervet, Zhou Xian, Nikolaos Gkanatsios, Katerina Fragkiadaki
3D perceptual representations are well suited for robot manipulation as they easily encode occlusions and simplify spatial reasoning.
Ranked #2 on Robot Manipulation on RLBench
no code implementations • 27 Apr 2023 • Nikolaos Gkanatsios, Mayank Singh, Zhaoyuan Fang, Shubham Tulsiani, Katerina Fragkiadaki
We present Analogical Networks, a model that encodes domain knowledge both explicitly, as a collection of structured, labelled 3D scenes, and implicitly, as model parameters, and segments 3D object scenes by analogical reasoning: instead of mapping a scene to part segments directly, our model first retrieves related scenes and their part structures from memory, and then predicts analogous part structures for the input scene via an end-to-end learnable modulation mechanism.
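The retrieval step described above (finding related scenes in memory before predicting part structures) can be sketched in a few lines of NumPy. The memory features, part labellings, and cosine-similarity retrieval here are hypothetical placeholders for the paper's learned scene encodings:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical memory: each scene is a global feature plus a part labelling.
memory_feats = rng.normal(size=(100, 64))
memory_parts = [rng.integers(0, 4, size=32) for _ in range(100)]

def retrieve(query_feat, k=3):
    # Cosine similarity between the input scene and every memory scene.
    q = query_feat / np.linalg.norm(query_feat)
    m = memory_feats / np.linalg.norm(memory_feats, axis=1, keepdims=True)
    sims = m @ q
    return np.argsort(sims)[-k:][::-1]   # indices of the top-k analogues

query = rng.normal(size=64)
top = retrieve(query)
# The retrieved part structures would then condition (modulate) the
# segmentation decoder; here we simply collect them.
candidate_parts = [memory_parts[i] for i in top]
```

In the actual model the retrieved structures feed a learnable modulation mechanism rather than being used directly.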
no code implementations • 27 Apr 2023 • Nikolaos Gkanatsios, Ayush Jain, Zhou Xian, Yunchu Zhang, Christopher Atkeson, Katerina Fragkiadaki
Language is compositional; an instruction can express multiple relation constraints to hold among objects in a scene that a robot is tasked to rearrange.
1 code implementation • 4 Mar 2023 • Zhou Xian, Bo Zhu, Zhenjia Xu, Hsiao-Yu Tung, Antonio Torralba, Katerina Fragkiadaki, Chuang Gan
We identify several challenges for fluid manipulation learning by evaluating a set of reinforcement learning and trajectory optimization methods on our platform.
no code implementations • 27 Oct 2022 • Xingyu Lin, Carl Qi, Yunchu Zhang, Zhiao Huang, Katerina Fragkiadaki, Yunzhu Li, Chuang Gan, David Held
Effective planning of long-horizon deformable object manipulation requires suitable abstractions at both the spatial and temporal levels.
1 code implementation • 21 Jul 2022 • Gabriel Sarch, Zhaoyuan Fang, Adam W. Harley, Paul Schydlo, Michael J. Tarr, Saurabh Gupta, Katerina Fragkiadaki
We introduce TIDEE, an embodied agent that tidies up a disordered scene based on learned commonsense object placement and room arrangement priors.
1 code implementation • 16 Jun 2022 • Adam W. Harley, Zhaoyuan Fang, Jie Li, Rares Ambrus, Katerina Fragkiadaki
Building 3D perception systems for autonomous vehicles that do not rely on high-density LiDAR is a critical research problem because of the expense of LiDAR systems compared to cameras and other sensors.
Autonomous Vehicles • Bird's-Eye View Semantic Segmentation +1
1 code implementation • 8 Apr 2022 • Adam W. Harley, Zhaoyuan Fang, Katerina Fragkiadaki
In this paper, we revisit Sand and Teller's "particle video" approach, and study pixel tracking as a long-range motion estimation problem, where every pixel is described with a trajectory that locates it in multiple future frames.
1 code implementation • 21 Mar 2022 • Mihir Prabhudesai, Anirudh Goyal, Sujoy Paul, Sjoerd van Steenkiste, Mehdi S. M. Sajjadi, Gaurav Aggarwal, Thomas Kipf, Deepak Pathak, Katerina Fragkiadaki
In our work, we find evidence that these losses are insufficient for the task of scene decomposition, without also considering architectural inductive biases.
1 code implementation • 16 Dec 2021 • Ayush Jain, Nikolaos Gkanatsios, Ishita Mediratta, Katerina Fragkiadaki
We propose a language grounding model that attends on the referential utterance and on the object proposal pool computed from a pre-trained detector to decode referenced objects with a detection head, without selecting them from the pool.
no code implementations • 29 Sep 2021 • Ayush Jain, Nikolaos Gkanatsios, Ishita Mediratta, Katerina Fragkiadaki
Object detectors are typically trained on a fixed vocabulary of objects and attributes that is often too restrictive for open-domain language grounding, where the language utterance may refer to visual entities in various levels of abstraction, such as a cat, the leg of a cat, or the stain on the front leg of the chair.
1 code implementation • CVPR 2021 • Shamit Lal, Mihir Prabhudesai, Ishita Mediratta, Adam W. Harley, Katerina Fragkiadaki
This paper explores self-supervised learning of amodal 3D feature representations from RGB and RGB-D posed images and videos, agnostic to object and scene semantic content, and evaluates the resulting scene representations in the downstream tasks of visual correspondence, object tracking, and object detection.
no code implementations • CVPR 2021 • Adam W. Harley, Yiming Zuo, Jing Wen, Ayush Mangal, Shubhankar Potdar, Ritwick Chaudhry, Katerina Fragkiadaki
We propose an unsupervised method for detecting and tracking moving objects in 3D, in unlabelled RGB-D videos.
no code implementations • 17 Mar 2021 • Zhou Xian, Shamit Lal, Hsiao-Yu Tung, Emmanouil Antonios Platanios, Katerina Fragkiadaki
We propose HyperDynamics, a dynamics meta-learning framework that conditions on an agent's interactions with the environment and optionally its visual observations, and generates the parameters of neural dynamics models based on inferred properties of the dynamical system.
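The core idea above, a hypernetwork that outputs the weights of a dynamics model conditioned on inferred system properties, can be sketched minimally. All shapes, the linear hypernetwork, and the residual next-state form are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(2)

IN, HID, OUT = 4, 16, 4                   # state dim, hidden dim, next-state dim
N_W = IN * HID + HID + HID * OUT + OUT    # parameter count of the dynamics MLP

# Hypothetical hypernetwork: a linear map from inferred system properties
# (e.g., mass, friction) to the full weight vector of the dynamics model.
hyper_W = rng.normal(scale=0.1, size=(N_W, 8))

def generate_dynamics(props):
    theta = hyper_W @ props
    i = 0
    W1 = theta[i:i + IN * HID].reshape(IN, HID); i += IN * HID
    b1 = theta[i:i + HID]; i += HID
    W2 = theta[i:i + HID * OUT].reshape(HID, OUT); i += HID * OUT
    b2 = theta[i:]
    def dynamics(state):
        h = np.tanh(state @ W1 + b1)
        return state + h @ W2 + b2        # residual next-state prediction
    return dynamics

props = rng.normal(size=8)                # inferred properties of one system
step = generate_dynamics(props)
next_state = step(rng.normal(size=IN))
```

Each distinct property vector yields a different dynamics function, which is what lets a single meta-learned model specialize to unseen systems without fine-tuning.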
no code implementations • ICLR 2021 • Zhou Xian, Shamit Lal, Hsiao-Yu Tung, Emmanouil Antonios Platanios, Katerina Fragkiadaki
We propose HyperDynamics, a framework that conditions on an agent’s interactions with the environment and optionally its visual observations, and generates the parameters of neural dynamics models based on inferred properties of the dynamical system.
1 code implementation • 30 Nov 2020 • Zhaoyuan Fang, Ayush Jain, Gabriel Sarch, Adam W. Harley, Katerina Fragkiadaki
Experiments on both indoor and outdoor datasets show that (1) our method obtains high-quality 2D and 3D pseudo-labels from multi-view RGB-D data; (2) fine-tuning with these pseudo-labels improves the 2D detector significantly in the test environment; (3) training a 3D detector with our pseudo-labels outperforms a prior self-supervised method by a large margin; (4) given weak supervision, our method can generate better pseudo-labels for novel objects.
no code implementations • 12 Nov 2020 • Hsiao-Yu Fish Tung, Zhou Xian, Mihir Prabhudesai, Shamit Lal, Katerina Fragkiadaki
Object motion predictions are computed by a graph neural network that operates over the object features extracted from the 3D neural scene representation.
1 code implementation • ICLR 2021 • Mihir Prabhudesai, Shamit Lal, Darshan Patil, Hsiao-Yu Tung, Adam W Harley, Katerina Fragkiadaki
We present neural architectures that disentangle RGB-D images into objects' shapes and styles and a map of the background scene, and explore their applications for few-shot 3D object detection and few-shot concept classification.
no code implementations • 30 Oct 2020 • Mihir Prabhudesai, Shamit Lal, Hsiao-Yu Fish Tung, Adam W. Harley, Shubhankar Potdar, Katerina Fragkiadaki
We can compare the 3D feature maps of two objects by searching alignment across scales and 3D rotations, and, as a result of the operation, we can estimate pose and scale changes without the need for 3D pose annotations.
no code implementations • ECCV 2020 • Adam W. Harley, Shrinidhi K. Lakshmikanth, Paul Schydlo, Katerina Fragkiadaki
We propose to leverage multiview data of static points in arbitrary scenes (static or dynamic), to learn a neural 3D mapping module which produces features that are correspondable across time.
1 code implementation • CVPR 2020 • Yihui He, Rui Yan, Katerina Fragkiadaki, Shoou-I Yu
The intuition is: given a 2D location p in the current view, we would like to first find its corresponding point p' in a neighboring view, and then combine the features at p' with the features at p, thus leading to a 3D-aware feature at p. Inspired by stereo matching, the epipolar transformer leverages epipolar constraints and feature matching to approximate the features at p'.
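The feature-matching step described above, approximating the feature at p' by soft-matching along p's epipolar line, reduces to dot-product attention over line samples. A minimal NumPy sketch, with hypothetical feature shapes and a trivially averaged fusion in place of the paper's learned fusion:

```python
import numpy as np

rng = np.random.default_rng(3)
C, N = 32, 16                          # feature channels, samples along the line

feat_p = rng.normal(size=C)            # feature at p in the current view
line_feats = rng.normal(size=(N, C))   # features sampled along p's epipolar
                                       # line in the neighboring view

# Soft feature matching: attention weights from scaled dot-product similarity.
logits = line_feats @ feat_p / np.sqrt(C)
w = np.exp(logits - logits.max())
w /= w.sum()
feat_pprime = w @ line_feats           # approximated feature at p'

fused = 0.5 * (feat_p + feat_pprime)   # hypothetical fusion; the actual model
                                       # combines the two with learned layers
```

The epipolar constraint is what keeps this cheap: only the N samples on one line are matched, rather than the whole neighboring view.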
Ranked #1 on 3D Hand Pose Estimation on InterHand2.6M
1 code implementation • CVPR 2020 • Mihir Prabhudesai, Hsiao-Yu Fish Tung, Syed Ashar Javed, Maximilian Sieb, Adam W. Harley, Katerina Fragkiadaki
We propose associating language utterances to 3D visual abstractions of the scene they describe.
no code implementations • 11 Jul 2019 • Maximilian Sieb, Zhou Xian, Audrey Huang, Oliver Kroemer, Katerina Fragkiadaki
We cast visual imitation as a visual correspondence problem.
1 code implementation • ICLR 2020 • Adam W. Harley, Shrinidhi K. Lakshmikanth, Fangyu Li, Xian Zhou, Hsiao-Yu Fish Tung, Katerina Fragkiadaki
Predictive coding theories suggest that the brain learns by predicting observations at various levels of abstraction.
no code implementations • 11 Jan 2019 • Adam W. Harley, Shih-En Wei, Jason Saragih, Katerina Fragkiadaki
Cross-domain image-to-image translation should satisfy two requirements: (1) preserve the information that is common to both domains, and (2) generate convincing images covering variations that appear in the target domain.
no code implementations • CVPR 2019 • Hsiao-Yu Fish Tung, Ricson Cheng, Katerina Fragkiadaki
The proposed networks learn to "lift" and integrate 2D visual features over time into latent 3D feature maps of the scene.
1 code implementation • 20 Nov 2018 • Ricson Cheng, Arpit Agarwal, Katerina Fragkiadaki
We propose hand/eye controllers that learn to move the camera to keep the object within the field of view and visible, in coordination with manipulating it to achieve the desired goal, e.g., pushing it to a target location.
1 code implementation • 20 Nov 2018 • Arpit Agarwal, Katharina Muelling, Katerina Fragkiadaki
We propose an exploration method that incorporates look-ahead search over basic learnt skills and their dynamics, and use it for reinforcement learning (RL) of manipulation policies.
no code implementations • NeurIPS 2018 • Ricson Cheng, Ziyan Wang, Katerina Fragkiadaki
We present recurrent geometry-aware neural networks that integrate visual information across multiple views of a scene into 3D latent feature tensors, while maintaining a one-to-one mapping between 3D physical locations in the world scene and latent feature locations.
no code implementations • CVPR 2018 • Hsiao-Yu Fish Tung, Adam W. Harley, Liang-Kang Huang, Katerina Fragkiadaki
Humans effortlessly "program" one another by communicating goals and desires in natural language.
no code implementations • 1 Jan 2018 • Chris Ying, Katerina Fragkiadaki
Current convolutional neural network algorithms for video object tracking spend the same amount of computation for each object and video frame.
1 code implementation • NeurIPS 2017 • Hsiao-Yu Fish Tung, Hsiao-Wei Tung, Ersin Yumer, Katerina Fragkiadaki
In this work, we propose a learning based motion capture model for single camera input.
Ranked #2 on 3D Human Reconstruction on Surreal
no code implementations • ICCV 2017 • Hsiao-Yu Fish Tung, Adam W. Harley, William Seto, Katerina Fragkiadaki
Researchers have developed excellent feed-forward models that learn to map images to desired outputs, such as to the images' latent factors, or to other images, using supervised learning.
no code implementations • 5 May 2017 • Katerina Fragkiadaki, Jonathan Huang, Alex Alemi, Sudheendra Vijayanarasimhan, Susanna Ricco, Rahul Sukthankar
In this work, we present stochastic neural network architectures that handle such multimodality through stochasticity: future trajectories of objects, body joints or frames are represented as deep, non-linear transformations of random (as opposed to deterministic) variables.
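The multimodality described above comes from decoding random latent variables alongside a context encoding, so that repeated draws give different plausible futures. A minimal NumPy sketch with a fixed, untrained stand-in for the decoder (shapes and the decoder form are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

def decode(context, z):
    # Hypothetical decoder: a fixed nonlinear map from (context, latent z)
    # to a short future trajectory; in the paper this is a trained network.
    W = np.outer(np.tanh(context), z)[:8]          # (8, z_dim)
    return np.cumsum(W.sum(axis=1))                # 8 future steps

context = rng.normal(size=16)                      # encoded past observations
futures = [decode(context, rng.normal(size=16)) for _ in range(5)]
# Different draws of z yield different plausible futures (multimodality),
# unlike a deterministic model that can only output one.
```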
no code implementations • 25 Apr 2017 • Sudheendra Vijayanarasimhan, Susanna Ricco, Cordelia Schmid, Rahul Sukthankar, Katerina Fragkiadaki
We propose SfM-Net, a geometry-aware neural network for motion estimation in videos that decomposes frame-to-frame pixel motion in terms of scene and object depth, camera motion and 3D object rotations and translations.
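The decomposition described above implies that, given per-pixel depth and camera motion, frame-to-frame pixel motion follows from backprojecting, moving, and reprojecting each pixel. A minimal pinhole-camera sketch in NumPy, using a tiny image, constant depth, a pure x-translation, and a hypothetical focal length (none of these values come from the paper):

```python
import numpy as np

H, W_, f = 4, 4, 1.0                  # tiny image, hypothetical focal length
depth = np.full((H, W_), 2.0)         # predicted per-pixel depth
t = np.array([0.1, 0.0, 0.0])         # predicted camera translation, no rotation

ys, xs = np.mgrid[0:H, 0:W_]
# Backproject pixels to 3D (pinhole model, principal point at image center).
X = (xs - W_ / 2) * depth / f
Y = (ys - H / 2) * depth / f
Z = depth
# Express the points in the moved camera's frame, then reproject.
Xc, Yc, Zc = X - t[0], Y - t[1], Z - t[2]
u = f * Xc / Zc + W_ / 2
v = f * Yc / Zc + H / 2
flow = np.stack([u - xs, v - ys], axis=-1)   # frame-to-frame pixel motion
```

With constant depth 2 and translation 0.1 along x, every pixel shifts by -0.05 horizontally; in SfM-Net this geometric flow, plus per-object rotations and translations, supervises the network without flow labels.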
no code implementations • 23 Nov 2015 • Katerina Fragkiadaki, Pulkit Agrawal, Sergey Levine, Jitendra Malik
The ability to plan and execute goal specific actions in varied, unexpected settings is a central requirement of intelligent agents.
no code implementations • ICCV 2015 • Katerina Fragkiadaki, Sergey Levine, Panna Felsen, Jitendra Malik
We propose the Encoder-Recurrent-Decoder (ERD) model for recognition and prediction of human body pose in videos and motion capture.
Ranked #8 on Human Pose Forecasting on Human3.6M (MAR, walking, 1,000ms metric)
1 code implementation • CVPR 2016 • Joao Carreira, Pulkit Agrawal, Katerina Fragkiadaki, Jitendra Malik
Hierarchical feature extractors such as Convolutional Networks (ConvNets) have achieved impressive performance on a variety of classification tasks using purely feedforward processing.
Ranked #43 on Pose Estimation on MPII Human Pose
no code implementations • CVPR 2015 • Katerina Fragkiadaki, Pablo Arbelaez, Panna Felsen, Jitendra Malik
We segment moving objects in videos by ranking spatio-temporal segment proposals according to "moving objectness": how likely they are to contain a moving object.
no code implementations • NeurIPS 2014 • Katerina Fragkiadaki, Marta Salas, Pablo Arbelaez, Jitendra Malik
Furthermore, NRSfM needs to be robust to noise in both segmentation and tracking, e.g., drifting, segmentation "leaking", optical flow "bleeding", etc.
no code implementations • CVPR 2013 • Katerina Fragkiadaki, Han Hu, Jianbo Shi
The pose labeled segments and corresponding articulated joints are used to improve the motion flow fields by proposing kinematically constrained affine displacements on body parts.