no code implementations • 12 Mar 2024 • Adam Villaflor, Brian Yang, Huangyuan Su, Katerina Fragkiadaki, John Dolan, Jeff Schneider
Although these models have conventionally been evaluated for open-loop prediction, we show that they can be used to parameterize autoregressive closed-loop models without retraining.
1 code implementation • 16 Feb 2024 • Tsung-Wei Ke, Nikolaos Gkanatsios, Katerina Fragkiadaki
We marry diffusion policies and 3D scene representations for robot manipulation.
no code implementations • 9 Feb 2024 • Brian Yang, Huangyuan Su, Nikolaos Gkanatsios, Tsung-Wei Ke, Ayush Jain, Jeff Schneider, Katerina Fragkiadaki
Diffusion-ES samples trajectories during evolutionary search from a diffusion model and scores them using a black-box reward function.
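The sample-score-mutate loop described above can be illustrated with a minimal NumPy sketch. Everything here is a hypothetical stand-in, not the paper's implementation: the "diffusion model" is a Gaussian sampler, the diffuse-denoise mutation is replaced by noise-and-average, and the black-box reward is a toy goal-reaching score.

```python
import numpy as np

rng = np.random.default_rng(0)
GOAL = np.array([1.0, 1.0])

def reward(traj):
    # Hypothetical black-box reward: prefer trajectories ending near GOAL.
    return -np.linalg.norm(traj[-1] - GOAL)

def sample_from_model(n, horizon=8):
    # Stand-in for sampling trajectories from a trained diffusion model.
    return rng.normal(size=(n, horizon, 2))

def mutate(traj, noise_scale=0.3):
    # Stand-in for the truncated diffuse-denoise mutation: partially noise
    # an elite trajectory, then "denoise" (here, by averaging back).
    noised = traj + noise_scale * rng.normal(size=traj.shape)
    return 0.5 * (noised + traj)

population = sample_from_model(32)
initial_best = max(reward(t) for t in population)
for _ in range(10):
    scores = np.array([reward(t) for t in population])
    elites = population[np.argsort(scores)[-8:]]              # keep top 8
    children = np.stack([mutate(elites[i % 8]) for i in range(24)])
    population = np.concatenate([elites, children])

best = max(population, key=reward)
```

Because the elites are carried over unchanged each generation, the best reward in the population can only improve, which is the property that lets a black-box, non-differentiable reward steer the diffusion samples.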
1 code implementation • 4 Jan 2024 • Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios, Adam W. Harley, Gabriel Sarch, Kriti Aggarwal, Vishrav Chaudhary, Katerina Fragkiadaki
The gap in performance between methods that consume posed images versus post-processed 3D point clouds has fueled the belief that 2D and 3D perception require distinct model architectures.
Ranked #1 on 3D Instance Segmentation on ScanNet200
1 code implementation • 27 Nov 2023 • Mihir Prabhudesai, Tsung-Wei Ke, Alexander C. Li, Deepak Pathak, Katerina Fragkiadaki
Our method, Diffusion-TTA, adapts pre-trained discriminative models such as image classifiers, segmenters and depth predictors, to each unlabelled example in the test set using generative feedback from a diffusion model.
no code implementations • 27 Oct 2023 • Pushkal Katara, Zhou Xian, Katerina Fragkiadaki
We propose Generation to Simulation (Gen2Sim), a method for scaling up robot skill learning in simulation by automating generation of 3D assets, task descriptions, task decompositions and reward functions using large pre-trained generative models of language and vision.
no code implementations • 23 Oct 2023 • Gabriel Sarch, Yue Wu, Michael J. Tarr, Katerina Fragkiadaki
Pre-trained and frozen large language models (LLMs) can effectively map simple scene rearrangement instructions to programs over a robot's visuomotor functions through appropriate few-shot example prompting.
no code implementations • 10 Oct 2023 • Wen-Hsuan Chu, Adam W. Harley, Pavel Tokmakov, Achal Dave, Leonidas Guibas, Katerina Fragkiadaki
This raises the question: can we re-purpose these large-scale pre-trained static image models for open-vocabulary video tracking?
1 code implementation • 5 Oct 2023 • Mihir Prabhudesai, Anirudh Goyal, Deepak Pathak, Katerina Fragkiadaki
Because these models are trained without supervision, controlling their behavior in downstream tasks, such as maximizing human-perceived image quality, image-text alignment, or ethical image generation, is difficult.
2 code implementations • 30 Jun 2023 • Theophile Gervet, Zhou Xian, Nikolaos Gkanatsios, Katerina Fragkiadaki
3D perceptual representations are well suited for robot manipulation as they easily encode occlusions and simplify spatial reasoning.
Ranked #2 on Robot Manipulation on RLBench
no code implementations • 27 Apr 2023 • Nikolaos Gkanatsios, Mayank Singh, Zhaoyuan Fang, Shubham Tulsiani, Katerina Fragkiadaki
We present Analogical Networks, a model that encodes domain knowledge both explicitly, as a collection of structured, labelled 3D scenes, and implicitly, as model parameters, and segments 3D object scenes by analogical reasoning: instead of mapping a scene to part segments directly, our model first retrieves related scenes and their part structures from memory, and then predicts analogous part structures for the input scene via an end-to-end learnable modulation mechanism.
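The retrieval step described above (finding related scenes in memory before predicting part structures) can be sketched in a few lines of NumPy. The memory features, part labellings, and cosine-similarity retrieval here are hypothetical placeholders for the paper's learned scene encodings:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical memory: each scene is a global feature plus a part labelling.
memory_feats = rng.normal(size=(100, 64))
memory_parts = [rng.integers(0, 4, size=32) for _ in range(100)]

def retrieve(query_feat, k=3):
    # Cosine similarity between the input scene and every memory scene.
    q = query_feat / np.linalg.norm(query_feat)
    m = memory_feats / np.linalg.norm(memory_feats, axis=1, keepdims=True)
    sims = m @ q
    return np.argsort(sims)[-k:][::-1]   # indices of the top-k analogues

query = rng.normal(size=64)
top = retrieve(query)
# The retrieved part structures would then condition (modulate) the
# segmentation decoder; here we simply collect them.
candidate_parts = [memory_parts[i] for i in top]
```

In the actual model the retrieved structures feed a learnable modulation mechanism rather than being used directly.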
no code implementations • 27 Apr 2023 • Nikolaos Gkanatsios, Ayush Jain, Zhou Xian, Yunchu Zhang, Christopher Atkeson, Katerina Fragkiadaki
Language is compositional; an instruction can express multiple relation constraints to hold among objects in a scene that a robot is tasked to rearrange.
1 code implementation • 4 Mar 2023 • Zhou Xian, Bo Zhu, Zhenjia Xu, Hsiao-Yu Tung, Antonio Torralba, Katerina Fragkiadaki, Chuang Gan
We identify several challenges for fluid manipulation learning by evaluating a set of reinforcement learning and trajectory optimization methods on our platform.
no code implementations • 27 Oct 2022 • Xingyu Lin, Carl Qi, Yunchu Zhang, Zhiao Huang, Katerina Fragkiadaki, Yunzhu Li, Chuang Gan, David Held
Effective planning of long-horizon deformable object manipulation requires suitable abstractions at both the spatial and temporal levels.
1 code implementation • 21 Jul 2022 • Gabriel Sarch, Zhaoyuan Fang, Adam W. Harley, Paul Schydlo, Michael J. Tarr, Saurabh Gupta, Katerina Fragkiadaki
We introduce TIDEE, an embodied agent that tidies up a disordered scene based on learned commonsense object placement and room arrangement priors.
1 code implementation • 16 Jun 2022 • Adam W. Harley, Zhaoyuan Fang, Jie Li, Rares Ambrus, Katerina Fragkiadaki
Building 3D perception systems for autonomous vehicles that do not rely on high-density LiDAR is a critical research problem because of the expense of LiDAR systems compared to cameras and other sensors.
Autonomous Vehicles • Bird's-Eye View Semantic Segmentation +1
1 code implementation • 8 Apr 2022 • Adam W. Harley, Zhaoyuan Fang, Katerina Fragkiadaki
In this paper, we revisit Sand and Teller's "particle video" approach, and study pixel tracking as a long-range motion estimation problem, where every pixel is described with a trajectory that locates it in multiple future frames.
1 code implementation • 21 Mar 2022 • Mihir Prabhudesai, Anirudh Goyal, Sujoy Paul, Sjoerd van Steenkiste, Mehdi S. M. Sajjadi, Gaurav Aggarwal, Thomas Kipf, Deepak Pathak, Katerina Fragkiadaki
In our work, we find evidence that these losses are insufficient for the task of scene decomposition, without also considering architectural inductive biases.
1 code implementation • 16 Dec 2021 • Ayush Jain, Nikolaos Gkanatsios, Ishita Mediratta, Katerina Fragkiadaki
We propose a language grounding model that attends on the referential utterance and on the object proposal pool computed from a pre-trained detector to decode referenced objects with a detection head, without selecting them from the pool.
no code implementations • 29 Sep 2021 • Ayush Jain, Nikolaos Gkanatsios, Ishita Mediratta, Katerina Fragkiadaki
Object detectors are typically trained on a fixed vocabulary of objects and attributes that is often too restrictive for open-domain language grounding, where the language utterance may refer to visual entities in various levels of abstraction, such as a cat, the leg of a cat, or the stain on the front leg of the chair.
1 code implementation • CVPR 2021 • Shamit Lal, Mihir Prabhudesai, Ishita Mediratta, Adam W. Harley, Katerina Fragkiadaki
This paper explores self-supervised learning of amodal 3D feature representations from RGB and RGB-D posed images and videos, agnostic to object and scene semantic content, and evaluates the resulting scene representations in the downstream tasks of visual correspondence, object tracking, and object detection.
no code implementations • CVPR 2021 • Adam W. Harley, Yiming Zuo, Jing Wen, Ayush Mangal, Shubhankar Potdar, Ritwick Chaudhry, Katerina Fragkiadaki
We propose an unsupervised method for detecting and tracking moving objects in 3D, in unlabelled RGB-D videos.
no code implementations • 17 Mar 2021 • Zhou Xian, Shamit Lal, Hsiao-Yu Tung, Emmanouil Antonios Platanios, Katerina Fragkiadaki
We propose HyperDynamics, a dynamics meta-learning framework that conditions on an agent's interactions with the environment and optionally its visual observations, and generates the parameters of neural dynamics models based on inferred properties of the dynamical system.
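The core idea above, a hypernetwork that outputs the weights of a dynamics model conditioned on inferred system properties, can be sketched minimally. All shapes, the linear hypernetwork, and the residual next-state form are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(2)

IN, HID, OUT = 4, 16, 4                   # state dim, hidden dim, next-state dim
N_W = IN * HID + HID + HID * OUT + OUT    # parameter count of the dynamics MLP

# Hypothetical hypernetwork: a linear map from inferred system properties
# (e.g., mass, friction) to the full weight vector of the dynamics model.
hyper_W = rng.normal(scale=0.1, size=(N_W, 8))

def generate_dynamics(props):
    theta = hyper_W @ props
    i = 0
    W1 = theta[i:i + IN * HID].reshape(IN, HID); i += IN * HID
    b1 = theta[i:i + HID]; i += HID
    W2 = theta[i:i + HID * OUT].reshape(HID, OUT); i += HID * OUT
    b2 = theta[i:]
    def dynamics(state):
        h = np.tanh(state @ W1 + b1)
        return state + h @ W2 + b2        # residual next-state prediction
    return dynamics

props = rng.normal(size=8)                # inferred properties of one system
step = generate_dynamics(props)
next_state = step(rng.normal(size=IN))
```

Each distinct property vector yields a different dynamics function, which is what lets a single meta-learned model specialize to unseen systems without fine-tuning.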
no code implementations • ICLR 2021 • Zhou Xian, Shamit Lal, Hsiao-Yu Tung, Emmanouil Antonios Platanios, Katerina Fragkiadaki
We propose HyperDynamics, a framework that conditions on an agent’s interactions with the environment and optionally its visual observations, and generates the parameters of neural dynamics models based on inferred properties of the dynamical system.
1 code implementation • 30 Nov 2020 • Zhaoyuan Fang, Ayush Jain, Gabriel Sarch, Adam W. Harley, Katerina Fragkiadaki
Experiments on both indoor and outdoor datasets show that (1) our method obtains high-quality 2D and 3D pseudo-labels from multi-view RGB-D data; (2) fine-tuning with these pseudo-labels improves the 2D detector significantly in the test environment; (3) training a 3D detector with our pseudo-labels outperforms a prior self-supervised method by a large margin; (4) given weak supervision, our method can generate better pseudo-labels for novel objects.
no code implementations • 12 Nov 2020 • Hsiao-Yu Fish Tung, Zhou Xian, Mihir Prabhudesai, Shamit Lal, Katerina Fragkiadaki
Object motion predictions are computed by a graph neural network that operates over the object features extracted from the 3D neural scene representation.
1 code implementation • ICLR 2021 • Mihir Prabhudesai, Shamit Lal, Darshan Patil, Hsiao-Yu Tung, Adam W Harley, Katerina Fragkiadaki
We present neural architectures that disentangle RGB-D images into objects' shapes and styles and a map of the background scene, and explore their applications for few-shot 3D object detection and few-shot concept classification.
no code implementations • 30 Oct 2020 • Mihir Prabhudesai, Shamit Lal, Hsiao-Yu Fish Tung, Adam W. Harley, Shubhankar Potdar, Katerina Fragkiadaki
We can compare the 3D feature maps of two objects by searching alignment across scales and 3D rotations, and, as a result of the operation, we can estimate pose and scale changes without the need for 3D pose annotations.
no code implementations • ECCV 2020 • Adam W. Harley, Shrinidhi K. Lakshmikanth, Paul Schydlo, Katerina Fragkiadaki
We propose to leverage multiview data of static points in arbitrary scenes (static or dynamic), to learn a neural 3D mapping module which produces features that are correspondable across time.
1 code implementation • CVPR 2020 • Yihui He, Rui Yan, Katerina Fragkiadaki, Shoou-I Yu
The intuition is: given a 2D location p in the current view, we would like to first find its corresponding point p' in a neighboring view, and then combine the features at p' with the features at p, thus leading to a 3D-aware feature at p. Inspired by stereo matching, the epipolar transformer leverages epipolar constraints and feature matching to approximate the features at p'.
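The feature-matching step described above, approximating the feature at p' by soft-matching along p's epipolar line, reduces to dot-product attention over line samples. A minimal NumPy sketch, with hypothetical feature shapes and a trivially averaged fusion in place of the paper's learned fusion:

```python
import numpy as np

rng = np.random.default_rng(3)
C, N = 32, 16                          # feature channels, samples along the line

feat_p = rng.normal(size=C)            # feature at p in the current view
line_feats = rng.normal(size=(N, C))   # features sampled along p's epipolar
                                       # line in the neighboring view

# Soft feature matching: attention weights from scaled dot-product similarity.
logits = line_feats @ feat_p / np.sqrt(C)
w = np.exp(logits - logits.max())
w /= w.sum()
feat_pprime = w @ line_feats           # approximated feature at p'

fused = 0.5 * (feat_p + feat_pprime)   # hypothetical fusion; the actual model
                                       # combines the two with learned layers
```

The epipolar constraint is what keeps this cheap: only the N samples on one line are matched, rather than the whole neighboring view.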
Ranked #1 on 3D Hand Pose Estimation on InterHand2.6M
1 code implementation • CVPR 2020 • Mihir Prabhudesai, Hsiao-Yu Fish Tung, Syed Ashar Javed, Maximilian Sieb, Adam W. Harley, Katerina Fragkiadaki
We propose associating language utterances to 3D visual abstractions of the scene they describe.
no code implementations • 11 Jul 2019 • Maximilian Sieb, Zhou Xian, Audrey Huang, Oliver Kroemer, Katerina Fragkiadaki
We cast visual imitation as a visual correspondence problem.
1 code implementation • ICLR 2020 • Adam W. Harley, Shrinidhi K. Lakshmikanth, Fangyu Li, Xian Zhou, Hsiao-Yu Fish Tung, Katerina Fragkiadaki
Predictive coding theories suggest that the brain learns by predicting observations at various levels of abstraction.
no code implementations • 11 Jan 2019 • Adam W. Harley, Shih-En Wei, Jason Saragih, Katerina Fragkiadaki
Cross-domain image-to-image translation should satisfy two requirements: (1) preserve the information that is common to both domains, and (2) generate convincing images covering variations that appear in the target domain.
no code implementations • CVPR 2019 • Hsiao-Yu Fish Tung, Ricson Cheng, Katerina Fragkiadaki
The proposed networks learn to "lift" and integrate 2D visual features over time into latent 3D feature maps of the scene.
1 code implementation • 20 Nov 2018 • Ricson Cheng, Arpit Agarwal, Katerina Fragkiadaki
We propose hand/eye controllers that learn to move the camera to keep the object within the field of view and visible, in coordination with manipulating it to achieve the desired goal, e.g., pushing it to a target location.
1 code implementation • 20 Nov 2018 • Arpit Agarwal, Katharina Muelling, Katerina Fragkiadaki
We propose an exploration method that incorporates look-ahead search over basic learnt skills and their dynamics, and use it for reinforcement learning (RL) of manipulation policies.
no code implementations • NeurIPS 2018 • Ricson Cheng, Ziyan Wang, Katerina Fragkiadaki
We present recurrent geometry-aware neural networks that integrate visual information across multiple views of a scene into 3D latent feature tensors, while maintaining a one-to-one mapping between 3D physical locations in the world scene and latent feature locations.
no code implementations • CVPR 2018 • Hsiao-Yu Fish Tung, Adam W. Harley, Liang-Kang Huang, Katerina Fragkiadaki
Humans effortlessly "program" one another by communicating goals and desires in natural language.
no code implementations • 1 Jan 2018 • Chris Ying, Katerina Fragkiadaki
Current convolutional neural network algorithms for video object tracking spend the same amount of computation for each object and video frame.
1 code implementation • NeurIPS 2017 • Hsiao-Yu Fish Tung, Hsiao-Wei Tung, Ersin Yumer, Katerina Fragkiadaki
In this work, we propose a learning based motion capture model for single camera input.
Ranked #2 on 3D Human Reconstruction on Surreal
no code implementations • ICCV 2017 • Hsiao-Yu Fish Tung, Adam W. Harley, William Seto, Katerina Fragkiadaki
Researchers have developed excellent feed-forward models that learn to map images to desired outputs, such as to the images' latent factors, or to other images, using supervised learning.
no code implementations • 5 May 2017 • Katerina Fragkiadaki, Jonathan Huang, Alex Alemi, Sudheendra Vijayanarasimhan, Susanna Ricco, Rahul Sukthankar
In this work, we present stochastic neural network architectures that handle such multimodality through stochasticity: future trajectories of objects, body joints or frames are represented as deep, non-linear transformations of random (as opposed to deterministic) variables.
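The multimodality described above comes from decoding random latent variables alongside a context encoding, so that repeated draws give different plausible futures. A minimal NumPy sketch with a fixed, untrained stand-in for the decoder (shapes and the decoder form are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

def decode(context, z):
    # Hypothetical decoder: a fixed nonlinear map from (context, latent z)
    # to a short future trajectory; in the paper this is a trained network.
    W = np.outer(np.tanh(context), z)[:8]          # (8, z_dim)
    return np.cumsum(W.sum(axis=1))                # 8 future steps

context = rng.normal(size=16)                      # encoded past observations
futures = [decode(context, rng.normal(size=16)) for _ in range(5)]
# Different draws of z yield different plausible futures (multimodality),
# unlike a deterministic model that can only output one.
```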
no code implementations • 25 Apr 2017 • Sudheendra Vijayanarasimhan, Susanna Ricco, Cordelia Schmid, Rahul Sukthankar, Katerina Fragkiadaki
We propose SfM-Net, a geometry-aware neural network for motion estimation in videos that decomposes frame-to-frame pixel motion in terms of scene and object depth, camera motion and 3D object rotations and translations.
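The decomposition described above implies that, given per-pixel depth and camera motion, frame-to-frame pixel motion follows from backprojecting, moving, and reprojecting each pixel. A minimal pinhole-camera sketch in NumPy, using a tiny image, constant depth, a pure x-translation, and a hypothetical focal length (none of these values come from the paper):

```python
import numpy as np

H, W_, f = 4, 4, 1.0                  # tiny image, hypothetical focal length
depth = np.full((H, W_), 2.0)         # predicted per-pixel depth
t = np.array([0.1, 0.0, 0.0])         # predicted camera translation, no rotation

ys, xs = np.mgrid[0:H, 0:W_]
# Backproject pixels to 3D (pinhole model, principal point at image center).
X = (xs - W_ / 2) * depth / f
Y = (ys - H / 2) * depth / f
Z = depth
# Express the points in the moved camera's frame, then reproject.
Xc, Yc, Zc = X - t[0], Y - t[1], Z - t[2]
u = f * Xc / Zc + W_ / 2
v = f * Yc / Zc + H / 2
flow = np.stack([u - xs, v - ys], axis=-1)   # frame-to-frame pixel motion
```

With constant depth 2 and translation 0.1 along x, every pixel shifts by -0.05 horizontally; in SfM-Net this geometric flow, plus per-object rotations and translations, supervises the network without flow labels.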
no code implementations • 23 Nov 2015 • Katerina Fragkiadaki, Pulkit Agrawal, Sergey Levine, Jitendra Malik
The ability to plan and execute goal specific actions in varied, unexpected settings is a central requirement of intelligent agents.
no code implementations • ICCV 2015 • Katerina Fragkiadaki, Sergey Levine, Panna Felsen, Jitendra Malik
We propose the Encoder-Recurrent-Decoder (ERD) model for recognition and prediction of human body pose in videos and motion capture.
Ranked #8 on Human Pose Forecasting on Human3.6M (MAR, walking, 1,000ms metric)
1 code implementation • CVPR 2016 • Joao Carreira, Pulkit Agrawal, Katerina Fragkiadaki, Jitendra Malik
Hierarchical feature extractors such as Convolutional Networks (ConvNets) have achieved impressive performance on a variety of classification tasks using purely feedforward processing.
Ranked #43 on Pose Estimation on MPII Human Pose
no code implementations • CVPR 2015 • Katerina Fragkiadaki, Pablo Arbelaez, Panna Felsen, Jitendra Malik
We segment moving objects in videos by ranking spatio-temporal segment proposals according to "moving objectness": how likely they are to contain a moving object.
no code implementations • NeurIPS 2014 • Katerina Fragkiadaki, Marta Salas, Pablo Arbelaez, Jitendra Malik
Furthermore, NRSfM needs to be robust to noise in both segmentation and tracking, e.g., drifting, segmentation "leaking", optical flow "bleeding", etc.
no code implementations • CVPR 2013 • Katerina Fragkiadaki, Han Hu, Jianbo Shi
The pose labeled segments and corresponding articulated joints are used to improve the motion flow fields by proposing kinematically constrained affine displacements on body parts.