Search Results for author: Katerina Fragkiadaki

Found 50 papers, 20 papers with code

Tractable Joint Prediction and Planning over Discrete Behavior Modes for Urban Driving

no code implementations 12 Mar 2024 Adam Villaflor, Brian Yang, Huangyuan Su, Katerina Fragkiadaki, John Dolan, Jeff Schneider

Although these models have conventionally been evaluated for open-loop prediction, we show that they can be used to parameterize autoregressive closed-loop models without retraining.

Autonomous Driving Trajectory Forecasting

ODIN: A Single Model for 2D and 3D Perception

no code implementations 4 Jan 2024 Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios, Adam W. Harley, Gabriel Sarch, Kriti Aggarwal, Vishrav Chaudhary, Katerina Fragkiadaki

The gap in performance between methods that consume posed images versus post-processed 3D point clouds has fueled the belief that 2D and 3D perception require distinct model architectures.

3D Instance Segmentation Semantic Segmentation

Diffusion-TTA: Test-time Adaptation of Discriminative Models via Generative Feedback

1 code implementation 27 Nov 2023 Mihir Prabhudesai, Tsung-Wei Ke, Alexander C. Li, Deepak Pathak, Katerina Fragkiadaki

Our method, Diffusion-TTA, adapts pre-trained discriminative models such as image classifiers, segmenters and depth predictors, to each unlabelled example in the test set using generative feedback from a diffusion model.

Test-time Adaptation

Gen2Sim: Scaling up Robot Learning in Simulation with Generative Models

no code implementations 27 Oct 2023 Pushkal Katara, Zhou Xian, Katerina Fragkiadaki

We propose Generation to Simulation (Gen2Sim), a method for scaling up robot skill learning in simulation by automating generation of 3D assets, task descriptions, task decompositions and reward functions using large pre-trained generative models of language and vision.


Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models

no code implementations 23 Oct 2023 Gabriel Sarch, Yue Wu, Michael J. Tarr, Katerina Fragkiadaki

Pre-trained and frozen large language models (LLMs) can effectively map simple scene rearrangement instructions to programs over a robot's visuomotor functions through appropriate few-shot example prompting.

Prompt Engineering Retrieval

Zero-Shot Open-Vocabulary Tracking with Large Pre-Trained Models

no code implementations 10 Oct 2023 Wen-Hsuan Chu, Adam W. Harley, Pavel Tokmakov, Achal Dave, Leonidas Guibas, Katerina Fragkiadaki

This begs the question: can we re-purpose these large-scale pre-trained static image models for open-vocabulary video tracking?

Object Object Tracking +5

Aligning Text-to-Image Diffusion Models with Reward Backpropagation

1 code implementation 5 Oct 2023 Mihir Prabhudesai, Anirudh Goyal, Deepak Pathak, Katerina Fragkiadaki

Due to their unsupervised training, controlling their behavior in downstream tasks, such as maximizing human-perceived image quality, image-text alignment, or ethical image generation, is difficult.

Denoising Image Generation

Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation

2 code implementations 30 Jun 2023 Theophile Gervet, Zhou Xian, Nikolaos Gkanatsios, Katerina Fragkiadaki

3D perceptual representations are well suited for robot manipulation as they easily encode occlusions and simplify spatial reasoning.

Action Detection Pose Prediction +1

Energy-based Models are Zero-Shot Planners for Compositional Scene Rearrangement

no code implementations 27 Apr 2023 Nikolaos Gkanatsios, Ayush Jain, Zhou Xian, Yunchu Zhang, Christopher Atkeson, Katerina Fragkiadaki

Language is compositional; an instruction can express multiple relation constraints to hold among objects in a scene that a robot is tasked to rearrange.

Language Modelling Large Language Model

Analogy-Forming Transformers for Few-Shot 3D Parsing

no code implementations 27 Apr 2023 Nikolaos Gkanatsios, Mayank Singh, Zhaoyuan Fang, Shubham Tulsiani, Katerina Fragkiadaki

We present Analogical Networks, a model that encodes domain knowledge explicitly, in a collection of structured labelled 3D scenes, in addition to implicitly, as model parameters, and segments 3D object scenes with analogical reasoning: instead of mapping a scene to part segments directly, our model first retrieves related scenes from memory and their corresponding part structures, and then predicts analogous part structures for the input scene, via an end-to-end learnable modulation mechanism.

Few-Shot Learning

FluidLab: A Differentiable Environment for Benchmarking Complex Fluid Manipulation

1 code implementation 4 Mar 2023 Zhou Xian, Bo Zhu, Zhenjia Xu, Hsiao-Yu Tung, Antonio Torralba, Katerina Fragkiadaki, Chuang Gan

We identify several challenges for fluid manipulation learning by evaluating a set of reinforcement learning and trajectory optimization methods on our platform.


Planning with Spatial-Temporal Abstraction from Point Clouds for Deformable Object Manipulation

no code implementations 27 Oct 2022 Xingyu Lin, Carl Qi, Yunchu Zhang, Zhiao Huang, Katerina Fragkiadaki, Yunzhu Li, Chuang Gan, David Held

Effective planning of long-horizon deformable object manipulation requires suitable abstractions at both the spatial and temporal levels.

Deformable Object Manipulation

TIDEE: Tidying Up Novel Rooms using Visuo-Semantic Commonsense Priors

1 code implementation 21 Jul 2022 Gabriel Sarch, Zhaoyuan Fang, Adam W. Harley, Paul Schydlo, Michael J. Tarr, Saurabh Gupta, Katerina Fragkiadaki

We introduce TIDEE, an embodied agent that tidies up a disordered scene based on learned commonsense object placement and room arrangement priors.


Simple-BEV: What Really Matters for Multi-Sensor BEV Perception?

1 code implementation 16 Jun 2022 Adam W. Harley, Zhaoyuan Fang, Jie Li, Rares Ambrus, Katerina Fragkiadaki

Building 3D perception systems for autonomous vehicles that do not rely on high-density LiDAR is a critical research problem because of the expense of LiDAR systems compared to cameras and other sensors.

Autonomous Vehicles Bird's-Eye View Semantic Segmentation +1

Particle Video Revisited: Tracking Through Occlusions Using Point Trajectories

1 code implementation 8 Apr 2022 Adam W. Harley, Zhaoyuan Fang, Katerina Fragkiadaki

In this paper, we revisit Sand and Teller's "particle video" approach, and study pixel tracking as a long-range motion estimation problem, where every pixel is described with a trajectory that locates it in multiple future frames.

Motion Estimation Object Tracking +1

Test-time Adaptation with Slot-Centric Models

1 code implementation 21 Mar 2022 Mihir Prabhudesai, Anirudh Goyal, Sujoy Paul, Sjoerd van Steenkiste, Mehdi S. M. Sajjadi, Gaurav Aggarwal, Thomas Kipf, Deepak Pathak, Katerina Fragkiadaki

In our work, we find evidence that these losses are insufficient for the task of scene decomposition, without also considering architectural inductive biases.

Image Classification Image Segmentation +7

Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds

1 code implementation 16 Dec 2021 Ayush Jain, Nikolaos Gkanatsios, Ishita Mediratta, Katerina Fragkiadaki

We propose a language grounding model that attends on the referential utterance and on the object proposal pool computed from a pre-trained detector to decode referenced objects with a detection head, without selecting them from the pool.

Object object-detection +2

Language Modulated Detection and Detection Modulated Language Grounding in 2D and 3D Scenes

no code implementations 29 Sep 2021 Ayush Jain, Nikolaos Gkanatsios, Ishita Mediratta, Katerina Fragkiadaki

Object detectors are typically trained on a fixed vocabulary of objects and attributes that is often too restrictive for open-domain language grounding, where the language utterance may refer to visual entities in various levels of abstraction, such as a cat, the leg of a cat, or the stain on the front leg of the chair.

Object object-detection +1

CoCoNets: Continuous Contrastive 3D Scene Representations

1 code implementation CVPR 2021 Shamit Lal, Mihir Prabhudesai, Ishita Mediratta, Adam W. Harley, Katerina Fragkiadaki

This paper explores self-supervised learning of amodal 3D feature representations from RGB and RGB-D posed images and videos, agnostic to object and scene semantic content, and evaluates the resulting scene representations in the downstream tasks of visual correspondence, object tracking, and object detection.

3D Object Detection Contrastive Learning +4

HyperDynamics: Meta-Learning Object and Agent Dynamics with Hypernetworks

no code implementations17 Mar 2021 Zhou Xian, Shamit Lal, Hsiao-Yu Tung, Emmanouil Antonios Platanios, Katerina Fragkiadaki

We propose HyperDynamics, a dynamics meta-learning framework that conditions on an agent's interactions with the environment and optionally its visual observations, and generates the parameters of neural dynamics models based on inferred properties of the dynamical system.

Attribute Meta-Learning

HyperDynamics: Generating Expert Dynamics Models by Observation

no code implementations ICLR 2021 Zhou Xian, Shamit Lal, Hsiao-Yu Tung, Emmanouil Antonios Platanios, Katerina Fragkiadaki

We propose HyperDynamics, a framework that conditions on an agent’s interactions with the environment and optionally its visual observations, and generates the parameters of neural dynamics models based on inferred properties of the dynamical system.


Move to See Better: Self-Improving Embodied Object Detection

1 code implementation 30 Nov 2020 Zhaoyuan Fang, Ayush Jain, Gabriel Sarch, Adam W. Harley, Katerina Fragkiadaki

Experiments on both indoor and outdoor datasets show that (1) our method obtains high-quality 2D and 3D pseudo-labels from multi-view RGB-D data; (2) fine-tuning with these pseudo-labels improves the 2D detector significantly in the test environment; (3) training a 3D detector with our pseudo-labels outperforms a prior self-supervised method by a large margin; (4) given weak supervision, our method can generate better pseudo-labels for novel objects.

Object object-detection +1

3D-OES: Viewpoint-Invariant Object-Factorized Environment Simulators

no code implementations 12 Nov 2020 Hsiao-Yu Fish Tung, Zhou Xian, Mihir Prabhudesai, Shamit Lal, Katerina Fragkiadaki

Object motion predictions are computed by a graph neural network that operates over the object features extracted from the 3D neural scene representation.


Disentangling 3D Prototypical Networks For Few-Shot Concept Learning

1 code implementation ICLR 2021 Mihir Prabhudesai, Shamit Lal, Darshan Patil, Hsiao-Yu Tung, Adam W Harley, Katerina Fragkiadaki

We present neural architectures that disentangle RGB-D images into objects' shapes and styles and a map of the background scene, and explore their applications for few-shot 3D object detection and few-shot concept classification.

3D Object Detection Object +3

3D Object Recognition By Corresponding and Quantizing Neural 3D Scene Representations

no code implementations 30 Oct 2020 Mihir Prabhudesai, Shamit Lal, Hsiao-Yu Fish Tung, Adam W. Harley, Shubhankar Potdar, Katerina Fragkiadaki

We can compare the 3D feature maps of two objects by searching alignment across scales and 3D rotations, and, as a result of the operation, we can estimate pose and scale changes without the need for 3D pose annotations.

3D Object Recognition Object +2

Tracking Emerges by Looking Around Static Scenes, with Neural 3D Mapping

no code implementations ECCV 2020 Adam W. Harley, Shrinidhi K. Lakshmikanth, Paul Schydlo, Katerina Fragkiadaki

We propose to leverage multiview data of static points in arbitrary scenes (static or dynamic), to learn a neural 3D mapping module which produces features that are correspondable across time.

3D Object Tracking Object +1

Epipolar Transformers

1 code implementation CVPR 2020 Yihui He, Rui Yan, Katerina Fragkiadaki, Shoou-I Yu

The intuition is: given a 2D location p in the current view, we would like to first find its corresponding point p' in a neighboring view, and then combine the features at p' with the features at p, thus leading to a 3D-aware feature at p. Inspired by stereo matching, the epipolar transformer leverages epipolar constraints and feature matching to approximate the features at p'.

2D Pose Estimation 3D Hand Pose Estimation +3

Image Disentanglement and Uncooperative Re-Entanglement for High-Fidelity Image-to-Image Translation

no code implementations 11 Jan 2019 Adam W. Harley, Shih-En Wei, Jason Saragih, Katerina Fragkiadaki

Cross-domain image-to-image translation should satisfy two requirements: (1) preserve the information that is common to both domains, and (2) generate convincing images covering variations that appear in the target domain.

Disentanglement Image-to-Image Translation +1

Reinforcement Learning of Active Vision for Manipulating Objects under Occlusions

1 code implementation 20 Nov 2018 Ricson Cheng, Arpit Agarwal, Katerina Fragkiadaki

We propose hand/eye controllers that learn to move the camera to keep the object within the field of view and visible, in coordination with manipulating it to achieve the desired goal, e.g., pushing it to a target location.

Object reinforcement-learning +1

Model Learning for Look-ahead Exploration in Continuous Control

1 code implementation 20 Nov 2018 Arpit Agarwal, Katharina Muelling, Katerina Fragkiadaki

We propose an exploration method that incorporates look-ahead search over basic learnt skills and their dynamics, and use it for reinforcement learning (RL) of manipulation policies.

Continuous Control Reinforcement Learning (RL)

Geometry-Aware Recurrent Neural Networks for Active Visual Recognition

no code implementations NeurIPS 2018 Ricson Cheng, Ziyan Wang, Katerina Fragkiadaki

We present recurrent geometry-aware neural networks that integrate visual information across multiple views of a scene into 3D latent feature tensors, while maintaining a one-to-one mapping between 3D physical locations in the world scene and latent feature locations.

3D Reconstruction Object +3

Reward Learning from Narrated Demonstrations

no code implementations CVPR 2018 Hsiao-Yu Fish Tung, Adam W. Harley, Liang-Kang Huang, Katerina Fragkiadaki

Humans effortlessly "program" one another by communicating goals and desires in natural language.

Depth-Adaptive Computational Policies for Efficient Visual Tracking

no code implementations 1 Jan 2018 Chris Ying, Katerina Fragkiadaki

Current convolutional neural network algorithms for video object tracking spend the same amount of computation for each object and video frame.

Object Video Object Tracking +1

Adversarial Inverse Graphics Networks: Learning 2D-to-3D Lifting and Image-to-Image Translation from Unpaired Supervision

no code implementations ICCV 2017 Hsiao-Yu Fish Tung, Adam W. Harley, William Seto, Katerina Fragkiadaki

Researchers have developed excellent feed-forward models that learn to map images to desired outputs, such as to the images' latent factors, or to other images, using supervised learning.

3D Human Pose Estimation Image-to-Image Translation +2

Motion Prediction Under Multimodality with Conditional Stochastic Networks

no code implementations 5 May 2017 Katerina Fragkiadaki, Jonathan Huang, Alex Alemi, Sudheendra Vijayanarasimhan, Susanna Ricco, Rahul Sukthankar

In this work, we present stochastic neural network architectures that handle such multimodality through stochasticity: future trajectories of objects, body joints or frames are represented as deep, non-linear transformations of random (as opposed to deterministic) variables.

motion prediction Optical Flow Estimation +2

SfM-Net: Learning of Structure and Motion from Video

no code implementations 25 Apr 2017 Sudheendra Vijayanarasimhan, Susanna Ricco, Cordelia Schmid, Rahul Sukthankar, Katerina Fragkiadaki

We propose SfM-Net, a geometry-aware neural network for motion estimation in videos that decomposes frame-to-frame pixel motion in terms of scene and object depth, camera motion and 3D object rotations and translations.

Motion Estimation Object +1

Learning Visual Predictive Models of Physics for Playing Billiards

no code implementations 23 Nov 2015 Katerina Fragkiadaki, Pulkit Agrawal, Sergey Levine, Jitendra Malik

The ability to plan and execute goal specific actions in varied, unexpected settings is a central requirement of intelligent agents.

Recurrent Network Models for Human Dynamics

no code implementations ICCV 2015 Katerina Fragkiadaki, Sergey Levine, Panna Felsen, Jitendra Malik

We propose the Encoder-Recurrent-Decoder (ERD) model for recognition and prediction of human body pose in videos and motion capture.

Ranked #8 on Human Pose Forecasting on Human3.6M (MAR, walking, 1,000ms metric)

Human Dynamics Human Pose Forecasting +2

Human Pose Estimation with Iterative Error Feedback

1 code implementation CVPR 2016 Joao Carreira, Pulkit Agrawal, Katerina Fragkiadaki, Jitendra Malik

Hierarchical feature extractors such as Convolutional Networks (ConvNets) have achieved impressive performance on a variety of classification tasks using purely feedforward processing.

Pose Estimation Semantic Segmentation

Learning to Segment Moving Objects in Videos

no code implementations CVPR 2015 Katerina Fragkiadaki, Pablo Arbelaez, Panna Felsen, Jitendra Malik

We segment moving objects in videos by ranking spatio-temporal segment proposals according to "moving objectness": how likely they are to contain a moving object.

Segmentation Video Segmentation +1

Grouping-Based Low-Rank Trajectory Completion and 3D Reconstruction

no code implementations NeurIPS 2014 Katerina Fragkiadaki, Marta Salas, Pablo Arbelaez, Jitendra Malik

Furthermore, NRSfM needs to be robust to noise in both segmentation and tracking, e.g., drifting, segmentation "leaking", optical flow "bleeding", etc.

3D Reconstruction Clustering +5

Pose from Flow and Flow from Pose

no code implementations CVPR 2013 Katerina Fragkiadaki, Han Hu, Jianbo Shi

The pose labeled segments and corresponding articulated joints are used to improve the motion flow fields by proposing kinematically constrained affine displacements on body parts.

Motion Estimation Segmentation
