Search Results for author: Nikolaos Gkanatsios

Found 11 papers, 6 papers with code

ODIN: A Single Model for 2D and 3D Perception

no code implementations · 4 Jan 2024 · Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios, Adam W. Harley, Gabriel Sarch, Kriti Aggarwal, Vishrav Chaudhary, Katerina Fragkiadaki

The gap in performance between methods that consume posed images and methods that consume post-processed 3D point clouds has fueled the belief that 2D and 3D perception require distinct model architectures.

3D Instance Segmentation · Semantic Segmentation

Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation

2 code implementations · 30 Jun 2023 · Theophile Gervet, Zhou Xian, Nikolaos Gkanatsios, Katerina Fragkiadaki

3D perceptual representations are well suited for robot manipulation as they easily encode occlusions and simplify spatial reasoning.

Action Detection · Pose Prediction · +1

Energy-based Models are Zero-Shot Planners for Compositional Scene Rearrangement

no code implementations · 27 Apr 2023 · Nikolaos Gkanatsios, Ayush Jain, Zhou Xian, Yunchu Zhang, Christopher Atkeson, Katerina Fragkiadaki

Language is compositional; an instruction can express multiple relational constraints that must hold among the objects in a scene a robot is tasked to rearrange.

Language Modelling · Large Language Model

Analogy-Forming Transformers for Few-Shot 3D Parsing

no code implementations · 27 Apr 2023 · Nikolaos Gkanatsios, Mayank Singh, Zhaoyuan Fang, Shubham Tulsiani, Katerina Fragkiadaki

We present Analogical Networks, a model that encodes domain knowledge both explicitly, in a collection of structured, labelled 3D scenes, and implicitly, as model parameters. It segments 3D object scenes by analogical reasoning: instead of mapping a scene to part segments directly, the model first retrieves related scenes and their corresponding part structures from memory, then predicts analogous part structures for the input scene via an end-to-end learnable modulation mechanism.
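The retrieve-then-modulate idea in the abstract can be caricatured in a few lines. This is a toy sketch, not the paper's architecture: cosine-similarity retrieval and nearest-point label transfer are stand-ins for the learned retrieval and the end-to-end modulation mechanism, and all names are illustrative.

```python
import numpy as np

def retrieve_memories(scene_feat, memory_feats, k=2):
    """Retrieve the k most similar labelled scenes from memory.
    Cosine similarity is an assumption; the paper learns retrieval."""
    sims = memory_feats @ scene_feat / (
        np.linalg.norm(memory_feats, axis=1) * np.linalg.norm(scene_feat) + 1e-8)
    return np.argsort(-sims)[:k]

def modulate(scene_points, memory_part_labels, memory_points):
    """Predict part labels for the input scene analogously to a retrieved
    scene -- here by crude nearest-point correspondence, standing in for
    the learnable modulation mechanism."""
    labels = []
    for p in scene_points:
        dists = np.linalg.norm(memory_points - p, axis=1)
        labels.append(memory_part_labels[int(np.argmin(dists))])
    return labels

# toy usage: retrieve the closest memory scene, then transfer its part labels
mem_feats = np.array([[0.9, 0.1], [0.0, 1.0]])   # features of 2 stored scenes
best = retrieve_memories(np.array([1.0, 0.0]), mem_feats, k=1)[0]
```

The point of the structure: segmentation is conditioned on a retrieved exemplar, so adding new labelled scenes to memory changes predictions without retraining, which is what enables the few-shot behaviour.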

Few-Shot Learning

Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds

1 code implementation · 16 Dec 2021 · Ayush Jain, Nikolaos Gkanatsios, Ishita Mediratta, Katerina Fragkiadaki

We propose a language grounding model that attends on the referential utterance and on the object proposal pool computed from a pre-trained detector to decode referenced objects with a detection head, without selecting them from the pool.
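The "decode rather than select" distinction above can be sketched numerically: the query cross-attends over both the utterance tokens and the detector's proposal pool, then regresses a box directly instead of picking one proposal. This is a minimal illustration with made-up shapes and weights, not the model's actual layers.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def decode_referenced_box(query, word_feats, proposal_feats, W_box):
    """One decoding step: attend over utterance tokens AND proposal
    features jointly, fuse, and regress 4 box coordinates directly
    (no selection from the proposal pool). W_box is a toy head."""
    context = np.concatenate([word_feats, proposal_feats], axis=0)  # (T+P, d)
    attn = softmax(query @ context.T)                               # (T+P,)
    fused = attn @ context                                          # (d,)
    return fused @ W_box                                            # (4,)

rng = np.random.default_rng(0)
d = 8
box = decode_referenced_box(rng.normal(size=d),
                            rng.normal(size=(3, d)),   # 3 word tokens
                            rng.normal(size=(5, d)),   # 5 detector proposals
                            rng.normal(size=(d, 4)))   # toy regression head
print(box.shape)  # (4,)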

Object · Object Detection · +2

Language Modulated Detection and Detection Modulated Language Grounding in 2D and 3D Scenes

no code implementations · 29 Sep 2021 · Ayush Jain, Nikolaos Gkanatsios, Ishita Mediratta, Katerina Fragkiadaki

Object detectors are typically trained on a fixed vocabulary of objects and attributes that is often too restrictive for open-domain language grounding, where a language utterance may refer to visual entities at various levels of abstraction, such as a cat, the leg of a cat, or the stain on the front leg of the chair.

Object · Object Detection · +1

Grounding Consistency: Distilling Spatial Common Sense for Precise Visual Relationship Detection

1 code implementation · ICCV 2021 · Markos Diomataris, Nikolaos Gkanatsios, Vassilis Pitsikalis, Petros Maragos

Scene Graph Generators (SGGs) are models that, given an image, build a directed graph in which each edge represents a predicted ⟨subject, predicate, object⟩ triplet.

Common Sense Reasoning · Graph Generation · +3

Orientation Attentive Robotic Grasp Synthesis with Augmented Grasp Map Representation

1 code implementation · 9 Jun 2020 · Georgia Chalvatzaki, Nikolaos Gkanatsios, Petros Maragos, Jan Peters

Inherent morphological characteristics in objects may offer a wide range of plausible grasping orientations that obfuscates the visual learning of robotic grasping.

Grasp Generation · Robotic Grasping

Deeply Supervised Multimodal Attentional Translation Embeddings for Visual Relationship Detection

1 code implementation · 15 Feb 2019 · Nikolaos Gkanatsios, Vassilis Pitsikalis, Petros Koutras, Athanasia Zlatintsi, Petros Maragos

Detecting visual relationships, i.e., ⟨Subject, Predicate, Object⟩ triplets, is a challenging Scene Understanding task, approached in the past via linguistic priors or spatial information in a single feature branch.

Relationship Detection · Translation · +1
