Search Results for author: Nikolaos Gkanatsios

Found 11 papers, 6 papers with code

ODIN: A Single Model for 2D and 3D Perception

no code implementations · 4 Jan 2024 · Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios, Adam W. Harley, Gabriel Sarch, Kriti Aggarwal, Vishrav Chaudhary, Katerina Fragkiadaki

The gap in performance between methods that consume posed images and methods that consume post-processed 3D point clouds has fueled the belief that 2D and 3D perception require distinct model architectures.

3D Instance Segmentation · Semantic Segmentation

Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation

2 code implementations · 30 Jun 2023 · Theophile Gervet, Zhou Xian, Nikolaos Gkanatsios, Katerina Fragkiadaki

3D perceptual representations are well suited for robot manipulation as they easily encode occlusions and simplify spatial reasoning.

Action Detection · Pose Prediction · +1

Energy-based Models are Zero-Shot Planners for Compositional Scene Rearrangement

no code implementations · 27 Apr 2023 · Nikolaos Gkanatsios, Ayush Jain, Zhou Xian, Yunchu Zhang, Christopher Atkeson, Katerina Fragkiadaki

Language is compositional; an instruction can express multiple relational constraints that must hold among the objects in a scene a robot is tasked to rearrange.

Language Modelling · Large Language Model

Analogy-Forming Transformers for Few-Shot 3D Parsing

no code implementations · 27 Apr 2023 · Nikolaos Gkanatsios, Mayank Singh, Zhaoyuan Fang, Shubham Tulsiani, Katerina Fragkiadaki

We present Analogical Networks, a model that encodes domain knowledge both explicitly, in a collection of structured, labelled 3D scenes, and implicitly, as model parameters. It segments 3D object scenes by analogical reasoning: instead of mapping a scene to part segments directly, the model first retrieves related scenes and their corresponding part structures from memory, then predicts analogous part structures for the input scene via an end-to-end learnable modulation mechanism.
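The retrieve-then-modulate idea in the abstract can be caricatured in a few lines. This is a toy sketch, not the paper's architecture: cosine-similarity retrieval and nearest-point label transfer are stand-ins for the learned retrieval and the end-to-end modulation mechanism, and all names are illustrative.

```python
import numpy as np

def retrieve_memories(scene_feat, memory_feats, k=2):
    """Retrieve the k most similar labelled scenes from memory.
    Cosine similarity is an assumption; the paper learns retrieval."""
    sims = memory_feats @ scene_feat / (
        np.linalg.norm(memory_feats, axis=1) * np.linalg.norm(scene_feat) + 1e-8)
    return np.argsort(-sims)[:k]

def modulate(scene_points, memory_part_labels, memory_points):
    """Predict part labels for the input scene analogously to a retrieved
    scene -- here by crude nearest-point correspondence, standing in for
    the learnable modulation mechanism."""
    labels = []
    for p in scene_points:
        dists = np.linalg.norm(memory_points - p, axis=1)
        labels.append(memory_part_labels[int(np.argmin(dists))])
    return labels

# toy usage: retrieve the closest memory scene, then transfer its part labels
mem_feats = np.array([[0.9, 0.1], [0.0, 1.0]])   # features of 2 stored scenes
best = retrieve_memories(np.array([1.0, 0.0]), mem_feats, k=1)[0]
```

The point of the structure: segmentation is conditioned on a retrieved exemplar, so adding new labelled scenes to memory changes predictions without retraining, which is what enables the few-shot behaviour.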

Few-Shot Learning

Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds

1 code implementation · 16 Dec 2021 · Ayush Jain, Nikolaos Gkanatsios, Ishita Mediratta, Katerina Fragkiadaki

We propose a language grounding model that attends on the referential utterance and on the object proposal pool computed from a pre-trained detector to decode referenced objects with a detection head, without selecting them from the pool.
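The "decode rather than select" distinction above can be sketched numerically: the query cross-attends over both the utterance tokens and the detector's proposal pool, then regresses a box directly instead of picking one proposal. This is a minimal illustration with made-up shapes and weights, not the model's actual layers.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def decode_referenced_box(query, word_feats, proposal_feats, W_box):
    """One decoding step: attend over utterance tokens AND proposal
    features jointly, fuse, and regress 4 box coordinates directly
    (no selection from the proposal pool). W_box is a toy head."""
    context = np.concatenate([word_feats, proposal_feats], axis=0)  # (T+P, d)
    attn = softmax(query @ context.T)                               # (T+P,)
    fused = attn @ context                                          # (d,)
    return fused @ W_box                                            # (4,)

rng = np.random.default_rng(0)
d = 8
box = decode_referenced_box(rng.normal(size=d),
                            rng.normal(size=(3, d)),   # 3 word tokens
                            rng.normal(size=(5, d)),   # 5 detector proposals
                            rng.normal(size=(d, 4)))   # toy regression head
print(box.shape)  # (4,)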

Object · Object Detection · +2

Language Modulated Detection and Detection Modulated Language Grounding in 2D and 3D Scenes

no code implementations · 29 Sep 2021 · Ayush Jain, Nikolaos Gkanatsios, Ishita Mediratta, Katerina Fragkiadaki

Object detectors are typically trained on a fixed vocabulary of objects and attributes that is often too restrictive for open-domain language grounding, where a language utterance may refer to visual entities at various levels of abstraction, such as a cat, the leg of a cat, or the stain on the front leg of the chair.

Object · Object Detection · +1

Grounding Consistency: Distilling Spatial Common Sense for Precise Visual Relationship Detection

1 code implementation · ICCV 2021 · Markos Diomataris, Nikolaos Gkanatsios, Vassilis Pitsikalis, Petros Maragos

Scene Graph Generators (SGGs) are models that, given an image, build a directed graph in which each edge represents a predicted ⟨subject, predicate, object⟩ triplet.

Common Sense Reasoning · Graph Generation · +3

Orientation Attentive Robotic Grasp Synthesis with Augmented Grasp Map Representation

1 code implementation · 9 Jun 2020 · Georgia Chalvatzaki, Nikolaos Gkanatsios, Petros Maragos, Jan Peters

Inherent morphological characteristics in objects may offer a wide range of plausible grasping orientations that obfuscates the visual learning of robotic grasping.

Grasp Generation · Robotic Grasping

Deeply Supervised Multimodal Attentional Translation Embeddings for Visual Relationship Detection

1 code implementation · 15 Feb 2019 · Nikolaos Gkanatsios, Vassilis Pitsikalis, Petros Koutras, Athanasia Zlatintsi, Petros Maragos

Detecting visual relationships, i.e., ⟨Subject, Predicate, Object⟩ triplets, is a challenging Scene Understanding task, approached in the past via linguistic priors or spatial information in a single feature branch.

Relationship Detection · Translation · +1
