Search Results for author: Andrew Melnik

Found 31 papers, 12 papers with code

STEVE-Audio: Expanding the Goal Conditioning Modalities of Embodied Agents in Minecraft

no code implementations • 1 Dec 2024 • Nicholas Lenzen, Amogh Raut, Andrew Melnik

Recently, the STEVE-1 approach has been introduced as a method for training generative agents to follow instructions in the form of latent CLIP embeddings.

Decision Making, Minecraft +1

SplatR : Experience Goal Visual Rearrangement with 3D Gaussian Splatting and Dense Feature Matching

1 code implementation • 21 Nov 2024 • Arjun P S, Andrew Melnik, Gora Chand Nandi

In this work, we present a novel framework that leverages 3D Gaussian Splatting as a 3D scene representation for the experience goal visual rearrangement task.

Object and Contact Point Tracking in Demonstrations Using 3D Gaussian Splatting

no code implementations • 5 Nov 2024 • Michael Büttner, Jonathan Francis, Helge Rhodin, Andrew Melnik

This paper introduces a method to enhance Interactive Imitation Learning (IIL) by extracting touch interaction points and tracking object movement from video demonstrations.

Imitation Learning, Point Tracking

Video Diffusion Models: A Survey

1 code implementation • 6 May 2024 • Andrew Melnik, Michal Ljubljanac, Cong Lu, Qi Yan, Weiming Ren, Helge Ritter

Diffusion generative models have recently become a powerful technique for creating and modifying high-quality, coherent video content.

Survey, Text-to-Video Generation +1

Lane Segmentation Refinement with Diffusion Models

no code implementations • 1 May 2024 • Antonio Ruiz, Andrew Melnik, Dong Wang, Helge Ritter

The lane graph is a key component for building high-definition (HD) maps and crucial for downstream tasks such as autonomous driving or navigation planning.

Autonomous Driving, Segmentation

Cognitive Planning for Object Goal Navigation using Generative AI Models

no code implementations • 30 Mar 2024 • Arjun P S, Andrew Melnik, Gora Chand Nandi

Our approach enables a robot to navigate unfamiliar environments by leveraging LLMs and LVLMs to understand the semantic structure of the scene.

Efficient Exploration, In-Context Learning +2

Benchmarks for Physical Reasoning AI

1 code implementation • 17 Dec 2023 • Andrew Melnik, Robin Schiewer, Moritz Lange, Andrei Muresanu, Mozhgan Saeidi, Animesh Garg, Helge Ritter

Therefore, we aim to offer an overview of existing benchmarks and their solution approaches and propose a unified perspective for measuring the physical reasoning capacity of AI systems.

UniTeam: Open Vocabulary Mobile Manipulation Challenge

no code implementations • 14 Dec 2023 • Andrew Melnik, Michael Büttner, Leon Harz, Lyon Brown, Gora Chand Nandi, Arjun PS, Gaurav Kumar Yadav, Rahul Kala, Robert Haschke

This report introduces our UniTeam agent - an improved baseline for the "HomeRobot: Open Vocabulary Mobile Manipulation" challenge.

Object

Language-Conditioned Semantic Search-Based Policy for Robotic Manipulation Tasks

no code implementations • 10 Dec 2023 • Jannik Sheikh, Andrew Melnik, Gora Chand Nandi, Robert Haschke

Reinforcement learning and imitation learning approaches utilize policy learning strategies that generalize poorly from just a few examples of a task.

Imitation Learning, reinforcement-learning +1

Behavioral Cloning via Search in Embedded Demonstration Dataset

no code implementations • 15 Jun 2023 • Federico Malato, Florian Leopold, Ville Hautamaki, Andrew Melnik

Actions from a selected similar situation can be performed by the agent until representations of the agent's current situation and the selected experience diverge in the latent space.

Behavioural cloning, Minecraft
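
The search-and-replay idea summarized in this entry can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Euclidean distance, the fixed `threshold`, and the flat list of demonstration latents are all assumptions made for illustration. The agent copies actions from the closest stored situation and searches again once its current latent drifts too far from the demonstration it is following.

```python
import math

def distance(a, b):
    """Euclidean distance between two latent vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_index(query, demo_latents):
    """Index of the stored demonstration latent closest to the query."""
    return min(range(len(demo_latents)),
               key=lambda i: distance(query, demo_latents[i]))

def act(agent_latents, demo_latents, demo_actions, threshold):
    """Replay demonstration actions, searching again on divergence.

    `threshold` is a hypothetical divergence limit in latent space.
    """
    actions = []
    pointer = None  # position in the demonstration currently being followed
    for z in agent_latents:
        diverged = (
            pointer is None
            or pointer >= len(demo_latents)
            or distance(z, demo_latents[pointer]) > threshold
        )
        if diverged:
            # Start following the most similar stored situation.
            pointer = nearest_index(z, demo_latents)
        actions.append(demo_actions[pointer])
        pointer += 1  # advance along the demonstration
    return actions

# Toy data: two separated clusters of demonstration latents.
demo_latents = [(0, 0), (1, 0), (2, 0), (10, 10), (11, 10)]
demo_actions = ["a", "b", "c", "d", "e"]
agent_latents = [(0.1, 0), (1.1, 0), (10.2, 10)]
result = act(agent_latents, demo_latents, demo_actions, threshold=0.5)
print(result)  # ['a', 'b', 'd']
```

In the toy run, the agent follows the first demonstration segment for two steps, then its latent jumps away from the expected next state, which triggers a new search and a switch to the second segment.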

Contrastive Language, Action, and State Pre-training for Robot Learning

no code implementations • 21 Apr 2023 • Krishan Rana, Andrew Melnik, Niko Sünderhauf

In this paper, we introduce a method for unifying language, action, and state information in a shared embedding space to facilitate a range of downstream tasks in robot learning.

Retrieval

Shape complexity estimation using VAE

1 code implementation • 5 Apr 2023 • Markus Rothgaenger, Andrew Melnik, Helge Ritter

In this paper, we compare methods for estimating the complexity of two-dimensional shapes and introduce a method that exploits reconstruction loss of Variational Autoencoders with different sizes of latent vectors.

Attribute

Stroke-based Rendering: From Heuristics to Deep Learning

1 code implementation • 30 Dec 2022 • Florian Nolte, Andrew Melnik, Helge Ritter

In the last few years, artistic image-making with deep learning models has gained a considerable amount of traction.

Deep Learning, Neural Rendering +1

Behavioral Cloning via Search in Video PreTraining Latent Space

no code implementations • 27 Dec 2022 • Federico Malato, Florian Leopold, Amogh Raut, Ville Hautamäki, Andrew Melnik

Our approach can effectively recover meaningful demonstration trajectories and show human-like behavior of an agent in the Minecraft environment.

Imitation Learning, Minecraft

Face Generation and Editing with StyleGAN: A Survey

no code implementations • 18 Dec 2022 • Andrew Melnik, Maksim Miasayedzenkau, Dzianis Makarovets, Dzianis Pirshtuk, Eren Akbulut, Dennis Holzmann, Tarek Renusch, Gustav Reichert, Helge Ritter

Our goal with this survey is to provide an overview of state-of-the-art deep learning methods for face generation and editing using StyleGAN.

Deep Learning, Face Generation +3

Planning with RL and episodic-memory behavioral priors

no code implementations • 5 Jul 2022 • Shivansh Beohar, Andrew Melnik

The practical application of learning agents requires sample efficient and interpretable algorithms.

Imitation Learning, Q-Learning +2

Solving Learn-to-Race Autonomous Racing Challenge by Planning in Latent Space

no code implementations • 4 Jul 2022 • Shivansh Beohar, Fabian Heinrich, Rahul Kala, Helge Ritter, Andrew Melnik

The agent is required to pass the previously unknown F1-style track in the minimum time with the least amount of off-road driving violations.

Autonomous Racing, Road Segmentation

Faces: AI Blitz XIII Solutions

2 code implementations • 3 Apr 2022 • Andrew Melnik, Eren Akbulut, Jannik Sheikh, Kira Loos, Michael Buettner, Tobias Lenze

The AI Blitz XIII Faces challenge, hosted on the www.aicrowd.com platform, consisted of five problems: Sentiment Classification, Age Prediction, Mask Prediction, Face Recognition, and Face De-Blurring.

Face Recognition, Prediction +2

A Graph-based U-Net Model for Predicting Traffic in unseen Cities

1 code implementation • 11 Feb 2022 • Luca Hermes, Barbara Hammer, Andrew Melnik, Riza Velioglu, Markus Vieth, Malte Schilling

Accurate traffic prediction is a key ingredient to enable traffic management like rerouting cars to reduce road congestion or regulating traffic via dynamic speed limits to maintain a steady flow.

Management, Traffic Prediction

YOLO -- You only look 10647 times

no code implementations • 16 Jan 2022 • Christian Limberg, Andrew Melnik, Augustin Harter, Helge Ritter

In this work, we explain the "You Only Look Once" (YOLO) single-stage object detection approach as a parallel classification of 10647 fixed region proposals.

Classification, image-classification +5
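
The figure of 10647 region proposals can be reproduced with a short calculation. The sketch below assumes the common YOLOv3 setup, which the summary above does not spell out: a 416x416 input, detection heads at strides 32, 16, and 8, and 3 anchor boxes per grid cell. The function and constant names are illustrative only.

```python
# Assumed YOLOv3 configuration (not stated in the listing above):
# a 416x416 input, three detection scales, 3 anchor boxes per grid cell.
INPUT_SIZE = 416
STRIDES = (32, 16, 8)
ANCHORS_PER_CELL = 3

def count_region_proposals(input_size=INPUT_SIZE,
                           strides=STRIDES,
                           anchors_per_cell=ANCHORS_PER_CELL):
    """Count the fixed boxes predicted across all detection scales."""
    total = 0
    for stride in strides:
        grid = input_size // stride  # 13, 26, and 52 cells per side
        total += grid * grid * anchors_per_cell
    return total

print(count_region_proposals())  # 10647
```

Under these assumptions the three grids contribute 13x13, 26x26, and 52x52 cells, which is 3549 cells in total, and three boxes per cell give 10647.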

Critic Guided Segmentation of Rewarding Objects in First-Person Views

1 code implementation • 20 Jul 2021 • Andrew Melnik, Augustin Harter, Christian Limberg, Krishan Rana, Niko Suenderhauf, Helge Ritter

This work discusses a learning approach to mask rewarding objects in images using sparse reward signals from an imitation learning dataset.

Imitation Learning

Solving Physics Puzzles by Reasoning about Paths

2 code implementations • 14 Nov 2020 • Augustin Harter, Andrew Melnik, Gaurav Kumar, Dhruv Agarwal, Animesh Garg, Helge Ritter

We propose a new deep learning model for goal-driven tasks that require intuitive physical reasoning and intervention in the scene to achieve a desired end goal.

Object
