Optical flow estimation is very challenging in situations with transparent or occluded objects.
Deep learning has led to remarkable strides in scene understanding, with panoptic segmentation emerging as a key holistic scene interpretation task.
Concurrently, recent breakthroughs in visual representation learning have sparked a paradigm shift, leading to the advent of large foundation models that can be trained on completely unlabeled images.
RaLF is composed of radar and LiDAR feature encoders, a place recognition head that generates global descriptors, and a metric localization head that predicts the 3-DoF transformation between the radar scan and the map.
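The place recognition head can be pictured as retrieval over global descriptors. The sketch below is a generic stand-in, not RaLF's actual heads: it mean-pools local features into an L2-normalized global descriptor and retrieves the best-matching map scan by cosine similarity (all pooling choices and names here are illustrative assumptions).

```python
import numpy as np

def global_descriptor(local_feats):
    # Mean-pool local features into one vector, then L2-normalize.
    # (A hand-crafted stand-in for a learned place recognition head.)
    g = local_feats.mean(axis=0)
    return g / np.linalg.norm(g)

def retrieve(query_desc, map_descs):
    # With unit-norm descriptors, the dot product equals cosine similarity.
    sims = map_descs @ query_desc
    return int(np.argmax(sims))

# Toy map of three places with distinctive (hand-crafted) local features.
map_descs = np.stack([
    global_descriptor(np.tile([1.0, 0.0, 0.0, 0.0], (5, 1))),
    global_descriptor(np.tile([0.0, 1.0, 0.0, 0.0], (5, 1))),
    global_descriptor(np.tile([0.0, 0.0, 1.0, 0.0], (5, 1))),
])
query = global_descriptor(np.tile([0.1, 0.9, 0.0, 0.0], (5, 1)))
best = retrieve(query, map_descs)  # matches place 1
```

The metric localization head would then refine the retrieved match by regressing the 3-DoF transformation (x, y, yaw) between scan and map; that regression step is not shown here.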
Surround-view systems, which are common in modern vehicles, use the inverse perspective mapping (IPM) principle to generate a BEV image that is displayed to the driver.
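IPM assumes a flat ground plane and a known camera calibration: with intrinsics K and extrinsics [R|t], ground-plane points (Z = 0) map to pixels through the homography H = K·[r1 r2 t], and warping the image through H⁻¹ yields the BEV view. A minimal NumPy sketch with purely illustrative calibration values:

```python
import numpy as np

# Illustrative pinhole intrinsics (focal length and principal point).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Illustrative extrinsics: camera pitched down 30 deg, 1.5 m above ground.
pitch = np.deg2rad(30.0)
R = np.array([[1.0, 0.0,            0.0],
              [0.0, np.cos(pitch), -np.sin(pitch)],
              [0.0, np.sin(pitch),  np.cos(pitch)]])
t = np.array([0.0, 1.5, 0.0])

def ground_to_image_homography(K, R, t):
    # For ground points (X, Y, 0): p_cam = X*r1 + Y*r2 + t, so the
    # plane-to-image homography is H = K @ [r1 | r2 | t].
    return K @ np.column_stack((R[:, 0], R[:, 1], t))

def project(H, X, Y):
    # Map a ground-plane point to pixel coordinates.
    p = H @ np.array([X, Y, 1.0])
    return p[:2] / p[2]

H = ground_to_image_homography(K, R, t)
# A full BEV image is produced by warping every BEV pixel through
# np.linalg.inv(H) and sampling the camera image at that location.
```

The flat-ground assumption is exactly why IPM-based BEV images distort objects with height (cars, pedestrians), which motivates learned BEV mapping approaches.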
To address these limitations, we introduce AmodalSynthDrive, a synthetic multi-task multi-modal amodal perception dataset.
no code implementations • 10 Aug 2023 • D. Adriana Gómez-Rosal, Max Bergau, Georg K. J. Fischer, Andreas Wachaja, Johannes Gräter, Matthias Odenweller, Uwe Piechottka, Fabian Hoeflinger, Nikhil Gosala, Niklas Wetzel, Daniel Büscher, Abhinav Valada, Wolfram Burgard
In today's chemical plants, human field operators perform frequent integrity checks to guarantee high safety standards, and thus are possibly the first to encounter dangerous operating conditions.
Safety and efficiency are paramount in healthcare facilities where the lives of patients are at stake.
We present HIMOS, a hierarchical reinforcement learning approach that learns to compose exploration, navigation, and manipulation skills.
We employ our method to learn challenging multi-object robot manipulation tasks from wrist camera observations and demonstrate superior utility for policy learning compared to other representation learning techniques.
We hypothesize that this drawback results from formulating self-supervised objectives that are limited to single frames or frame pairs.
Perception datasets for agriculture are limited in both quantity and diversity, which hinders effective training of supervised learning approaches.
We present CARTO, a novel approach for reconstructing multiple articulated objects from a single stereo RGB observation.
Operating a robot in the open world requires a high level of robustness with respect to previously unseen environments.
In this work, we propose EvCenterNet, a novel uncertainty-aware 2D object detection framework using evidential learning to directly estimate both classification and regression uncertainties.
Self-supervised feature learning enables perception systems to benefit from the vast raw data recorded by vehicle fleets worldwide.
To overcome these challenges, we propose a novel bottom-up approach to lane graph estimation from aerial imagery that aggregates multiple overlapping graphs into a single consistent graph.
Implicit supervision trains the model by enforcing spatial consistency of the scene over time based on FV semantic sequences, while explicit supervision exploits BEV pseudolabels generated from FV semantic annotations and self-supervised depth estimates.
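The two supervision signals can be combined into a single training objective. The toy NumPy sketch below uses an illustrative weighting and a simple MSE consistency term as a stand-in for the warped-sequence objective; none of the names or values come from the paper.

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-9):
    # probs: (N, C) softmax outputs; labels: (N,) class ids.
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + eps))

def bev_loss(pred_bev, pseudo_labels, pred_t, warped_pred_t1, lam=0.5):
    # Explicit supervision: cross-entropy against BEV pseudolabels.
    explicit = cross_entropy(pred_bev, pseudo_labels)
    # Implicit supervision: agreement between the prediction at time t and
    # the t+1 prediction warped into frame t (MSE is an illustrative
    # stand-in for the consistency objective; lam is an assumed weight).
    implicit = np.mean((pred_t - warped_pred_t1) ** 2)
    return explicit + lam * implicit
```

Confident, temporally consistent predictions should score lower than uncertain, inconsistent ones, e.g. `bev_loss([[0.99, 0.01], [0.01, 0.99]], [0, 1], …)` is far below `bev_loss([[0.5, 0.5], [0.5, 0.5]], [0, 1], …)` with mismatched temporal predictions.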
In this work, we present the first survey on fairness in robot learning from an interdisciplinary perspective spanning technical, ethical, and legal challenges.
Despite its importance in both industrial and service robotics, mobile manipulation remains a significant challenge, as it requires a seamless integration of end-effector trajectory generation with navigation skills as well as reasoning over long horizons.
Amodal panoptic segmentation aims to connect the perception of the world to its cognitive understanding.
In recent years, policy learning methods using either reinforcement or imitation have made significant progress.
We evaluate our approach using various sensor modalities and model configurations on the challenging nuScenes and KITTI datasets.
Object detection has, for the most part, been formulated in Euclidean space, where Euclidean or spherical geodesic distances measure the similarity of an image region to an object class prototype.
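In this formulation, classifying a region reduces to finding the class prototype nearest to its embedding. A minimal sketch of prototype-based classification under the Euclidean metric (generic names, not the paper's API):

```python
import numpy as np

def nearest_prototype(features, prototypes):
    # Assign each feature vector to the class of the closest prototype
    # under Euclidean distance.
    # (N, 1, D) - (1, C, D) broadcasts to (N, C) pairwise distances.
    d = np.linalg.norm(features[:, None, :] - prototypes[None, :, :], axis=-1)
    return d.argmin(axis=1)

protos = np.array([[0.0, 0.0], [10.0, 10.0]])   # two class prototypes
feats = np.array([[1.0, -1.0], [9.0, 11.0]])    # two region embeddings
labels = nearest_prototype(feats, protos)        # -> classes 0 and 1
```

Replacing the Euclidean metric here with a hyperbolic or otherwise non-Euclidean distance is exactly the kind of change such work explores.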
To enable robots to reason with this capability, we formulate and propose a novel task that we name amodal panoptic segmentation.
Ranked #1 on Amodal Panoptic Segmentation on BDD100K val
The success of deep learning in recent years has led to a rising demand for neural network architecture engineering.
As robotic systems become increasingly capable of assisting humans in their everyday lives, we must consider the potential for these artificial agents to make their human collaborators feel unsafe or to treat them unfairly.
Object recognition has, for the most part, been approached as a one-hot problem that treats classes as discrete and unrelated.
In this technical report, we describe our EfficientLPT architecture that won the panoptic tracking challenge in the 7th AI Driving Olympics at NeurIPS 2021.
Audio-visual navigation combines sight and hearing to navigate to a sound-emitting source in an unmapped environment.
A core challenge for an autonomous agent acting in the real world is to adapt its repertoire of skills to cope with its noisy perception and dynamics.
Unsupervised Domain Adaptation (UDA) techniques are thus essential to fill this domain gap and retain the performance of models on new sensor setups without the need for additional data labeling.
Panoptic scene understanding and tracking of dynamic agents are essential for robots and automated vehicles to navigate in urban environments.
Ranked #1 on Panoptic Segmentation on Panoptic nuScenes test
Bird's-Eye-View (BEV) maps have emerged as one of the most powerful representations for scene understanding due to their ability to provide rich spatial context while being easy to interpret and process.
Furthermore, we introduce the dices dataset, which consists of over 2000 grayscale images of falling dice captured from multiple perspectives, with 5% of the images containing rare anomalies (e.g., drill holes, sawing, or scratches).
Loop closure detection is an essential component of Simultaneous Localization and Mapping (SLAM) systems, which reduces the drift accumulated over time.
In this work, we present the novel self-supervised MM-DistillNet framework consisting of multiple teachers that leverage diverse modalities including RGB, depth and thermal images, to simultaneously exploit complementary cues and distill knowledge into a single audio student network.
Panoptic segmentation of point clouds is a crucial task that enables autonomous vehicles to comprehend their vicinity using their highly accurate and reliable LiDAR sensors.
Rapid advances in robotics and machine learning are facilitating the transition of robots from being confined to controlled industrial spaces to performing novel everyday tasks in domestic and urban environments.
In this technical report, we present key details of our winning panoptic segmentation architecture EffPS_b1bs4_RVC.
Dynamic objects have a significant impact on the robot's perception of the environment, which degrades the performance of essential tasks such as localization and mapping.
In this paper, we take a step further by introducing CMRNet++, a significantly more robust model that not only generalizes effectively to new places but is also independent of the camera parameters.
In this paper, we introduce a novel perception task denoted as multi-object panoptic tracking (MOPT), which unifies the conventionally disjoint tasks of semantic segmentation, instance segmentation, and multi-object tracking.
Understanding the scene in which an autonomous robot operates is critical for its competent functioning.
Ranked #1 on Panoptic Segmentation on KITTI Panoptic Segmentation
In this work, we propose a novel terrain classification framework leveraging an unsupervised proprioceptive classifier that learns from vehicle-terrain interaction sounds to self-supervise an exteroceptive classifier for pixel-wise semantic segmentation of images.
This problem is extremely challenging as pre-existing maps cannot be leveraged for navigation due to structural changes that may have occurred.
Indoor localization is one of the crucial enablers for deployment of service robots.
Learned representations from the traffic light recognition stream are fused with the estimated trajectories from the motion prediction stream to learn the crossing decision.
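One way to picture this fusion is a late-fusion MLP over the concatenated stream features. The sketch below is an illustrative assumption about the fusion step, not the trained model: all weights are randomly initialized toy values.

```python
import numpy as np

def crossing_decision(light_feat, traj_feat, W1, b1, w2, b2):
    # Late fusion: concatenate the traffic-light features with the
    # motion-prediction features, then a small MLP outputs the
    # probability of a "cross" decision.
    x = np.concatenate([light_feat, traj_feat])
    h = np.maximum(0.0, W1 @ x + b1)        # ReLU hidden layer
    logit = float(w2 @ h + b2)
    return 1.0 / (1.0 + np.exp(-logit))     # sigmoid probability

# Illustrative (untrained) weights: 4-d light features + 4-d trajectory
# features -> 3 hidden units -> scalar decision logit.
rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(3, 8)), np.zeros(3)
w2, b2 = rng.normal(size=3), 0.0
p = crossing_decision(np.ones(4), np.ones(4), W1, b1, w2, b2)
```

In practice the two feature vectors would come from the learned recognition and prediction streams, and the fusion weights would be trained end-to-end on labeled crossing decisions.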
To address this limitation, we propose a multimodal semantic segmentation framework that dynamically adapts the fusion of modality-specific features while being sensitive to the object category, spatial location, and scene context in a self-supervised manner.
Ranked #1 on Semantic Segmentation on Freiburg Forest
Semantic understanding and localization are fundamental enablers of robot autonomy that have for the most part been tackled as disjoint problems.
Terrain classification is a critical component of any autonomous mobile robot system operating in unknown real-world environments.
We evaluate our proposed VLocNet on indoor as well as outdoor datasets and show that even our single task model exceeds the performance of state-of-the-art deep architectures for global localization, while achieving competitive performance for visual odometry estimation.