The recent development of novel aerial vehicles capable of physically interacting with the environment leads to new applications such as contact-based inspection.
In order to operate in human environments, a robot's semantic perception has to overcome open-world challenges such as novel objects and domain gaps.
This work investigates the use of neural implicit representations, specifically Neural Radiance Fields (NeRF), for geometric queries and motion planning.
In order to successfully solve the navigation task from only images, algorithms must be able to model the scene and its dynamics using only this channel of information.
We demonstrate the effectiveness of our Deep Learned Constellation Descriptor (Descriptellation) on the Paris-Rue-Lille and IQmulus datasets.
In this work, we present a full object keypoint tracking toolkit covering the entire process from data collection and labeling to model learning and evaluation.
We present a novel 3D mapping method leveraging the recent progress in neural implicit representation for 3D reconstruction.
The proposed framework infers task failures by evaluating individual predictions across multiple visual perception tasks for different regions of an image.
Our approach models the calibration process compactly using model-free deep reinforcement learning, deriving a policy that guides the motions of a robotic arm holding the sensor to efficiently collect measurements for both camera intrinsic calibration and camera-IMU extrinsic calibration.
Introducing semantically meaningful objects to visual Simultaneous Localization And Mapping (SLAM) has the potential to improve both the accuracy and reliability of pose estimates, especially in challenging scenarios with significant view-point and appearance changes.
Camera anomalies like rain or dust can severely degrade image quality and downstream tasks such as localization and segmentation.
The ability to simultaneously track and reconstruct multiple objects moving in the scene is of the utmost importance for robotic tasks such as autonomous navigation and interaction.
State-of-the-art semantic or instance segmentation deep neural networks (DNNs) are usually trained on a closed set of semantic classes.
This paper introduces SD-6DoF-ICLK, a learning-based Inverse Compositional Lucas-Kanade (ICLK) pipeline that uses sparse depth information to optimize the relative pose that best aligns two images on SE(3).
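As a hedged illustration of the pose parameterization this entails, the sketch below implements a minimal se(3) exponential map and the inverse-compositional update step it enables; the learned feature encoders, sparse-depth warping, and precomputed Jacobians of the actual pipeline are not reproduced here.

```python
import numpy as np

def se3_exp(xi):
    """Exponential map from a 6-vector (rho, phi) to a 4x4 transform."""
    rho, phi = xi[:3], xi[3:]
    theta = np.linalg.norm(phi)
    W = np.array([[0.0, -phi[2], phi[1]],
                  [phi[2], 0.0, -phi[0]],
                  [-phi[1], phi[0], 0.0]])
    if theta < 1e-8:
        R, V = np.eye(3) + W, np.eye(3)   # first-order small-angle case
    else:
        A = np.sin(theta) / theta
        B = (1.0 - np.cos(theta)) / theta ** 2
        C = (1.0 - A) / theta ** 2
        R = np.eye(3) + A * W + B * (W @ W)   # Rodrigues rotation
        V = np.eye(3) + B * W + C * (W @ W)   # left Jacobian for translation
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, V @ rho
    return T

def iclk_update(T, delta):
    """Inverse-compositional step: apply the inverse of the template-side increment."""
    return T @ np.linalg.inv(se3_exp(delta))

# Sanity check: a zero twist maps to the identity transform.
assert np.allclose(se3_exp(np.zeros(6)), np.eye(4))
```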
Furthermore, 3D feature-based registration methods have never quite reached the robustness of 2D methods in visual SLAM.
The inability of state-of-the-art semantic segmentation methods to detect anomaly instances hinders them from being deployed in safety-critical and complex applications, such as autonomous driving.
As a basis for a localization system we propose a complete on-board mapping pipeline able to map robust meaningful landmarks, such as poles from power lines, in the vicinity of the vehicle.
This is despite the fact that physical 3D spaces have a semantic structure similar to bodies of text: words are surrounded by semantically related words, just as objects are surrounded by other objects that are similar in concept and usage.
However, the efficient and effective data collection for such a data-driven system on real robots is still an open challenge.
General robot grasping in clutter requires the ability to synthesize grasps that work for previously unseen objects and that are also robust to physical interactions, such as collisions with other objects in the scene.
In this work, we design ways in which unsupervised learning can be used to assist reinforcement learning for robot navigation.
We find that this leads to improved OOD detection of epistemic uncertainty at the cost of ambiguous calibration close to the data distribution.
Transferring the style from one image onto another is a popular and widely studied task in computer vision.
Visual-inertial systems rely on precise calibrations of both camera intrinsics and inter-sensor extrinsics, which typically require manually performing complex motions in front of a calibration target.
A mechanism to detect OOD samples is important for safety-critical applications, such as automotive perception, to trigger a safe fallback mode.
In this paper, we introduce a novel learning-based approach to place recognition, using RGB-D cameras and line clusters as visual and geometric features.
Localization of a robotic system within a previously mapped environment is important for reducing estimation drift and for reusing previously built maps.
We present a Witness Autoencoder (W-AE), an autoencoder that captures geodesic distances of the data in the latent space.
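A minimal sketch of such a geodesic-preserving objective, assuming pairwise geodesic distances `g` precomputed on the data (e.g., shortest paths over a k-NN graph); the linear encoder and decoder below are placeholders, not the paper's architecture.

```python
import numpy as np

def geodesic_ae_loss(x, z, x_rec, g, lam=1.0):
    """Reconstruction loss plus a penalty that makes latent
    Euclidean distances match precomputed geodesic distances g."""
    rec = np.mean((x - x_rec) ** 2)                      # reconstruction term
    d_latent = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    geo = np.mean((d_latent - g) ** 2)                   # geodesic-preservation term
    return rec + lam * geo

# Toy usage with placeholder linear encoder/decoder.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))          # 8 samples, 16-dim input
W_enc = rng.normal(size=(16, 2))      # encode to a 2-D latent space
W_dec = rng.normal(size=(2, 16))
z = x @ W_enc
x_rec = z @ W_dec
g = rng.uniform(0, 5, size=(8, 8))    # stand-in for graph geodesic distances
g = (g + g.T) / 2
np.fill_diagonal(g, 0)
print(geodesic_ae_loss(x, z, x_rec, g))
```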
The method's front-end extracts event clusters that belong to line segments in the environment, while the back-end estimates the system's trajectory alongside the lines' 3D positions by minimizing point-to-line distances between individual events and the lines' projections in image space.
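As a hedged illustration of the back-end's point-to-line objective, the sketch below computes the signed perpendicular distances between individual events and a 2-D line given by a point and a direction; the actual system optimizes the trajectory and the 3-D lines jointly, which is not reproduced here.

```python
import numpy as np

def point_to_line_residuals(events, p, d):
    """Signed perpendicular distance of each 2-D event to the line
    through point p with direction d (2-D cross product)."""
    d = d / np.linalg.norm(d)
    diff = events - p                      # (N, 2) offsets from a point on the line
    return diff[:, 0] * d[1] - diff[:, 1] * d[0]

# Toy usage: events scattered around the line y = 0.5 * x.
rng = np.random.default_rng(1)
t = rng.uniform(0, 10, size=100)
events = np.stack([t, 0.5 * t], axis=1) + rng.normal(0, 0.05, size=(100, 2))
r = point_to_line_residuals(events, p=np.array([0.0, 0.0]),
                            d=np.array([1.0, 0.5]))
print(np.sum(r ** 2))  # the cost a Gauss-Newton back-end would minimize
```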
In this paper, we present our deep learning-based human detection system that uses optical (RGB) and long-wave infrared (LWIR) cameras to detect, track, localize, and re-identify humans from UAVs flying at high altitude.
Self-diagnosis and self-repair are some of the key challenges in deploying robotic platforms for long-term real-world applications.
With humankind facing new and increasingly large-scale challenges in the medical and domestic spheres, automation of the service sector carries a tremendous potential for improved efficiency, quality, and safety of operations.
A fully actuated and omnidirectional tilt-rotor aerial system is used to show capabilities of the control and planning methods.
Velocity estimation plays a central role in driverless vehicles, but standard, affordable methods struggle to cope with extreme scenarios like aggressive maneuvers due to the high sideslip involved.
Mobile manipulation is usually achieved by sequentially executing base and manipulator movements.
We propose PALNet, a novel hybrid network for semantic scene completion (SSC) from a single depth image.
Robust and accurate pose estimation is crucial for many applications in mobile robotics.
no code implementations • 4 Dec 2019 • Abel Gawel, Hermann Blum, Johannes Pankert, Koen Krämer, Luca Bartolomei, Selen Ercan, Farbod Farshidian, Margarita Chli, Fabio Gramazio, Roland Siegwart, Marco Hutter, Timothy Sandy
We present a fully-integrated sensing and control system which enables mobile manipulator robots to execute building tasks with millimeter-scale accuracy on building construction sites.
We therefore present SegMap: a map representation solution for localization and mapping based on the extraction of segments in 3D point clouds.
However, they are prone to local minima, resulting in sub-optimal trajectories, and sometimes do not reach global coverage.
There has been remarkable progress in the accuracy of semantic segmentation due to the capabilities of deep learning.
Vision-based precision landing is enabled by estimating the landing pad's pose using a bundle of AprilTag fiducials configured for detection from a wide range of altitudes.
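A minimal sketch of pose estimation from such a tag bundle, assuming the detector has already returned corner pixels for each detected tag and that the bundle layout (3-D corner positions in the landing-pad frame) is known; the helper name `pad_pose_from_bundle` is hypothetical. Stacking corners from all visible tags into a single `cv2.solvePnP` call lets tags of different sizes contribute at whichever altitudes they remain detectable.

```python
import numpy as np
import cv2

def pad_pose_from_bundle(detections, bundle_corners_3d, K):
    """Stack corners from all detected tags into one PnP problem.

    detections: {tag_id: (4, 2) pixel corners}
    bundle_corners_3d: {tag_id: (4, 3) corners in the pad frame}
    K: 3x3 float64 camera intrinsics (image assumed undistorted)
    """
    obj, img = [], []
    for tag_id, corners_px in detections.items():
        if tag_id in bundle_corners_3d:    # only tags visible at this altitude
            obj.append(bundle_corners_3d[tag_id])
            img.append(corners_px)
    obj = np.concatenate(obj).astype(np.float64)
    img = np.concatenate(img).astype(np.float64)
    ok, rvec, tvec = cv2.solvePnP(obj, img, K, distCoeffs=None)
    return ok, rvec, tvec  # pad pose expressed in the camera frame
```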
Foreground objects are therefore detected as areas in an image for which the descriptors are unlikely given the background distribution.
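A minimal sketch of this idea, assuming per-pixel descriptors and a single Gaussian fitted to background descriptors (the actual method's density model may differ): descriptors with low log-likelihood under the background model are flagged as foreground.

```python
import numpy as np

def fit_background_gaussian(desc):
    """desc: (N, D) descriptors sampled from background regions."""
    mu = desc.mean(axis=0)
    cov = np.cov(desc, rowvar=False) + 1e-6 * np.eye(desc.shape[1])
    return mu, np.linalg.inv(cov), np.linalg.slogdet(cov)[1]

def background_loglik(desc, mu, cov_inv, logdet):
    """Gaussian log-likelihood of each descriptor under the background model."""
    diff = desc - mu
    maha = np.einsum('nd,de,ne->n', diff, cov_inv, diff)
    d = desc.shape[1]
    return -0.5 * (maha + logdet + d * np.log(2 * np.pi))

# Pixels whose descriptors are unlikely are declared foreground.
rng = np.random.default_rng(2)
bg = rng.normal(0, 1, size=(500, 8))          # background training descriptors
mu, cov_inv, logdet = fit_background_gaussian(bg)
test = np.vstack([rng.normal(0, 1, (5, 8)),   # background-like samples
                  rng.normal(6, 1, (5, 8))])  # novel foreground samples
foreground = background_loglik(test, mu, cov_inv, logdet) < -30.0
print(foreground)
```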
In this paper, we present a path planner for low-altitude terrain coverage in known environments with unmanned rotary-wing micro aerial vehicles (MAVs).
Our approach spans from offline model building to real-time client-side pose fusion.
This target selection relies on a shared idleness representation and a coordination mechanism preventing topological conflicts.
4 code implementations • 13 May 2019 • Juraj Kabzan, Miguel de la Iglesia Valls, Victor Reijgwart, Hubertus Franciscus Cornelis Hendrikx, Claas Ehmke, Manish Prajapat, Andreas Bühler, Nikhil Gosala, Mehak Gupta, Ramya Sivanesan, Ankit Dhall, Eugenio Chisari, Napat Karnchanachari, Sonja Brits, Manuel Dangel, Inkyu Sa, Renaud Dubé, Abel Gawel, Mark Pfeiffer, Alexander Liniger, John Lygeros, Roland Siegwart
This paper presents the algorithms and system architecture of an autonomous racecar.
Deep learning has enabled impressive progress in the accuracy of semantic segmentation.
To autonomously navigate and plan interactions in real-world environments, robots require the ability to robustly perceive and map complex, unstructured surrounding scenes.
Changes in appearance are one of the main sources of failure in visual localization systems in outdoor environments.
In this paper we propose HF-Net, a hierarchical localization approach based on a monolithic CNN that simultaneously predicts local features and global descriptors for accurate 6-DoF localization.
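A hedged sketch of the coarse-to-fine scheme: global descriptors shortlist candidate database images, and local features are matched only against that shortlist before pose estimation. The descriptor layout and the `match_and_estimate_pose` hook are placeholders, not HF-Net's actual interfaces.

```python
import numpy as np

def retrieve_candidates(q_global, db_globals, k=10):
    """Coarse step: cosine-similarity retrieval with the global descriptor."""
    q = q_global / np.linalg.norm(q_global)
    db = db_globals / np.linalg.norm(db_globals, axis=1, keepdims=True)
    sims = db @ q
    return np.argsort(-sims)[:k]          # indices of the top-k database images

def localize(query, database, match_and_estimate_pose, k=10):
    """Fine step: match local features only against the retrieved shortlist."""
    cand = retrieve_candidates(query['global'],
                               np.stack([d['global'] for d in database]), k)
    for idx in cand:                      # best-first over the shortlist
        pose = match_and_estimate_pose(query['local'], database[idx])
        if pose is not None:              # e.g., PnP + RANSAC succeeded
            return pose
    return None
```

Restricting expensive local matching to a handful of retrieved candidates is what makes the single-CNN design fast enough for 6-DoF localization at scale.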
The combination of aerial survey capabilities of Unmanned Aerial Vehicles with targeted intervention abilities of agricultural Unmanned Ground Vehicles can significantly improve the effectiveness of robotic systems applied to precision agriculture.
no code implementations • 26 Sep 2018 • Nikhil Bharadwaj Gosala, Andreas Bühler, Manish Prajapat, Claas Ehmke, Mehak Gupta, Ramya Sivanesan, Abel Gawel, Mark Pfeiffer, Mathias Bürki, Inkyu Sa, Renaud Dubé, Roland Siegwart
In autonomous racing, vehicles operate close to the limits of handling and a sensor failure can have critical consequences.
In this paper we present an end-to-end deep learning framework to turn images that show dynamic content, such as vehicles or pedestrians, into realistic static frames.
Unmanned Aerial Vehicles (UAVs) represent a new frontier in a wide range of monitoring and research applications.
Many robotics applications require precise pose estimates despite operating in large and changing environments.
Sensor fusion is a fundamental process in robotic systems as it extends the perceptual range and increases robustness in real-world operations.
We propose LandmarkBoost, an approach that, in contrast to conventional 2D-3D matching methods, casts the search problem as a landmark classification task.
Yet, in highly dynamic environments, like crowded city streets, problems arise as major parts of the image can be covered by dynamic objects.
This paper discusses a large-scale and long-term mapping and localization scenario using the maplab open-source framework.
While current methods extract descriptors for the single task of localization, SegMap leverages a data-driven descriptor in order to extract meaningful features that can also be used for reconstructing a dense 3D map of the environment and for extracting semantic information.
We learn closed-loop policies mapping depth camera inputs to motion commands and compare different approaches to keep the problem tractable, including reward shaping, curriculum learning and using a policy pre-trained on a task with a reduced action set to warm-start the full problem.
This paper proposes a computationally efficient method to estimate the time-varying relative pose between two visual-inertial sensor rigs mounted on the flexible wings of a fixed-wing unmanned aerial vehicle (UAV).
Overall, our initial research demonstrates the feasibility of 3D wind field prediction from a UAV and the advantages of wind-aware planning.
On the other hand, maplab provides the research community with a collection of multisession mapping tools that include map merging, visual-inertial batch optimization, and loop closure.
While this makes them promising candidates for large-scale aerial inspection missions, their structural fragility necessitates avoiding adverse weather through appropriate path planning.
Central to our approach is the representation of the environment as a collection of overlapping TSDF subvolumes.
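A minimal sketch of such a submap collection, with assumed field names: each subvolume stores its own pose and a local TSDF grid, and a world-frame point can be looked up in every subvolume whose bounds overlap it.

```python
import numpy as np

class TsdfSubvolume:
    """One TSDF subvolume: a pose plus a local voxel grid."""
    def __init__(self, T_world_sub, side, voxel_size):
        self.T_world_sub = T_world_sub       # 4x4 world-from-subvolume pose
        self.voxel_size = voxel_size
        self.side = side                     # voxels per side
        self.tsdf = np.ones((side,) * 3)     # truncated signed distances
        self.weight = np.zeros((side,) * 3)  # per-voxel fusion weights

    def contains(self, p_world):
        """True if the point falls inside this subvolume's local bounds."""
        p = (np.linalg.inv(self.T_world_sub) @ np.append(p_world, 1.0))[:3]
        return bool(np.all(p >= 0) and np.all(p < self.side * self.voxel_size))

class SubvolumeCollection:
    """Overlapping subvolumes; a query point may hit several of them."""
    def __init__(self):
        self.subvolumes = []

    def lookup(self, p_world):
        return [s for s in self.subvolumes if s.contains(p_world)]
```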
We perform extensive simulations to show that this system performs better than the standard approach of using an optimistic global planner, and also outperforms taking a single exploration step when the local planner is stuck.
Our findings show that X-View is able to globally localize aerial-to-ground, and ground-to-ground robot data of drastically different view-points.
This paper reports on a data-driven, interaction-aware motion prediction approach for pedestrians in environments cluttered with static obstacles.
Then, we create a set of convex free-space clusters, which are the vertices of the topological map.
In this paper, we present an approach for dense semantic weed classification with multispectral images collected by a micro aerial vehicle (MAV).
We show that we can build TSDFs faster than Octomaps, and that it is more accurate to build ESDFs out of TSDFs than occupancy maps.
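A sketch of the standard weighted-average TSDF voxel update that such systems apply per ray and per voxel, under the assumption of a simple constant new-measurement weight; the ESDF can then be grown outward from the TSDF's near-surface band rather than recomputed from an occupancy map.

```python
import numpy as np

def update_tsdf_voxel(tsdf, weight, sdf_meas, trunc, w_new=1.0, w_max=100.0):
    """Fuse one signed-distance measurement into a voxel by weighted averaging."""
    sdf_meas = np.clip(sdf_meas, -trunc, trunc)      # truncate far distances
    fused = (weight * tsdf + w_new * sdf_meas) / (weight + w_new)
    return fused, min(weight + w_new, w_max)         # cap weight so the map stays adaptable

# Toy usage: a voxel observed repeatedly near the surface.
tsdf, w = 0.0, 0.0
for z in [0.04, 0.05, 0.06]:                         # noisy SDF measurements (meters)
    tsdf, w = update_tsdf_voxel(tsdf, w, z, trunc=0.1)
print(tsdf, w)   # converges toward the mean measurement, 0.05
```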
We propose a novel scoring concept for visual place recognition based on nearest neighbor descriptor voting and demonstrate how the algorithm naturally emerges from the problem formulation.
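A minimal sketch of nearest-neighbor descriptor voting, assuming a flat database of local descriptors labeled with the image each came from: every query descriptor casts one vote for the image owning its nearest neighbor, and candidate images are ranked by vote count.

```python
import numpy as np

def nn_voting_scores(query_desc, db_desc, db_image_ids, num_images):
    """Each query descriptor votes for the image of its nearest database descriptor."""
    # (Nq, Ndb) pairwise squared distances; a k-d tree would replace this at scale.
    d2 = ((query_desc[:, None, :] - db_desc[None, :, :]) ** 2).sum(-1)
    nn = np.argmin(d2, axis=1)
    return np.bincount(db_image_ids[nn], minlength=num_images)

# Toy usage: queries perturbed from image 4's descriptors should vote for image 4.
rng = np.random.default_rng(3)
db_desc = rng.normal(size=(200, 32))
db_image_ids = rng.integers(0, 10, size=200)     # source image of each descriptor
query = db_desc[db_image_ids == 4][:20] + rng.normal(0, 0.01, size=(20, 32))[:sum(db_image_ids == 4)][:20].reshape(-1, 32)[:min(20, int((db_image_ids == 4).sum())), :]
print(np.argmax(nn_voting_scores(query, db_desc, db_image_ids, 10)))  # -> 4
```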
We propose SegMatch, a reliable loop-closure detection algorithm based on the matching of 3D segments.
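A hedged sketch of the matching stage: candidate segment correspondences come from descriptor nearest neighbors, and a simple pairwise geometric-consistency check (preserved centroid distances between maps) stands in for the paper's full geometric verification.

```python
import numpy as np

def match_segments(desc_a, desc_b, cent_a, cent_b,
                   desc_thresh=0.5, geo_tol=0.2):
    """Descriptor-NN candidates filtered by pairwise centroid-distance consistency."""
    d2 = ((desc_a[:, None, :] - desc_b[None, :, :]) ** 2).sum(-1)
    cand = [(i, int(np.argmin(d2[i]))) for i in range(len(desc_a))
            if d2[i].min() < desc_thresh ** 2]
    # Keep matches whose pairwise centroid distances agree in both maps.
    consistent = []
    for i, j in cand:
        support = sum(
            abs(np.linalg.norm(cent_a[i] - cent_a[k]) -
                np.linalg.norm(cent_b[j] - cent_b[l])) < geo_tol
            for k, l in cand if (k, l) != (i, j))
        if support >= 2:                 # needs at least two consistent peers
            consistent.append((i, j))
    return consistent
```

A loop closure is then declared only when enough mutually consistent segment matches survive, which is what gives segment-based matching its robustness against individual descriptor mismatches.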
A novel method for visual place recognition is introduced and evaluated, demonstrating robustness to perceptual aliasing and observation noise.