We study the problem of generalizable task learning from human demonstration videos without extra training on the robot or pre-recorded robot motions.
Further, we extend the decentralized approach to sequential decision-making problems, showing on 13 continuous control benchmark environments that it matches or outperforms state-of-the-art CEM algorithms in most cases under the same total budget of planning samples.
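The decentralized variant itself is not spelled out here; as a point of reference, a minimal sketch of vanilla CEM planning under a fixed sample budget, with a hypothetical `cost_fn` scoring candidate action sequences, might look like:

```python
import numpy as np

def cem_plan(cost_fn, horizon, act_dim, n_samples=64, n_elites=8, n_iters=5, seed=0):
    """Minimal Cross-Entropy Method planner: iteratively refit a Gaussian
    over action sequences to the lowest-cost (elite) samples."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, act_dim))
    std = np.ones((horizon, act_dim))
    for _ in range(n_iters):
        # Sample candidate action sequences from the current Gaussian.
        samples = rng.normal(mean, std, size=(n_samples, horizon, act_dim))
        costs = np.array([cost_fn(a) for a in samples])
        elites = samples[np.argsort(costs)[:n_elites]]
        # Refit the sampling distribution to the elite set.
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean  # planned action sequence

# Toy usage: steer a 1-D double integrator toward the origin.
def toy_cost(actions, dt=0.1):
    pos, vel, cost = 1.0, 0.0, 0.0
    for a in actions[:, 0]:
        vel += a * dt
        pos += vel * dt
        cost += pos ** 2 + 0.01 * a ** 2
    return cost

plan = cem_plan(toy_cost, horizon=20, act_dim=1)
```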
Semantic segmentation in autonomous driving predominantly focuses on learning from large-scale data with a closed set of known classes without considering unknown objects.
We formalize the task of video class agnostic segmentation from monocular video sequences in autonomous driving to account for unknown objects.
In this paper, we propose a simple yet powerful Boundary-Aware Segmentation Network (BASNet), which comprises a predict-refine architecture and a hybrid loss, for highly accurate image segmentation.
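BASNet's exact formulation is not reproduced here; a minimal sketch of the kind of hybrid segmentation loss such predict-refine networks commonly combine (BCE, a window-based SSIM term, and soft IoU, with equal weights assumed) could look like:

```python
import torch
import torch.nn.functional as F

def iou_loss(pred, target, eps=1e-6):
    # Soft IoU over the spatial dimensions (pred and target in [0, 1]).
    inter = (pred * target).sum(dim=(2, 3))
    union = (pred + target - pred * target).sum(dim=(2, 3))
    return 1.0 - (inter + eps) / (union + eps)

def ssim_loss(pred, target, window=11, c1=0.01 ** 2, c2=0.03 ** 2):
    # Structural similarity computed with a uniform local window.
    mu_p = F.avg_pool2d(pred, window, 1, window // 2)
    mu_t = F.avg_pool2d(target, window, 1, window // 2)
    var_p = F.avg_pool2d(pred * pred, window, 1, window // 2) - mu_p ** 2
    var_t = F.avg_pool2d(target * target, window, 1, window // 2) - mu_t ** 2
    cov = F.avg_pool2d(pred * target, window, 1, window // 2) - mu_p * mu_t
    ssim = ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2))
    return 1.0 - ssim.mean(dim=(2, 3))

def hybrid_loss(logits, target):
    # Equal weighting of the three terms is an assumption for this sketch.
    pred = torch.sigmoid(logits)
    bce = F.binary_cross_entropy_with_logits(
        logits, target, reduction="none").mean(dim=(2, 3))
    return (bce + ssim_loss(pred, target) + iou_loss(pred, target)).mean()
```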
We consider real-world reinforcement learning (RL) of robotic manipulation tasks that involve both visuomotor skills and contact-rich skills.
In this paper, we design a simple yet powerful deep network architecture, U$^2$-Net, for salient object detection (SOD).
We consider the problem of visual imitation learning without human supervision (e.g., kinesthetic teaching or teleoperation) and without access to an interactive reinforcement learning (RL) training environment.
In this paper, to model the intended concepts of manipulation, we present a vision dataset under a strictly constrained knowledge domain for both robot and human manipulations, in which manipulation concepts and relations are stored in an ontology system in a taxonomic manner.
Our results show that few-shot segmentation benefits from utilizing word embeddings, and that we are able to perform few-shot segmentation using stacked joint visual semantic processing with weak image-level labels.
Conventional few-shot object segmentation methods learn object segmentation from a few support images with strong pixel-level segmentation masks.
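As an illustration of combining word embeddings with visual features, a hypothetical fusion module is sketched below; its names, dimensions, and architecture are assumptions for illustration, not the paper's design:

```python
import torch
import torch.nn as nn

class VisualSemanticFusion(nn.Module):
    """Hypothetical fusion block: tile a class word embedding over the
    spatial grid of visual features and predict a foreground mask."""
    def __init__(self, vis_dim=256, word_dim=300, hidden=128):
        super().__init__()
        self.proj = nn.Linear(word_dim, hidden)
        self.head = nn.Sequential(
            nn.Conv2d(vis_dim + hidden, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, 1),  # per-pixel foreground logit
        )

    def forward(self, feats, word_emb):
        n, _, h, w = feats.shape
        sem = self.proj(word_emb).unsqueeze(-1).unsqueeze(-1).expand(n, -1, h, w)
        return self.head(torch.cat([feats, sem], dim=1))

# Usage with random tensors standing in for backbone features and a word vector.
fusion = VisualSemanticFusion()
mask_logits = fusion(torch.randn(2, 256, 32, 32), torch.randn(2, 300))
```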
We observe that our method demonstrates time-efficient path-planning behavior with a high success rate in mapless navigation tasks.
Using the framework, we present a case study in which a robot performs manipulation actions in a kitchen environment, bridging visual perception with contextual semantics through the generated dynamic knowledge graphs.
Our method is evaluated on the PASCAL-$5^i$ dataset and outperforms the state of the art in few-shot semantic segmentation.
A human teacher can show potential objects of interest to the robot, which self-adapts to the teaching signal without requiring manual segmentation labels.
Our proposed method can directly learn from raw videos, which removes the need for hand-engineered task specification.
In this paper, we address this gap by presenting a real-time semantic segmentation benchmarking framework with a decoupled design for feature extraction and decoding methods.
We propose an end-to-end neural network that improves the segmentation accuracy of fully convolutional networks by incorporating a localization unit.
We propose an attention mechanism for 3D medical image segmentation.
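The paper's specific mechanism is not reproduced here; a common additive attention-gate formulation for 3D feature maps, in the spirit of Attention U-Net, is sketched below as an assumed baseline:

```python
import torch
import torch.nn as nn

class AttentionGate3D(nn.Module):
    """Additive attention gate for 3D feature maps: a gating signal from a
    coarser decoder stage reweights skip-connection features voxel-wise."""
    def __init__(self, feat_ch, gate_ch, inter_ch):
        super().__init__()
        self.theta = nn.Conv3d(feat_ch, inter_ch, 1)
        self.phi = nn.Conv3d(gate_ch, inter_ch, 1)
        self.psi = nn.Conv3d(inter_ch, 1, 1)

    def forward(self, x, g):
        # x: skip features, g: gating signal (same spatial size assumed here;
        # in practice the gate is usually upsampled first).
        attn = torch.sigmoid(self.psi(torch.relu(self.theta(x) + self.phi(g))))
        return x * attn  # attended skip features

gate = AttentionGate3D(feat_ch=32, gate_ch=64, inter_ch=16)
out = gate(torch.randn(1, 32, 16, 32, 32), torch.randn(1, 64, 16, 32, 32))
```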
Our experiments show that the proposed method outperforms state-of-the-art methods that utilize the motion cue only by 21.5% in mAP on KITTI MOD.
This approach reduces a 3D line segment fitting problem into two 2D line segment fitting problems and takes advantage of both images and depth maps.
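To illustrate the general reduction idea (not the paper's exact image-and-depth formulation), a toy least-squares version that fits a 3D line through two independent 2D fits, assuming x is the dominant axis, might look like:

```python
import numpy as np

def fit_3d_line_via_2d(points):
    """Fit a 3D line to points via two 2D least-squares fits,
    treating x as the dominant axis: y = a*x + b and z = c*x + d."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    A = np.stack([x, np.ones_like(x)], axis=1)
    (a, b), _, _, _ = np.linalg.lstsq(A, y, rcond=None)
    (c, d), _, _, _ = np.linalg.lstsq(A, z, rcond=None)
    point = np.array([0.0, b, d])           # point on the line at x = 0
    direction = np.array([1.0, a, c])
    return point, direction / np.linalg.norm(direction)

# Noisy points along the line (t, 2t + 1, -t + 3).
t = np.linspace(0, 5, 50)
pts = np.stack([t, 2 * t + 1, -t + 3], axis=1) + 0.01 * np.random.randn(50, 3)
p0, d = fit_3d_line_via_2d(pts)
```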
In this paper, the semantic segmentation problem is explored from the perspective of automated driving.
The tracking scheme is coherently integrated into a perceptual grouping framework in which the visual tracking problem is tackled by identifying a subset of these line segments and connecting them sequentially to form a closed boundary with the largest saliency and a certain similarity to the previous one.
One of these trackers is a newly developed learning-based tracker that relies on learning discriminative correlation filters, while the other is a refinement of a recent 8-DoF RANSAC-based tracker adapted with a new appearance model for tracking 4-DoF motion.
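The two trackers themselves are not reproduced here; for orientation, a minimal MOSSE-style filter, one standard instance of the discriminative-correlation-filter idea, can be sketched as:

```python
import numpy as np

def train_mosse_filter(patches, target_response, lam=1e-2):
    """MOSSE-style correlation filter: solve in the Fourier domain for the
    filter that maps each training patch to a desired peaked response."""
    G = np.fft.fft2(target_response)
    A = np.zeros_like(G)
    B = np.zeros_like(G)
    for p in patches:
        F_ = np.fft.fft2(p)
        A += G * np.conj(F_)
        B += F_ * np.conj(F_)
    return A / (B + lam)  # filter in the Fourier domain

def correlate(H, patch):
    # Response map; its peak gives the estimated target translation.
    return np.real(np.fft.ifft2(H * np.fft.fft2(patch)))

# Toy usage: a Gaussian target response centred on a 64x64 patch.
yy, xx = np.mgrid[:64, :64]
g = np.exp(-((xx - 32) ** 2 + (yy - 32) ** 2) / (2 * 2.0 ** 2))
H = train_mosse_filter([np.random.rand(64, 64) for _ in range(8)], g)
resp = correlate(H, np.random.rand(64, 64))
```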
Given this information, the robot visually explores the object and adds images of it to re-train the perception module.
This architecture is tested for both binary and semantic video segmentation tasks.
This paper adapts a popular image quality measure, structural similarity, for high-precision registration-based tracking, while also introducing a simpler and faster variant of the same.
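The paper's faster variant is not detailed here; a plain global (single-window) SSIM score, used as the similarity a registration-based tracker would maximize over warp parameters, can be sketched as:

```python
import numpy as np

def ssim(a, b, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Global structural similarity between two grayscale patches,
    computed over the whole patch rather than local windows."""
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

# Registration-based tracking then amounts to searching for the warp whose
# sampled patch maximizes SSIM against the template.
template = np.random.rand(32, 32) * 255
candidates = [template + np.random.randn(32, 32) * s for s in (1, 10, 50)]
best = max(candidates, key=lambda p: ssim(template, p))
```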
Visual detection methods represent a cost-effective option, since they can take advantage of hardware usually already available in many parking lots, namely cameras.
Accordingly, we propose a novel method for online segmentation of video sequences that incorporates temporal data.
We show how existing trackers can be broken down using the suggested methodology and compare the performance of the default configuration chosen by the authors against other possible combinations to demonstrate the new insights that can be gained by such an approach.
This paper presents a modular, extensible, and highly efficient open-source framework for registration-based tracking called Modular Tracking Framework (MTF).