Developing intelligent robots for complex manipulation tasks in household and factory settings remains challenging due to long-horizon tasks, contact-rich manipulation, and the need to generalize across a wide variety of object shapes and scene layouts.
Furthermore, the intrinsic heterogeneity in human behavior can produce equally successful but disparate demonstrations, further exacerbating the challenge of discerning demonstration quality.
We also show that the learned skills can be reused to accelerate learning in new tasks domains and transfer to a physical robot platform.
We present a framework that uses GAN-augmented images to complement certain specific attributes, usually underrepresented, for machine learning model training.
Improving the sample efficiency of reinforcement learning algorithms requires effective exploration.
In contrast, we propose adaptive thin volumes (ATVs); in an ATV, the depth hypothesis of each plane is spatially varying, which adapts to the uncertainties of previous per-pixel depth predictions.
Ranked #13 on 3D Reconstruction on DTU
On the other hand, in addition to the conventional discriminator of GAN (i. e., to distinguish between REAL/FAKE samples), we propose a novel guider sub-network which encourages the generated sample (i. e., with novel pose) towards better satisfying the ReID loss (i. e., cross-entropy ReID loss, triplet ReID loss).
Despite recent emergence of adversarial based methods for video prediction, existing algorithms often produce unsatisfied results in image regions with rich structural information (i. e., object boundary) and detailed motion (i. e., articulated body movement).
First, to facilitate this novel research of fine-grained video caption, we collected a novel dataset called Fine-grained Sports Narrative dataset (FSN) that contains 2K sports videos with ground-truth narratives from YouTube. com.