Single-image 3D shape interpretation and reconstruction are closely related but have long been studied separately, often yielding priors that are heavily biased toward the training classes.
Parkour is a grand challenge for legged locomotion that requires robots to overcome various obstacles rapidly in complex environments.
However, these detectors usually require training on large amounts of annotated data, which is expensive and time-consuming to collect.
To bridge the gap, we present a new online continual object detection benchmark with an egocentric video dataset, Objects Around Krishna (OAK).
In this work, we present MolCLR: Molecular Contrastive Learning of Representations via Graph Neural Networks (GNNs), a self-supervised learning framework that leverages large unlabeled data (~10M unique molecules).
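The snippet above does not spell out MolCLR's training objective; a common choice in such contrastive frameworks is the NT-Xent loss over two augmented views of each molecular graph. The NumPy sketch below is illustrative only (function name, shapes, and temperature are assumptions, not the paper's exact implementation):

```python
import numpy as np

def nt_xent_loss(z_i, z_j, temperature=0.1):
    """NT-Xent loss for a batch of positive pairs (z_i[k], z_j[k]);
    every other embedding in the 2N-sample batch acts as a negative."""
    n = z_i.shape[0]
    z = np.concatenate([z_i, z_j], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalize rows
    sim = z @ z.T / temperature                        # scaled cosine similarity
    np.fill_diagonal(sim, -np.inf)                     # a sample is never its own negative
    # The positive for index k is k + n (and vice versa).
    targets = np.concatenate([np.arange(n) + n, np.arange(n)])
    # Numerically stable row-wise log-softmax cross-entropy.
    logits = sim - sim.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(2 * n), targets].mean()

# Toy usage: two slightly perturbed "views" of the same batch of embeddings.
rng = np.random.default_rng(0)
z_a = rng.normal(size=(8, 32))
loss = nt_xent_loss(z_a, z_a + 0.05 * rng.normal(size=(8, 32)))
```

With near-identical views, the positives dominate the similarity rows and the loss is small but strictly positive.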
Visual data in autonomous driving perception, such as camera images and LiDAR point clouds, can be interpreted as a mixture of two aspects: semantic features and geometric structure.
Developing agents that can perform complex control tasks from high dimensional observations such as pixels is challenging due to difficulties in learning dynamics efficiently.
Instead, we propose leveraging vast unlabeled datasets by self-supervised metric learning of 3D object trackers, with a focus on data association.
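To make the data-association step concrete, here is a minimal sketch of matching tracks to detections by distance in a learned embedding space, using greedy nearest-neighbour assignment with a gating threshold. The function name, the cosine-distance metric, and the threshold are illustrative assumptions, not the paper's actual method:

```python
import numpy as np

def associate(track_embs, det_embs, max_dist=0.5):
    """Greedily match track embeddings to detection embeddings (rows are
    feature vectors). Returns (track_idx, det_idx) pairs whose cosine
    distance is below max_dist; each track/detection is used at most once."""
    t = track_embs / np.linalg.norm(track_embs, axis=1, keepdims=True)
    d = det_embs / np.linalg.norm(det_embs, axis=1, keepdims=True)
    dist = 1.0 - t @ d.T            # cosine distance matrix
    matches = []
    while np.isfinite(dist).any():
        i, j = np.unravel_index(np.argmin(dist), dist.shape)
        if dist[i, j] > max_dist:   # remaining pairs fail the gate
            break
        matches.append((int(i), int(j)))
        dist[i, :] = np.inf         # consume this track
        dist[:, j] = np.inf         # consume this detection
    return matches

# Toy usage: two tracks and two detections with swapped order.
tracks = np.array([[1.0, 0.0], [0.0, 1.0]])
dets = np.array([[0.0, 1.0], [1.0, 0.0]])
matches = associate(tracks, dets)
```

A production tracker would typically replace the greedy loop with globally optimal assignment (e.g. the Hungarian algorithm), but the gating-plus-matching structure is the same.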
Additionally, 3D MOT datasets such as KITTI evaluate MOT methods in 2D space, and standardized 3D MOT evaluation tools for fairly comparing 3D MOT methods are missing.
We propose a simple, fast, and flexible framework to generate simultaneously semantic and instance masks for panoptic segmentation.
Through experiments on a robotic manipulation dataset and two driving datasets, we show that SPFNet is effective for the SPF task, that our forecast-then-detect pipeline outperforms the detect-then-forecast approaches we compared against, and that pose forecasting performance improves with the addition of unlabeled data.
The covariances help to learn the relationships between the borders, and the mixture components can capture different configurations of an occluded part.
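One way to realize such a model is a mixture of full-covariance Gaussians over the four box borders, trained by negative log-likelihood. The sketch below is a generic mixture-density NLL, assuming a 4-D border vector (left, top, right, bottom); the function name and parameter shapes are illustrative, not taken from the original work:

```python
import numpy as np

def mixture_nll(borders, pis, mus, covs):
    """Negative log-likelihood of a 4-D border vector under a mixture of
    full-covariance Gaussians: pis (k,), mus (k, 4), covs (k, 4, 4)."""
    k = len(pis)
    log_probs = np.empty(k)
    for m in range(k):
        diff = borders - mus[m]
        inv = np.linalg.inv(covs[m])
        _, logdet = np.linalg.slogdet(covs[m])       # stable log-determinant
        log_probs[m] = (np.log(pis[m])
                        - 0.5 * (4 * np.log(2 * np.pi) + logdet
                                 + diff @ inv @ diff))
    # Log-sum-exp over components, shifted for numerical stability.
    m = log_probs.max()
    return -(m + np.log(np.exp(log_probs - m).sum()))

# Sanity check: one component, identity covariance, mean at the target.
nll = mixture_nll(np.zeros(4), np.array([1.0]),
                  np.zeros((1, 4)), np.eye(4)[None])
```

For a standard 4-D Gaussian evaluated at its mean, this reduces to half the log-normalizer, i.e. 2·log(2π).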
Very deep convolutional neural networks (CNNs) have been firmly established as the primary methods for many computer vision tasks.
Large-scale object detection datasets (e.g., MS-COCO) try to define the ground-truth bounding boxes as clearly as possible.