TAO is a federated dataset for Tracking Any Object, containing 2,907 high-resolution videos captured in diverse environments, each half a minute long on average. A bottom-up approach was used to discover a large vocabulary of 833 categories, an order of magnitude more than prior tracking benchmarks.
47 PAPERS • 1 BENCHMARK
Gibson is an open-source perceptual and physics simulator for exploring active and real-world perception. The Gibson Environment is used for Real-World Perception Learning.
22 PAPERS • NO BENCHMARKS YET
A platform for research in embodied artificial intelligence (AI).
13 PAPERS • NO BENCHMARKS YET
QDax is a benchmark suite designed for Deep Neuroevolution in Reinforcement Learning domains for robot control. The suite includes the definition of tasks, environments, behavioral descriptors, and fitness. It specifies different benchmarks based on the complexity of both the task and the agent, which is controlled by a deep neural network. The benchmark uses standard Quality-Diversity metrics, including coverage, QD-score, maximum fitness, and an archive profile metric that quantifies the relation between coverage and fitness.
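As an illustration, here is a minimal sketch of the standard Quality-Diversity metrics named above, assuming the archive is represented as a plain mapping from discretized behavioral-descriptor cells to the best fitness found in each cell. Function and variable names are illustrative, not QDax's actual API.

```python
# Minimal sketch of standard QD metrics over a MAP-Elites-style archive.
# Assumes the archive is a dict: descriptor cell -> best fitness in that cell.

def coverage(archive, total_cells):
    """Fraction of behavioral-descriptor cells that contain a solution."""
    return len(archive) / total_cells

def qd_score(archive):
    """Sum of fitnesses over all filled cells (assumes non-negative fitness)."""
    return sum(archive.values())

def max_fitness(archive):
    """Best fitness found anywhere in the archive."""
    return max(archive.values())

# Example: a tiny archive over a 10x10 descriptor grid (100 cells).
archive = {(0, 1): 0.8, (3, 7): 0.5, (9, 9): 1.2}
print(coverage(archive, total_cells=100))  # 0.03
print(qd_score(archive))                   # 2.5
print(max_fitness(archive))                # 1.2
```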
6 PAPERS • NO BENCHMARKS YET
A dataset with high-resolution (4K) images and manually annotated dense labels every 50 frames.
3 PAPERS • NO BENCHMARKS YET
BASEPROD provides comprehensive rover sensor data collected over a 1.7 km traverse, accompanied by high-resolution 2D and 3D drone maps of the terrain. The dataset also includes laser-induced breakdown spectroscopy (LIBS) measurements from key sampling sites along the rover's path, as well as weather station data to contextualize environmental conditions.
2 PAPERS • NO BENCHMARKS YET
A benchmark for detecting fallen people lying on the floor. It consists of 6,982 images, with a total of 5,023 falls and 2,275 non-falls corresponding to people in conventional situations (standing, sitting, lying on a sofa or bed, walking, etc.). Almost all of the images were captured in indoor environments under widely varying conditions: variation in poses and sizes, occlusions, lighting changes, etc.
MIDGARD is an open-source simulator for autonomous robot navigation in outdoor unstructured environments. It is designed to enable the training of autonomous agents (e.g., unmanned ground vehicles) in photorealistic 3D environments, and to support the generalization skills of learning-based agents through the variability of its training scenarios.
Sparrow-V0: A Reinforcement Learning Friendly Simulator for Mobile Robot
ISOD contains 2,000 manually labelled RGB-D images from 20 diverse sites, each featuring over 30 types of small objects randomly placed amidst the items already present in the scenes. These objects, typically ≤3cm in height, include LEGO blocks, rags, slippers, gloves, shoes, cables, crayons, chalk, glasses, smartphones (and their cases), fake banana peels, fake pet waste, and piles of toilet paper, among others. These items were chosen because they either threaten the safe operation of indoor mobile robots or create messes if run over.
1 PAPER • NO BENCHMARKS YET
MlGesture is a dataset for hand gesture recognition tasks, recorded in a car with 5 different sensor types from two different viewpoints. The dataset contains over 1,300 hand gesture videos from 24 participants and features 9 different hand gesture symbols. One sensor cluster with five different cameras is mounted in front of the driver in the center of the dashboard; a second sensor cluster is mounted on the ceiling, looking straight down.
In recent years, research interest in visual navigation towards objects in indoor environments has grown significantly. This growth can be attributed to the recent availability of large navigation datasets in photo-realistic simulated environments, like Gibson and Matterport3D. However, the navigation tasks supported by these datasets are often restricted to the objects present in the environment at acquisition time. They also fail to account for the realistic scenario in which the target object is a user-specific instance that can be easily confused with similar objects and may be found in multiple locations within the environment. To address these limitations, we propose a new task, termed Personalized Instance-based Navigation (PIN), in which an embodied agent is tasked with locating and reaching a specific personal object by distinguishing it among multiple instances of the same category. The task is accompanied by PInNED, a dedicated new dataset composed of photo-realistic scenes.
The SEmantic Salient Instance Video (SESIV) dataset is obtained by augmenting the DAVIS-2017 benchmark dataset with semantic ground truth for salient instance labels. The SESIV dataset consists of 84 high-quality video sequences with pixel-wise, per-frame ground-truth labels.
A dataset collected in a set of experiments involving human participants and a robot.
This dataset involves a 2D or 3D agent moving from a start pose to a goal pose while interacting with nearby objects. These objects can influence the agent's position via attraction or repulsion forces, as well as its orientation via attraction to each object's orientation (a force-model sketch follows the list below). The dataset can be used to pre-train general policy behavior, which can later be fine-tuned quickly for a person's specific preferences. Example use cases include:
- self-driving cars maintaining distance from other cars
- robot pick-and-place tasks with intermediate subtasks (e.g., scanning factory items before dropping them off)
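As a sketch of the attraction/repulsion dynamics described above, consider a 2D agent whose position is pushed or pulled by weighted objects and whose heading is drawn toward each object's orientation. The specific force model (inverse-square magnitude, sinusoidal heading blend) is an assumption for illustration, not the dataset's exact formulation.

```python
import numpy as np

def step(agent_pos, agent_theta, objects, dt=0.1, orient_gain=0.1):
    """One integration step: objects attract/repel the agent's position
    and pull its heading toward their own orientation.
    NOTE: this force model is an illustrative assumption."""
    force = np.zeros(2)
    dtheta = 0.0
    for obj in objects:
        diff = obj["pos"] - agent_pos
        dist = np.linalg.norm(diff) + 1e-6
        # Positive weight attracts the agent; negative weight repels it.
        force += obj["weight"] * diff / dist**3
        # Pull the heading toward the object's orientation.
        dtheta += orient_gain * np.sin(obj["theta"] - agent_theta)
    return agent_pos + dt * force, agent_theta + dt * dtheta

objects = [
    {"pos": np.array([1.0, 0.0]), "theta": np.pi / 2, "weight": 1.0},   # attractor
    {"pos": np.array([0.0, 1.0]), "theta": 0.0,       "weight": -0.5},  # repulsor
]
pos, theta = step(np.array([0.0, 0.0]), 0.0, objects)
```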
The Fields2Benchmark dataset is a collection of 350 agricultural fields in vector format, manually selected to test agricultural coverage path planning algorithms.
0 PAPERS • NO BENCHMARKS YET
To enable automated analysis of large environments, hyperspectral sensors must be adapted into a format where they can be operated from mobile robots. This dataset highlights hyperspectral datacubes collected with the Hyper-Drive imaging system, which collects and registers datacubes spanning the visible to shortwave infrared (660-1700 nm) in 33 wavelength channels, while simultaneously capturing the ambient solar spectrum reflected off a white reference tile. The dataset consists of 500 labeled datacubes from on-road and off-road terrain compliant with the ATLAS.
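The simultaneously captured white-reference spectrum is what allows radiance to be converted to approximate reflectance. Below is a minimal sketch of that normalization, assuming a datacube is an (H, W, 33) radiance array and the white-tile measurement is a 33-channel spectrum; the array shapes and variable names are assumptions, not the dataset's published schema.

```python
import numpy as np

def to_reflectance(datacube, white_spectrum, eps=1e-8):
    """Divide each pixel's spectrum by the ambient illumination measured
    off the white reference tile, yielding approximate reflectance.
    Shapes assumed: datacube (H, W, 33), white_spectrum (33,)."""
    return datacube / (white_spectrum[None, None, :] + eps)

cube = np.random.rand(64, 64, 33)   # placeholder radiance datacube
white = np.random.rand(33) + 0.5    # placeholder white-tile spectrum
reflectance = np.clip(to_reflectance(cube, white), 0.0, 1.0)
```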
Overview: the goal is to use simulation data to train neural networks to estimate the pose of a rover's camera with respect to a known target object.
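To make the setup concrete, here is a minimal sketch of a pose-regression network and loss of the kind such a pipeline might use: the network regresses a translation vector and a unit-quaternion rotation from a simulated image. The architecture, loss weighting, and all names are illustrative assumptions, not the project's actual model.

```python
import torch
import torch.nn as nn

class PoseNet(nn.Module):
    """Illustrative pose regressor: image -> (translation, unit quaternion)."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 7)  # 3 translation + 4 quaternion components

    def forward(self, img):
        out = self.head(self.backbone(img))
        t, q = out[:, :3], out[:, 3:]
        q = q / q.norm(dim=1, keepdim=True)  # normalize to a unit quaternion
        return t, q

def pose_loss(t_pred, q_pred, t_gt, q_gt, beta=1.0):
    """Translation error plus weighted, sign-invariant rotation error."""
    t_err = (t_pred - t_gt).norm(dim=1).mean()
    q_err = (1.0 - (q_pred * q_gt).sum(dim=1).abs()).mean()
    return t_err + beta * q_err

net = PoseNet()
imgs = torch.randn(4, 3, 64, 64)  # placeholder simulated images
t, q = net(imgs)
```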