The DAHLIA dataset [1] is devoted to human activity recognition, a key problem for adaptive smart-home services such as user assistance. DAHLIA was recorded in the Mobile Mii Platform at CEA LIST and was partly supported by the ITEA 3 EmoSpaces Project (https://itea3.org/project/emospaces.html).
The Freiburg Campus 3D Scan dataset consists of 3D area maps from the Freiburg campus that were scanned with 3D lasers. Areas include corridors, the outdoor campus, and some of the colleges and buildings.
Freiburg Lighting Adaptable Map Tracking is a dataset for camera trajectory estimation. The dataset consists of two subdatasets, each consisting of a Lighting Adaptable Map and three camera trajectories recorded under varying lighting conditions. The map meshes are stored in PLY format with custom properties and elements. The trajectories contain synchronized RGB-D images, exposure times and gains, ground-truth light settings and camera poses, as well as the camera tracking results presented in the paper.
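The entry above notes that the map meshes are stored in PLY format with custom properties and elements. As an illustration of what that means in practice, here is a minimal sketch of an ASCII PLY header parser that lists each element and its (possibly custom) properties. This is a generic example, not the dataset's actual tooling; the element and property names in the sample header are hypothetical.

```python
def parse_ply_header(stream):
    """Parse an ASCII PLY header from a text stream.

    Returns a dict mapping each element name to a list of
    (property_type, property_name) pairs, so custom properties
    show up alongside the standard x/y/z coordinates.
    """
    assert stream.readline().strip() == "ply", "not a PLY file"
    elements = {}
    current = None
    for raw in stream:
        line = raw.strip()
        if line == "end_header":
            break
        parts = line.split()
        if parts[0] == "element":
            # e.g. "element vertex 1024"
            current = parts[1]
            elements[current] = []
        elif parts[0] == "property":
            # scalar: "property float x"
            # list:   "property list uchar int vertex_indices"
            elements[current].append((" ".join(parts[1:-1]), parts[-1]))
    return elements
```

A loader would read the header this way first, then interpret the data section according to the declared element counts and property types.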
The Freiburg RGB-D People dataset contains 3000+ RGB-D frames acquired in a university hall from three vertically mounted Kinect sensors. The data contains mostly upright walking and standing persons seen from different orientations and with different levels of occlusions.
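Working with RGB-D frames like those above usually starts by back-projecting the depth image into a 3D point cloud using the pinhole camera model. The following is a minimal NumPy sketch of that standard conversion; the intrinsics (fx, fy, cx, cy) are placeholder values, not the calibration of the Kinect sensors used in this dataset.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (in meters) to camera-frame 3D points.

    Pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    Returns an (H*W, 3) array of [x, y, z] points in row-major pixel order.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```

Pixels with zero or invalid depth (common in Kinect data) would normally be masked out before or after this step.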
The volumetric representation of human interactions is one of the fundamental domains in the development of immersive media productions and telecommunication applications. Particularly in the context of the rapid advancement of Extended Reality (XR) applications, volumetric data has proven to be an essential technology for future XR development. In this work, we present a new multimodal database to help advance the development of immersive technologies. Our proposed database provides ethically compliant and diverse volumetric data: 27 participants displaying posed facial expressions and subtle body movements while speaking, plus 11 participants wearing head-mounted displays (HMDs). The recording system consists of a volumetric capture (VoCap) studio comprising 31 synchronized modules with 62 RGB cameras and 31 depth cameras. In addition to textured meshes, point clouds, and multi-view RGB-D data, we use one Lytro Illum camera to simultaneously provide light field (LF) data.
Estimating 6D object poses is a major challenge in 3D computer vision. Building on successful instance-level approaches, research is shifting towards category-level pose estimation for practical applications. Current category-level datasets, however, fall short in annotation quality and pose variety. Addressing this, we introduce HouseCat6D, a new category-level 6D pose dataset. It features 1) multimodality with Polarimetric RGB and Depth (RGBD+P), 2) encompasses 194 diverse objects across 10 household categories, including two photometrically challenging ones, and 3) provides high-quality pose annotations with an error range of only 1.35 mm to 1.74 mm. The dataset also includes 4) 41 large-scale scenes with comprehensive viewpoint and occlusion coverage, 5) a checkerboard-free environment, and 6) dense 6D parallel-jaw robotic grasp annotations. Additionally, we present benchmark results for leading category-level pose estimation networks.
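The HouseCat6D entry quantifies annotation quality as a millimeter-scale error range. A common way to report such 6D pose errors is a rotation error (geodesic angle between rotation matrices) plus a translation error. Below is a generic sketch of these two metrics; it illustrates the standard formulas, not the dataset's own evaluation code.

```python
import numpy as np

def rotation_error_deg(R_est, R_gt):
    """Geodesic angle (degrees) between two 3x3 rotation matrices."""
    cos = (np.trace(R_est @ R_gt.T) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def translation_error_mm(t_est, t_gt):
    """Euclidean translation error in millimeters (inputs in meters)."""
    return 1000.0 * np.linalg.norm(np.asarray(t_est) - np.asarray(t_gt))
```

Symmetric objects typically need symmetry-aware variants of these metrics, since many distinct rotations are visually indistinguishable.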
InfiniteRep is a synthetic, open-source dataset for fitness and physical therapy (PT) applications. It includes 1k videos of diverse avatars performing multiple repetitions of common exercises. It includes significant variation in the environment, lighting conditions, avatar demographics, and movement trajectories. From cadence to kinematic trajectory, each rep is done slightly differently -- just like real humans. InfiniteRep videos are accompanied by a rich set of pixel-perfect labels and annotations, including frame-specific repetition counts.
The dataset includes recordings of 10 different users teaching the robot common kitchen objects. It consists of synchronized recordings from three cameras and a microphone mounted on the robot.
PAVIS RGB-D is a dataset for person re-identification using depth information. The main motivation is that techniques such as SDALF fail when individuals change their clothing, so they cannot be used for long-term video surveillance. Depth information addresses this problem because it stays constant for a longer period of time. The dataset is composed of four groups of data collected using the Kinect. The first group ("Collaborative") was obtained by recording 79 people with a frontal view, walking slowly, avoiding occlusions, and with stretched arms. This took place in an indoor scenario, with people at least 2 meters away from the camera. The second ("Walking1") and third ("Walking2") groups consist of frontal recordings of the same 79 people walking normally while entering the lab where they work. The fourth group ("Backwards") is a back-view recording of the people walking away from the lab.
The goal of this dataset is to use simulation data to train neural networks to estimate the pose of a rover's camera with respect to a known target object.
The Robot-at-Home dataset (Robot@Home) is a collection of raw and processed data from five domestic settings compiled by a mobile robot equipped with 4 RGB-D cameras and a 2D laser scanner. Its main purpose is to serve as a testbed for semantic mapping algorithms through the categorization of objects and/or rooms.
Sugar Beets 2016 is a robot dataset for plant classification as well as localization and mapping that covers the relevant stages for robotic intervention and weed control. It contains around 5 TB of data recorded from a robot with a 4-channel multi-spectral camera and an RGB-D sensor to capture detailed information about the plantation.
Toronto NeuroFace Dataset: A New Dataset for Facial Motion Analysis in Individuals with Neurological Disorders
The University of Padova Body Pose Estimation dataset (UNIPD-BPE) is an extensive dataset for multi-sensor body pose estimation containing both single-person and multi-person sequences with up to 4 interacting people. A network of 5 Microsoft Azure Kinect RGB-D cameras is exploited to record synchronized high-definition RGB and depth data of the scene from multiple viewpoints, as well as to estimate the subjects’ poses using the Azure Kinect Body Tracking SDK. Simultaneously, full-body Xsens MVN Awinda inertial suits allow obtaining accurate poses and anatomical joint angles, while also providing raw data from the 17 IMUs required by each suit. All the cameras and inertial suits are hardware synchronized, and the relative pose of each camera with respect to the inertial reference frame is calibrated before each sequence to ensure maximum overlap of the two sensing systems' outputs.
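The calibration step mentioned above, estimating the relative pose between a camera and the inertial reference frame, is commonly solved from pairs of corresponding 3D points with the Kabsch (orthogonal Procrustes) algorithm. Here is a generic NumPy sketch of that procedure; it illustrates the standard technique, not the UNIPD-BPE calibration pipeline itself.

```python
import numpy as np

def kabsch(P, Q):
    """Rigid transform (R, t) minimizing ||R @ P_i + t - Q_i|| over point pairs.

    P, Q: (N, 3) arrays of corresponding points in the two frames (N >= 3,
    not collinear). Returns rotation matrix R and translation vector t.
    """
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                 # cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t
```

In practice the corresponding points would come from a calibration target or tracked markers visible to both sensing systems.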
The RGB-D Scenes Dataset contains 8 scenes annotated with objects that belong to the Washington RGB-D Object Dataset. Each scene is a single video sequence consisting of multiple RGB-D frames.
The RGB-D Scenes Dataset v2 consists of 14 scenes containing furniture (chair, coffee table, sofa, table) and a subset of the objects in the RGB-D Object Dataset (bowls, caps, cereal boxes, coffee mugs, and soda cans). Each scene is a point cloud created by aligning a set of video frames using Patch Volumes Mapping.