The UCY dataset consists of real pedestrian trajectories with rich multi-human interaction scenarios, captured at 2.5 Hz (Δt = 0.4 s). It comprises three sequences (Zara01, Zara02, and UCY), recorded in public spaces from a top-down view.
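As a minimal sketch of working with trajectories sampled at this rate: given per-frame positions at 2.5 Hz (Δt = 0.4 s, per the description above), per-step velocities follow from finite differences. The array layout below is an illustrative assumption, not the dataset's on-disk format.

```python
import numpy as np

DT = 0.4  # seconds between consecutive frames (2.5 Hz sampling)

def velocities(xy: np.ndarray) -> np.ndarray:
    """xy: (T, 2) array of one pedestrian's positions in metres.
    Returns a (T-1, 2) array of per-step velocities in m/s."""
    return np.diff(xy, axis=0) / DT

# Hypothetical track: a pedestrian walking along x at 1 m/s.
track = np.array([[0.0, 0.0], [0.4, 0.0], [0.8, 0.0]])
print(velocities(track))  # each row is [1.0, 0.0]
```

The same differencing applies to any of the fixed-rate trajectory datasets listed here, with Δt adjusted to the dataset's sampling interval.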
126 PAPERS • 1 BENCHMARK
NTU RGB+D 2D is a curated version of NTU RGB+D, often used for skeleton-based action prediction and synthesis. It contains fewer actions than the original dataset.
7 PAPERS • 1 BENCHMARK
The GTA Indoor Motion dataset (GTA-IM) emphasizes human-scene interactions in indoor environments. It consists of HD RGB-D image sequences of 3D human motion from a realistic game engine. The dataset has clean 3D human pose and camera pose annotations, and large diversity in human appearances, indoor environments, camera views, and human activities.
5 PAPERS • 1 BENCHMARK
The SIND dataset is based on 4K video captured by drones, providing information including traffic participant trajectories, traffic light status, and high-definition maps.
4 PAPERS • NO BENCHMARKS YET
The Argoverse 2 Motion Forecasting Dataset is a curated collection of 250,000 scenarios for training and validation. Each scenario is 11 seconds long and contains the 2D, bird's-eye-view centroid and heading of each tracked object, sampled at 10 Hz.
3 PAPERS • 1 BENCHMARK
Covers 5 generic driving scenarios, with a total of 25 distinct action classes. It contains more than 15K full-HD, 5-second videos acquired in varied driving conditions, weather, times of day, and environments, complemented with a common and realistic set of sensor measurements. This amounts to more than 2.25M frames, each annotated with an action label, corresponding to 600 samples per action class.
3 PAPERS • NO BENCHMARKS YET
Consists of two pedestrian trajectory datasets, the CITR dataset and the DUT dataset, so that pedestrian motion models can be further calibrated and verified, especially when vehicle influence on pedestrians plays an important role.
2 PAPERS • NO BENCHMARKS YET
A self-driving dataset for motion prediction, containing over 1,000 hours of data. This was collected by a fleet of 20 autonomous vehicles along a fixed route in Palo Alto, California, over a four-month period. It consists of 170,000 scenes, where each scene is 25 seconds long and captures the perception output of the self-driving system, which encodes the precise positions and motions of nearby vehicles, cyclists, and pedestrians over time.
Occ-Traj120 is a trajectory dataset that contains occupancy representations of different local maps with associated trajectories. It contains 400 locally structured maps with occupancy representations and roughly 120K trajectories in total.
The Freiburg Street Crossing dataset consists of data collected from three different street crossings in Freiburg, Germany: two traffic-light-regulated intersections and one zebra crossing without traffic lights. The data can be used to train agents to cross roads autonomously.
1 PAPER • NO BENCHMARKS YET