🔔 Share your dataset with the ML community!

Filter by Modality (clear)

Filter by Task

Filter by Language

25 dataset results for Interactive

The SUN RGBD dataset contains 10335 real RGB-D images of room scenes. Each RGB image has a corresponding depth and segmentation map. As many as 700 object categories are labeled. The training and testing sets contain 5285 and 5050 images, respectively.

421 PAPERS • 13 BENCHMARKS

ZINC

ZINC is a free database of commercially-available compounds for virtual screening. ZINC contains over 230 million purchasable compounds in ready-to-dock, 3D formats. ZINC also contains over 750 million purchasable compounds that can be searched for analogs.

198 PAPERS • 5 BENCHMARKS

R2R (Room-to-Room)

R2R is a dataset for visually-grounded natural language navigation in real buildings. The dataset requires autonomous agents to follow human-generated navigation instructions in previously unseen buildings, as illustrated in the demo above. For training, each instruction is associated with a Matterport3D Simulator trajectory. 22k instructions are available, with an average length of 29 words. There is a test evaluation server for this dataset available at EvalAI.

140 PAPERS • 2 BENCHMARKS

NCLT (North Campus Long-Term Vision and LiDAR)

The NCLT dataset is a large scale, long-term autonomy dataset for robotics research collected on the University of Michigan’s North Campus. The dataset consists of omnidirectional imagery, 3D lidar, planar lidar, GPS, and proprioceptive sensors for odometry collected using a Segway robot. The dataset was collected to facilitate research focusing on long-term autonomous operation in changing environments. The dataset is comprised of 27 sessions spaced approximately biweekly over the course of 15 months. The sessions repeatedly explore the campus, both indoors and outdoors, on varying trajectories, and at different times of the day across all four seasons. This allows the dataset to capture many challenging elements including: moving obstacles (e.g., pedestrians, bicyclists, and cars), changing lighting, varying viewpoint, seasonal and weather changes (e.g., falling leaves and snow), and long-term structural changes caused by construction projects.

96 PAPERS • NO BENCHMARKS YET

MAESTRO

The MAESTRO dataset contains over 200 hours of paired audio and MIDI recordings from ten years of International Piano-e-Competition. The MIDI data includes key strike velocities and sustain/sostenuto/una corda pedal positions. Audio and MIDI files are aligned with ∼3 ms accuracy and sliced to individual musical pieces, which are annotated with composer, title, and year of performance. Uncompressed audio is of CD quality or higher (44.1–48 kHz 16-bit PCM stereo).

89 PAPERS • NO BENCHMARKS YET

COMA

CoMA contains 17,794 meshes of the human face in various expressions

74 PAPERS • 1 BENCHMARK

Jericho

Jericho is a learning environment for man-made Interactive Fiction (IF) games.

46 PAPERS • NO BENCHMARKS YET

Kumar

The Kumar dataset contains 30 1,000×1,000 image tiles from seven organs (6 breast, 6 liver, 6 kidney, 6 prostate, 2 bladder, 2 colon and 2 stomach) of The Cancer Genome Atlas (TCGA) database acquired at 40× magnification. Within each image, the boundary of each nucleus is fully annotated.

16 PAPERS • 1 BENCHMARK

NYU Hand

The NYU Hand pose dataset contains 8252 test-set and 72757 training-set frames of captured RGBD data with ground-truth hand-pose information. For each frame, the RGBD data from 3 Kinects is provided: a frontal view and 2 side views. The training set contains samples from a single user only (Jonathan Tompson), while the test set contains samples from two users (Murphy Stein and Jonathan Tompson). A synthetic re-creation (rendering) of the hand pose is also provided for each view.

16 PAPERS • 1 BENCHMARK

Watch-n-Patch

The Watch-n-Patch dataset was created with the focus on modeling human activities, comprising multiple actions in a completely unsupervised setting. It is collected with Microsoft Kinect One sensor for a total length of about 230 minutes, divided in 458 videos. 7 subjects perform human daily activities in 8 offices and 5 kitchens with complex backgrounds. Moreover, skeleton data are provided as ground truth annotations.

12 PAPERS • NO BENCHMARKS YET

Synbols

Synbols is a dataset generator designed for probing the behavior of learning algorithms. By defining the distribution over latent factors one can craft a dataset specifically tailored to answer specific questions about a given algorithm.

11 PAPERS • NO BENCHMARKS YET

DCASE 2018 Task 4

DCASE2018 Task 4 is a dataset for large-scale weakly labeled semi-supervised sound event detection in domestic environments. The data are YouTube video excerpts focusing on domestic context which could be used for example in ambient assisted living applications. The domain was chosen due to the scientific challenges (wide variety of sounds, time-localized events...) and potential industrial applications. Specifically, the task employs a subset of “Audioset: An Ontology And Human-Labeled Dataset For Audio Events” by Google. Audioset consists of an expanding ontology of 632 sound event classes and a collection of 2 million human-labeled 10-second sound clips (less than 21% are shorter than 10-seconds) drawn from 2 million Youtube videos. The ontology is specified as a hierarchical graph of event categories, covering a wide range of human and animal sounds, musical instruments and genres, and common everyday environmental sounds. Task 4 focuses on a subset of Audioset that consists of 10

10 PAPERS • NO BENCHMARKS YET

ViP-Bench (Making Large Multimodal Models Understand Arbitrary Visual Prompts)

ViP-Bench is a comprehensive benchmark designed to assess the capability of multimodal models in understanding visual prompts across multiple dimensions. It aims to evaluate how well these models interpret various visual prompts, including recognition, OCR, knowledge, math, relationship reasoning, and language generation. ViP-Bench includes a diverse set of 303 images and questions, providing a thorough assessment of visual understanding capabilities at the region level. This benchmark sets a foundation for future research into multimodal models with arbitrary visual prompts.

10 PAPERS • 1 BENCHMARK

Avalon

Avalon is a benchmark for generalization in Reinforcement Learning (RL). The benchmark consists of a set of tasks in which embodied agents in highly diverse procedural 3D worlds must survive by navigating terrain, hunting or gathering food, and avoiding hazards. Avalon is unique among existing RL benchmarks in that the reward function, world dynamics, and action space are the same for every task, with tasks differentiated solely by altering the environment; its 20 tasks, ranging in complexity from eat and throw to hunt and navigate, each create worlds in which the agent must perform specific skills in order to survive. This benchmark setup enables investigations of generalization within tasks, between tasks, and to compositional tasks that require combining skills learned from previous tasks.

3 PAPERS • NO BENCHMARKS YET

map2seq

7,672 human written natural language navigation instructions for routes in OpenStreetMap with a focus on visual landmarks. Validated in Street View.

3 PAPERS • 2 BENCHMARKS

SuperCaustics

SuperCaustics is a simulation tool made in Unreal Engine for generating massive computer vision datasets that include transparent objects.

2 PAPERS • 1 BENCHMARK

bipedal-skills (Bipedal Skills Benchmark for Reinforcement Learning)

The bipedal skills benchmark is a suite of reinforcement learning environments implemented for the MuJoCo physics simulator. It aims to provide a set of tasks that demand a variety of motor skills beyond locomotion, and is intended for evaluating skill discovery and hierarchical learning methods. The majority of tasks exhibit a sparse reward structure.

2 PAPERS • NO BENCHMARKS YET

Baxter-UR5_95-Objects

In this dataset two robots, Baxter and UR5, perform 8 behaviors (look, grasp, pick, hold, shake, lower, drop, and push) on 95 objects that vary by 5 color (blue, green, red, white, and yellow), 6 contents (wooden button, plastic dices, glass marbles, nuts & bolts, pasta, and rice), and 4 weights (empty, 50g, 100g, and 150g). There are 90 objects with contents (5 colors x 3 weights x 6 contents) and 5 objects without any content that only vary by 5 colors. Both robots perform 5 trials on each object, resulting in 7,600 interactions (2 robots x 8 behaviors x 95 objects x 5 trials

1 PAPER • NO BENCHMARKS YET

CY101 Dataset

In this dataset an uppertorso humanoid robot with 7-DOF arm explored 100 different objects belonging to 20 different categories using 10 behaviors: Look, Crush, Grasp, Hold, Lift, Drop, Poke, Push, Shake and Tap.

1 PAPER • NO BENCHMARKS YET

ChildCIdb (ChildCIdbv1)

A large-scale, first-of-its-kind database aimed at generating a better understanding of the way children interact with mobile devices during their development process. ChildCIdbv1 comprises data collected from 438 children, from 18 months to 8 years old, encompassing the first three development stages of Piaget's theory. Data collected spans interaction with screens using both finger and pen stylus, information regarding the previous experience of the child with mobile devices, the child’s grade level, and whether attention-deficit/hyperactivity disorder (ADHD) is present.

1 PAPER • NO BENCHMARKS YET

Daimler Monocular Pedestrian Detection

The Daimler Monocular Pedestrian Detection dataset is a dataset for pedestrian detection in urban environments. The training set contains 15560 pedestrian samples (image cut-outs at 48×96 resolution) and 6744 additional full images without pedestrians for extracting negative samples. The test set contains an independent sequence with more than 21790 images and 56492 pedestrian labels (fully visible or partially occluded), captured from a vehicle during a 27 min driving through the urban traffic.

1 PAPER • NO BENCHMARKS YET

MiniWob++

MiniWob++ is a suite of web-browser based tasks introduced in Liu et al. (2018) (an extension of the earlier MiniWob task suite (Shi et al., 2017)). Tasks range from simple button clicking to complex form-filling, for example, to book a flight when given particular instructions (Fig. 1a). Programmatic rewards are available for each task, permitting standard reinforcement learning techniques.

1 PAPER • NO BENCHMARKS YET

The Mafia Dataset

The Mafia Dataset was created to model the behavior of deceptive actors in the context of the Mafia game, as described in the paper “Putting the Con in Context: Identifying Deceptive Actors in the Game of Mafia”. We hope that this dataset will be of use to others studying the effects of deception on language use.

1 PAPER • NO BENCHMARKS YET

UR5 Tool Dataset

In this dataset UR5 robot used 6 tools: metal-scissor, metal-whisk, plastic-knife, plastic-spoon, wooden-chopstick, and wooden-fork to perform 6 behaviors: look, stirring-slow, stirring-fast, stirring-twist, whisk, and poke. The robot explored 15 objects: cane-sugar, chia-seed, chickpea, detergent, empty, glass-bead, kidney-bean, metal-nut-bolt, plastic-bead, salt, split-green-pea, styrofoam-bead, water, wheat, and wooden-button kept cylindrical containers. The robot performed 10 trials on each object using a tool, resulting in 5,400 interactions (6 tools x 6 behaviors x 15 objects x 10 trials). The robot records multiple sensory data (audio, RGB images, depth images, haptic, and touch images) while interacting with the objects.

1 PAPER • NO BENCHMARKS YET

VR Curve on Surface Drawing Dataset

The datasets includes curves drawn on 3D surfaces (triangle meshes) in Virtual Reality. A total of 2,880 curves were created using two different techniques by 20 users on 6 meshes. For each curve, a 3D curve executed by the user is provided, the projected curve created on the mesh, and the ground truth target curve on the mesh. For collecting the data, two different task types were employed, which are described in the paper.

1 PAPER • NO BENCHMARKS YET

Datasets

25 dataset results for Interactive