Speech Commands is an audio dataset of spoken words designed to help train and evaluate keyword spotting systems .
373 PAPERS • 4 BENCHMARKS
Abstract: Measurements of electric power consumption in one household with a one-minute sampling rate over a period of almost 4 years. Different electrical quantities and some sub-metering values are available.
30 PAPERS • 6 BENCHMARKS
The PhysioNet Challenge 2012 dataset is publicly available and contains the de-identified records of 8000 patients in Intensive Care Units (ICU). Each record consists of roughly 48 hours of multivariate time series data with up to 37 features recorded at various times from the patients during their stay such as respiratory rate, glucose etc.
22 PAPERS • 5 BENCHMARKS
Caenorhabditis elegans is a roundworm commonly used as a model organism in the study of genetics. The movement of these worms is known to be a useful indicator for understanding behavioural genetics. Brown {\em et al.}[1] describe a system for recording the motion of worms on an agar plate and measuring a range of human-defined features[2]. It has been shown that the space of shapes Caenorhabditis elegans adopts on an agar plate can be represented by combinations of six base shapes, or eigenworms. Once the worm outline is extracted, each frame of worm motion can be captured by six scalars representing the amplitudes along each dimension when the shape is projected onto the six eigenworms. Using data collected for the work described in[1], we address the problem of classifying individual worms as wild-type or mutant based on the time series. The data were extracted from the C. elegans behavioural database [3]. We have 259 cases, which we split 131 train and 128 test. We have truncated e
20 PAPERS • 1 BENCHMARK
The time series segmentation benchmark (TSSB) currently contains 75 annotated time series (TS) with 1-9 segments. Each TS is constructed from one of the UEA & UCR time series classification datasets. We group TS by label and concatenate them to create segments with distinctive temporal patterns and statistical properties. We annotate the offsets at which we concatenated the segments as change points (CPs). Addtionally, we apply resampling to control the dataset resolution and add approximate, hand-selected window sizes that are able to capture temporal patterns.
8 PAPERS • 1 BENCHMARK
Alibaba Cluster Trace captures detailed statistics for the co-located workloads of long-running and batch jobs over a course of 24 hours. The trace consists of three parts: (1) statistics of the studied homogeneous cluster of 1,313 machines, including each machine’s hardware configuration, and the runtime {CPU, Memory, Disk} resource usage for a duration of 12 hours (the 2nd half of the 24-hour period); (2) long-running job workloads, including a trace of all container deployment requests and actions, and a resource usage trace for 12 hours; (3) co-located batch job workloads, including a trace of all batch job requests and actions, and a trace of per-instance resource usage over 24 hours.
7 PAPERS • 1 BENCHMARK
a dataset of time-series anomaly detection
7 PAPERS • 2 BENCHMARKS
The eSports Sensors dataset contains sensor data collected from 10 players in 22 matches in League of Legends. The sensor data collected includes:
4 PAPERS • 2 BENCHMARKS
A new spatio-temporal benchmark dataset (Hurricane), is suited for forecasting during extreme events and anomalies. The dataset is provided through the Florida Department of Revenue which provides the monthly sales revenue (2003-2020) for the tourism industry for all 67 counties of Florida which are prone to annual hurricanes. Furthermore, we aligned and joined the raw time series with the history of hurricane categories (i.e., event intensities) based on time for each county. Note that the hurricane category indicates the maximum sustained wind speed which can result in catastrophic damages as this number goes up (Category 1-6).
3 PAPERS • 1 BENCHMARK
Solar Power Data for Integration Studies NREL's Solar Power Data for Integration Studies are synthetic solar photovoltaic (PV) power plant data points for the United States representing the year 2006.
3 PAPERS • 2 BENCHMARKS
State-level data for the US economy through the lens of consumer spending (Credit/Debit Spending) . The dataset is enriched with state-level Economic Dynamics and Policy Responses. Specifically, we further enriched the data with the state-level policies as an indication of extreme events (e.g., the state’s business closure order).
2 PAPERS • 1 BENCHMARK
The dataset contains the hotel demand and revenue of 8 major tourist destinations in the US (e.g., Los Angeles, Orlando ...). The dataset contains sales, daily occupancy, demand, and revenue of the upper-middle class hotels.
2 PAPERS • NO BENCHMARKS YET
This paper presents a benchmark data set for condition monitoring of rolling bearings in combination with an extensive description of the corresponding bearing damage, the data set generation by experiments and results of datadriven classifications used as a diagnostic method. The diagnostic method uses the motor current signal of an electromechanical drive system for bearing diagnostic. The advantage of this approach in general is that no additional sensors are required, as current measurements can be performed in existing frequency inverters. This will help to reduce the cost of future condition monitoring systems. A particular novelty of the present approach is the monitoring of damage in external bearings which are installed in the drive system but outside the electric motor. Nevertheless, the motor current signal is used as input for the detection of the damage. Moreover, a wide distribution of bearing damage is considered for the benchmark data set. The results of the classificat
This dataset contains vibration data recorded on a rotating drive train. This drive train consists of an electronically commutated DC motor and a shaft driven by it, which passes through a roller bearing. With the help of a 3D-printed holder, unbalances with different weights and different radii were attached to the shaft. Besides the strength of the unbalances, the rotation speed of the motor was also varied. This dataset can be used to develop and test algorithms for the automatic detection of unbalances on drive trains. Datasets for 4 differently sized unbalances and for the unbalance-free case were recorded. The vibration data was recorded at a sampling rate of 4096 values per second. Datasets for development (ID "D[0-4]") as well as for evaluation (ID "E[0-4]") are available for each unbalance strength. The rotation speed was varied between approx. 630 and 2330 RPM in the development datasets and between approx. 1060 and 1900 RPM in the evaluation datasets. For each measurement of
Overview The edeniss2020 dataset is a time series dataset. It consists of equidistant sensor readings stemming from 97 sensors in the EDEN ISS research greenhouse.
This dataset contains data of 125 1-hour simulations of ship motion during various sea states performing random maneuvers in 4 degrees of freedom (surge-sway-yaw-roll). The original ship is a patrol ship developed by Perez et al. 1. We have extended it with a set of two symmetrically placed rudder propellers. Additionally, we simulate wind forces according to Isherwood's wind model 2. Wind-induced waves are generated with the JONSWAP spectrum 3 and the corresponding wave forces are then computed using wave force response amplitude operators (ROA).
1 PAPER • NO BENCHMARKS YET
ESA Anomaly Dataset is the first large-scale, real-life satellite telemetry dataset with curated anomaly annotations originated from three ESA missions. We hope that this unique dataset will allow researchers and scientists from academia, research institutes, national and international space agencies, and industry to benchmark models and approaches on a common baseline as well as research and develop novel, computational-efficient approaches for anomaly detection in satellite telemetry data.
FedTADBench is a federated time series anomaly detection benchmark. It covers 5 time series anomaly detection algorithms, 4 federated learning frameworks, and 3 time series anomaly detection datasets.
HASCD (Human Activity Segmentation Challenge Dataset) contains 250 annotated multivariate time series capturing 10.7 h of real-world human motion smartphone sensor data from 15 bachelor computer science students. The recordings capture 6 distinct human motion sequences designed to represent pervasive behaviour in realistic indoor and outdoor settings. The data set serves as a benchmark for evaluating machine learning workflows.
MOSAD (Mobile Sensing Human Activity Data Set) is a multi-modal, annotated time series (TS) data set that contains 14 recordings of 9 triaxial smartphone sensor measurements (126 TS) from 6 human subjects performing (in part) 3 motion sequences in different locations. The aim of the data set is to facilitate the study of human behaviour and the design of TS data mining technology to separate individual activities using low-cost sensors in wearable devices.
The sports industry is witnessing an increasing trend of utilizing multiple synchronized sensors for player data collection, enabling personalized training systems with multi-perspective real-time feedback. Badminton could benefit from these various sensors, but there is a scarcity of comprehensive badminton action datasets for analysis and training feedback. Addressing this gap, this paper introduces a multi-sensor badminton dataset for forehand clear and backhand drive strokes, based on interviews with coaches for optimal usability. The dataset covers various skill levels, including beginners, intermediates, and experts, providing resources for understanding biomechanics across skill levels. It encompasses 7,763 badminton swing data from 25 players, featuring sensor data on eye tracking, body tracking, muscle signals, and foot pressure. The dataset also includes video recordings, detailed annotations on stroke type, skill level, sound, ball landing, and hitting location, as well as s
Click to add a brief description of the dataset (Markdown and LaTeX enabled).
A Sentinel-2 based time series multi country benchmark dataset, tailored for agricultural monitoring applications with Machine and Deep Learning. Sen4AgriNet dataset is annotated from farmer declarations collected via the Land Parcel Identification System (LPIS) for harmonizing country wide labels. Sen4AgriNet is the only multi-country, multi-year dataset that includes all spectral information. It is constructed to cover the period 2016-2020 for Catalonia and France, while it can be extended to include additional countries. Currently, it contains 42.5 million parcels, which makes it significantly larger than other available archives.
pm2.5 time series data