The ImageNet dataset contains 14,197,122 images annotated according to the WordNet hierarchy. Since 2010 the dataset has been used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a benchmark in image classification and object detection. The publicly released dataset contains a set of manually annotated training images. A set of test images is also released, with the manual annotations withheld. ILSVRC annotations fall into one of two categories: (1) image-level annotation of a binary label for the presence or absence of an object class in the image, e.g., “there are cars in this image” but “there are no tigers,” and (2) object-level annotation of a tight bounding box and class label around an object instance in the image, e.g., “there is a screwdriver centered at position (20,25) with a width of 50 pixels and a height of 30 pixels”. The ImageNet project does not own the copyright of the images; therefore, only thumbnails and URLs of images are provided.
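An illustrative sketch of the two annotation categories described above, using hypothetical Python data classes (the field names are assumptions, not an official ILSVRC file format):

```python
from dataclasses import dataclass

@dataclass
class ImageLevelAnnotation:
    # (1) image-level: binary presence/absence of an object class
    image_id: str
    class_name: str   # e.g. "car"
    present: bool     # True: "there are cars in this image"

@dataclass
class ObjectLevelAnnotation:
    # (2) object-level: tight bounding box plus class label
    image_id: str
    class_name: str   # e.g. "screwdriver"
    x_center: int     # pixels
    y_center: int
    width: int
    height: int

# The screwdriver example from the description above:
box = ObjectLevelAnnotation("img_0001", "screwdriver",
                            x_center=20, y_center=25, width=50, height=30)
print(box)
```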
9,882 PAPERS • 96 BENCHMARKS
The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits. It has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of a larger set constructed from NIST Special Database 3 (digits written by employees of the United States Census Bureau) and Special Database 1 (digits written by high school students), which contain monochrome images of handwritten digits. The digits have been size-normalized and centered in a fixed-size image. The original black and white (bilevel) images from NIST were size-normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were then centered in a 28x28 image by computing the center of mass of the pixels and translating the image so as to position this point at the center of the 28x28 field.
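A minimal sketch of the centering step described above (not the original NIST/MNIST code): a 20x20 size-normalized digit is placed into a 28x28 field so that its pixel center of mass lands at the center of the field.

```python
import numpy as np

def center_in_28x28(digit: np.ndarray) -> np.ndarray:
    """digit: float array of shape (20, 20); larger values mean more ink."""
    h, w = digit.shape
    ys, xs = np.mgrid[0:h, 0:w]
    total = digit.sum()
    cy = (ys * digit).sum() / total            # center of mass, row
    cx = (xs * digit).sum() / total            # center of mass, column
    # Offset that moves the center of mass to (13.5, 13.5), the center of
    # the 28x28 field; clipped so the 20x20 patch stays inside the field.
    top = int(np.clip(np.round(13.5 - cy), 0, 28 - h))
    left = int(np.clip(np.round(13.5 - cx), 0, 28 - w))
    out = np.zeros((28, 28), dtype=digit.dtype)
    out[top:top + h, left:left + w] = digit
    return out

print(center_in_28x28(np.random.rand(20, 20)).shape)   # (28, 28)
```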
5,861 PAPERS • 49 BENCHMARKS
The CIFAR-100 dataset (Canadian Institute for Advanced Research, 100 classes) is a subset of the Tiny Images dataset and consists of 60,000 32x32 color images. The 100 classes in CIFAR-100 are grouped into 20 superclasses. There are 600 images per class. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs). There are 500 training images and 100 testing images per class.
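A minimal loading sketch, assuming the Python version of the CIFAR-100 archive has been downloaded and extracted to ./cifar-100-python/ (the path is an assumption); it reads both the "fine" and "coarse" labels described above.

```python
import pickle
import numpy as np

with open("cifar-100-python/train", "rb") as f:
    batch = pickle.load(f, encoding="bytes")

images = batch[b"data"].reshape(-1, 3, 32, 32)   # 32x32 color images, channels first
fine = np.array(batch[b"fine_labels"])           # 100 class labels
coarse = np.array(batch[b"coarse_labels"])       # 20 superclass labels
print(images.shape, fine.max() + 1, coarse.max() + 1)   # (50000, 3, 32, 32) 100 20
```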
5,205 PAPERS • 39 BENCHMARKS
Oxford 102 Flower is an image classification dataset consisting of 102 flower categories. The flowers were chosen to be species commonly occurring in the United Kingdom. Each class consists of between 40 and 258 images.
618 PAPERS • 14 BENCHMARKS
The Stanford Cars dataset consists of 196 classes of cars with a total of 16,185 images, taken from the rear. The data is divided into almost a 50-50 train/test split with 8,144 training images and 8,041 testing images. Categories are typically at the level of Make, Model, Year. The images are 360×240.
386 PAPERS • 8 BENCHMARKS
The Sketch dataset contains over 20,000 sketches evenly distributed over 250 object categories.
176 PAPERS • 1 BENCHMARK
Permuted MNIST is an MNIST variant that consists of 70,000 images of handwritten digits from 0 to 9, where 60,000 images are used for training and 10,000 images for testing. It differs from the original MNIST in that each of the ten tasks is a multi-class classification problem over a different fixed random permutation of the input pixels.
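A minimal sketch of how such tasks are typically constructed (dummy data stands in for MNIST; the exact number of tasks and whether the first task keeps the identity permutation vary between papers):

```python
import numpy as np

def make_permuted_tasks(images: np.ndarray, num_tasks: int = 10, seed: int = 0):
    """images: (N, 28, 28) array; returns a list of (N, 784) arrays, one per task."""
    rng = np.random.default_rng(seed)
    flat = images.reshape(len(images), -1)        # flatten to 784 pixels
    tasks = []
    for _ in range(num_tasks):
        perm = rng.permutation(flat.shape[1])     # one fixed permutation per task
        tasks.append(flat[:, perm])               # applied identically to every image
    return tasks

dummy = np.random.rand(100, 28, 28)               # stand-in for MNIST images
tasks = make_permuted_tasks(dummy)
print(len(tasks), tasks[0].shape)                 # 10 (100, 784)
```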
96 PAPERS • 2 BENCHMARKS
CORe50 is a dataset designed for assessing Continual Learning techniques in an Object Recognition context.
77 PAPERS • NO BENCHMARKS YET
WikiArt contains paintings from 195 different artists. The dataset has 42,129 images for training and 10,628 images for testing.
46 PAPERS • 2 BENCHMARKS
With social media becoming increasingly popular as a place where news and real-time events are reported, developing automated question answering systems is critical to the effectiveness of many applications that rely on real-time knowledge. While previous question answering (QA) datasets have concentrated on formal text like news and Wikipedia, this is the first large-scale dataset for QA over social media data. To make sure the tweets are meaningful and contain interesting information, tweets used by journalists to write news articles were gathered. Human annotators were then asked to write questions and answers about these tweets. Unlike other QA datasets such as SQuAD, in which the answers are extractive, the answers here are allowed to be abstractive. The task requires a model to read a short tweet and a question and output a text phrase (which does not need to appear in the tweet) as the answer.
12 PAPERS • 1 BENCHMARK
A set of 19 ASC datasets (reviews of 19 products) producing a sequence of 19 tasks. Each dataset represents a task. The datasets come from 4 sources: (1) HL5Domains (Hu and Liu, 2004) with reviews of 5 products; (2) Liu3Domains (Liu et al., 2015) with reviews of 3 products; (3) Ding9Domains (Ding et al., 2008) with reviews of 9 products; and (4) SemEval14 with reviews of 2 products (SemEval 2014 Task 4, laptop and restaurant). For (1), (2) and (3), about 10% of the original data is split off as validation data and another 10% as test data. For (4), 150 examples from the training set are used for validation. To be consistent with existing research (Tang et al., 2016), examples with the conflicting polarity (both positive and negative sentiments expressed about an aspect term) are not used. Statistics and details of the 19 datasets are given at https://github.com/ZixuanKe/PyContinual.
11 PAPERS • 1 BENCHMARK
This dataset has 20 classes, and each class has about 1,000 documents. The data split for train/validation/test is 1,600/200/200. We created 10 tasks with 2 classes per task. Since this is topic-based text classification data, the classes are very different and share little knowledge. As mentioned above, this application (and dataset) is mainly used to show a CL model's ability to overcome forgetting. Detailed statistics are available at https://github.com/ZixuanKe/PyContinual.
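A minimal sketch of the task construction described above: the 20 topic classes are partitioned into 10 sequential tasks of 2 classes each (class ids 0–19 are used here purely for illustration).

```python
def make_tasks(class_ids, classes_per_task=2):
    """Partition a list of class ids into consecutive tasks."""
    return [class_ids[i:i + classes_per_task]
            for i in range(0, len(class_ids), classes_per_task)]

tasks = make_tasks(list(range(20)))
print(len(tasks))   # 10 tasks
print(tasks[0])     # [0, 1] -- the two classes of the first task
```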
10 PAPERS • 1 BENCHMARK
The goal of this challenge is to solve ten image classification problems simultaneously, representative of very different visual domains. The data for each domain is obtained from an existing image classification benchmark.
8 PAPERS • 1 BENCHMARK
Continual World is a benchmark consisting of realistic and meaningfully diverse robotic tasks built on top of Meta-World as a testbed.
6 PAPERS • NO BENCHMARKS YET
A set of 10 DSC datasets (reviews of 10 products) used to produce sequences of tasks. The products are Sports, Toys, Tools, Video, Pet, Musical, Movies, Garden, Offices, and Kindle. Each task has 2,500 positive and 2,500 negative training reviews, 250 positive and 250 negative validation reviews, and 250 positive and 250 negative test reviews. Detailed statistics are available at https://github.com/ZixuanKe/PyContinual.
6 PAPERS • 1 BENCHMARK
F-CelebA - This dataset is adapted from federated learning. Federated learning is an emerging machine learning paradigm with an emphasis on data privacy: the idea is to train through model aggregation rather than conventional data aggregation, keeping local data on the local device. This dataset naturally consists of similar tasks, and each of the 10 tasks contains images of a celebrity labeled by whether he/she is smiling or not. For more details see https://github.com/ZixuanKe/CAT.
ROAD is designed to test an autonomous vehicle's ability to detect road events, defined as triplets composed of an active agent, the action(s) it performs, and the corresponding scene locations. ROAD comprises videos originally from the Oxford RobotCar Dataset, annotated with bounding boxes showing the location in the image plane of each road event.
HASY is a dataset of single symbols similar to MNIST. It contains 168,233 instances of 369 classes. HASY contains two challenges: A classification challenge with 10 pre-defined folds for 10-fold cross-validation and a verification challenge.
3 PAPERS • NO BENCHMARKS YET
The (L)ifel(O)ng (R)obotic V(IS)ion (OpenLORIS) Object Recognition Dataset (OpenLORIS-Object) is designed to accelerate lifelong/continual/incremental learning research and applications, currently focusing on improving the continual learning capability for common objects in the home scenario.
2 PAPERS • NO BENCHMARKS YET
BeGin provides 23 benchmark scenarios for graphs from 14 real-world datasets, covering 12 combinations of incremental settings and problem levels. In addition, BeGin provides various basic evaluation metrics for measuring performance and final evaluation metrics designed for continual learning.
1 PAPER • NO BENCHMARKS YET
Provides two large-scale multi-step benchmarks for biometric identification, where the visual appearance of different classes is highly relevant.
HOWS-CL-25 (Household Objects Within Simulation dataset for Continual Learning) is a synthetic dataset designed especially for object classification on mobile robots operating in a changing environment (such as a household), where it is important to learn new, previously unseen objects on the fly. The dataset can also be used for other learning use cases, such as instance segmentation or depth estimation, or wherever household objects or continual learning are of interest.
1 PAPER • 2 BENCHMARKS
TemporalWiki is a lifelong benchmark for ever-evolving LMs that utilizes the difference between consecutive snapshots of English Wikipedia and English Wikidata for training and evaluation, respectively. The benchmark hence allows researchers to periodically track an LM's ability to retain previous knowledge and acquire updated/new knowledge at each point in time.
Wild-Time is a benchmark of 5 datasets that reflect temporal distribution shifts arising in a variety of real-world applications, including patient prognosis and news classification. On these datasets, we systematically benchmark 13 prior approaches, including methods in domain generalization, continual learning, self-supervised learning, and ensemble learning.