Data Augmentation

2517 papers with code • 2 benchmarks • 63 datasets

Data augmentation is a family of techniques that enlarge a dataset by adding modified copies of its existing examples. Besides growing the dataset, augmentation also increases its diversity. When training machine learning models, data augmentation acts as a regularizer and helps prevent overfitting.

Data augmentation techniques have proven useful in domains such as natural language processing (NLP) and computer vision. In computer vision, common transformations include cropping, flipping, and rotation. In NLP, techniques include random swapping, random deletion, and random insertion of words, among others.
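
The transformations named above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not any library's API: an image is assumed to be a 2-D list of pixel values, a sentence is assumed to be a list of words, and all function names are hypothetical. The insertion step copies an existing word, a simplification of methods that insert synonyms instead.

```python
import random

# Assumption: an image is a 2-D list of pixel values (a list of rows).
Image = list[list[int]]


def horizontal_flip(img: Image) -> Image:
    """Mirror every row left to right."""
    return [row[::-1] for row in img]


def rotate_90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def random_crop(img: Image, h: int, w: int, rng: random.Random) -> Image:
    """Return a randomly positioned h x w window of the image."""
    top = rng.randint(0, len(img) - h)
    left = rng.randint(0, len(img[0]) - w)
    return [row[left:left + w] for row in img[top:top + h]]


def random_swap(words: list[str], rng: random.Random) -> list[str]:
    """Swap the words at two randomly chosen positions."""
    out = words[:]
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(words: list[str], p: float, rng: random.Random) -> list[str]:
    """Drop each word with probability p, but never return an empty sentence."""
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


def random_insertion(words: list[str], rng: random.Random) -> list[str]:
    """Insert a copy of a random word at a random position (simplified: no synonyms)."""
    out = words[:]
    out.insert(rng.randint(0, len(out)), rng.choice(words))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(horizontal_flip(img))  # [[3, 2, 1], [6, 5, 4], [9, 8, 7]]
    print(rotate_90(img))        # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]
    print(random_crop(img, 2, 2, rng))
    sentence = "the cat sat on the mat".split()
    print(random_swap(sentence, rng))
    print(random_deletion(sentence, 0.2, rng))
    print(random_insertion(sentence, rng))
```

Each function returns a new object and leaves its input untouched, so several augmented variants can be drawn from the same original example.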

Benchmarks

These leaderboards are used to track progress in Data Augmentation

Dataset	Best Model
ImageNet	DeiT-B (+MixPro)
CIFAR-10	Shake-Shake (26 2×96d) (Faster AA)

Libraries

Use these libraries to find Data Augmentation models and implementations

Westlake-AI/openmixup: 15 papers, 570 GitHub stars

rwightman/pytorch-image-models: 7 papers, 29,774 GitHub stars

makcedward/nlpaug: 7 papers, 4,298 GitHub stars

faceonlive/ai-research: 7 papers, 152 GitHub stars

Seven libraries are listed in total; four are shown here.

Most implemented papers

YOLOv4: Optimal Speed and Accuracy of Object Detection

AlexeyAB/darknet • 23 Apr 2020

There are a huge number of features which are said to improve Convolutional Neural Network (CNN) accuracy.

Improved Baselines with Momentum Contrastive Learning

facebookresearch/moco • 9 Mar 2020

Contrastive unsupervised learning has recently shown encouraging progress, e.g., in Momentum Contrast (MoCo) and SimCLR.

AutoAugment: Learning Augmentation Policies from Data

tensorflow/models • 24 May 2018

In our implementation, we have designed a search space where a policy consists of many sub-policies, one of which is randomly chosen for each image in each mini-batch.
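
The sub-policy mechanism described in this snippet can be sketched as follows. In AutoAugment each sub-policy applies two image operations in sequence, each with its own probability and magnitude. In this sketch the operation set (`invert`, `brightness`, `flip`), the 2-D list image format, and every probability and magnitude in `POLICY` are hypothetical placeholders, not a learned policy or the TensorFlow implementation.

```python
import random
from typing import Callable

# Assumption: an image is a 2-D list of 8-bit pixel values.
Image = list[list[int]]
Op = Callable[[Image, int], Image]


def invert(img: Image, _magnitude: int) -> Image:
    return [[255 - p for p in row] for row in img]


def brightness(img: Image, magnitude: int) -> Image:
    # Shift every pixel by `magnitude`, clipped to the valid range [0, 255].
    return [[min(255, max(0, p + magnitude)) for p in row] for row in img]


def flip(img: Image, _magnitude: int) -> Image:
    return [row[::-1] for row in img]


# A sub-policy is a list of (operation, probability, magnitude) triples.
# These values are illustrative placeholders, not a policy found by search.
POLICY: list[list[tuple[Op, float, int]]] = [
    [(invert, 0.5, 0), (brightness, 0.8, 30)],
    [(flip, 0.6, 0), (brightness, 0.4, -20)],
    [(brightness, 1.0, 10), (invert, 0.2, 0)],
]


def apply_sub_policy(img: Image, sub_policy, rng: random.Random) -> Image:
    # Apply each operation in order, each gated by its own probability.
    for op, prob, magnitude in sub_policy:
        if rng.random() < prob:
            img = op(img, magnitude)
    return img


def augment_batch(batch: list[Image], policy, rng: random.Random) -> list[Image]:
    # Draw one sub-policy at random, independently for every image in the mini-batch.
    return [apply_sub_policy(img, rng.choice(policy), rng) for img in batch]


if __name__ == "__main__":
    rng = random.Random(0)
    batch = [[[0, 100], [200, 255]] for _ in range(3)]
    print(augment_batch(batch, POLICY, rng))
```

Because the sub-policy is drawn per image, two copies of the same image in one mini-batch can receive different augmentations.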

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

mozilla/DeepSpeech • 18 Apr 2019

On LibriSpeech, we achieve 6.8% WER on test-other without the use of a language model, and 5.8% WER with shallow fusion with a language model.
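
SpecAugment augments the input spectrogram directly rather than the raw audio. The sketch below covers only its frequency-masking and time-masking steps (time warping is omitted) and uses the paper's names `F` and `T` for the maximum mask widths. It assumes the spectrogram is a 2-D list indexed as `spec[frequency_bin][time_step]` and already mean-normalized, so masked values are set to zero; the function names are hypothetical.

```python
import random

# Assumption: spec[f][t] is the mean-normalized log-mel value at bin f, step t.
Spectrogram = list[list[float]]


def freq_mask(spec: Spectrogram, F: int, rng: random.Random) -> Spectrogram:
    """Zero out f consecutive frequency bins, with f drawn uniformly from [0, F]."""
    num_bins = len(spec)
    f = rng.randint(0, min(F, num_bins))
    f0 = rng.randint(0, num_bins - f)
    return [[0.0] * len(row) if f0 <= i < f0 + f else row[:]
            for i, row in enumerate(spec)]


def time_mask(spec: Spectrogram, T: int, rng: random.Random) -> Spectrogram:
    """Zero out t consecutive time steps, with t drawn uniformly from [0, T]."""
    num_steps = len(spec[0])
    t = rng.randint(0, min(T, num_steps))
    t0 = rng.randint(0, num_steps - t)
    return [[0.0 if t0 <= j < t0 + t else v for j, v in enumerate(row)]
            for row in spec]


if __name__ == "__main__":
    rng = random.Random(0)
    spec = [[1.0] * 8 for _ in range(6)]
    masked = time_mask(freq_mask(spec, F=2, rng=rng), T=3, rng=rng)
    for row in masked:
        print(row)
```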

3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation

wolny/pytorch-3dunet • 21 Jun 2016

This paper introduces a network for volumetric segmentation that learns from sparsely annotated volumetric images.

Improved Regularization of Convolutional Neural Networks with Cutout

uoguelph-mlrg/Cutout • 15 Aug 2017

Convolutional neural networks are capable of learning powerful representational spaces, which are necessary for tackling complex learning tasks.
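
Cutout regularizes a CNN by masking out a square region of each training image. A minimal sketch, assuming a 2-D list image and a mask value of 0: in this version the patch centre is sampled anywhere in the image and the patch is clipped at the borders, so it may be only partly visible. `cutout` and its `size` parameter are illustrative names, not the repository's code.

```python
import random

# Assumption: an image is a 2-D list of pixel values.
Image = list[list[int]]


def cutout(img: Image, size: int, rng: random.Random) -> Image:
    """Set a size x size square (clipped to the image borders) to zero."""
    h, w = len(img), len(img[0])
    # Sample the patch centre uniformly over all pixel positions.
    cy, cx = rng.randrange(h), rng.randrange(w)
    top, left = cy - size // 2, cx - size // 2
    y0, y1 = max(0, top), min(h, top + size)
    x0, x1 = max(0, left), min(w, left + size)
    return [[0 if y0 <= y < y1 and x0 <= x < x1 else p for x, p in enumerate(row)]
            for y, row in enumerate(img)]


if __name__ == "__main__":
    rng = random.Random(0)
    img = [[1] * 6 for _ in range(6)]
    for row in cutout(img, size=3, rng=rng):
        print(row)
```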

Supervised Contrastive Learning

google-research/google-research • NeurIPS 2020

Contrastive learning applied to self-supervised representation learning has seen a resurgence in recent years, leading to state of the art performance in the unsupervised training of deep image models.

SimCSE: Simple Contrastive Learning of Sentence Embeddings

princeton-nlp/SimCSE • EMNLP 2021

This paper presents SimCSE, a simple contrastive learning framework that greatly advances state-of-the-art sentence embeddings.

Unsupervised Data Augmentation for Consistency Training

google-research/uda • NeurIPS 2020

In this work, we present a new perspective on how to effectively noise unlabeled examples and argue that the quality of noising, specifically those produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning.
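
The consistency-training objective behind UDA can be sketched as a supervised cross-entropy term plus a consistency term: the KL divergence between the model's predicted class distribution on an unlabeled example and on its augmented (noised) version, weighted by λ. In UDA the prediction on the original example is treated as a fixed target; this pure-Python version only computes loss values, omits any extra training techniques from the paper, and the names `uda_loss` and `lam` are hypothetical.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: list[float], q: list[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(probs: list[float], label: int, eps: float = 1e-12) -> float:
    return -math.log(probs[label] + eps)


def uda_loss(labeled, unlabeled_pairs, lam: float = 1.0) -> float:
    """labeled: (logits, label) pairs.
    unlabeled_pairs: (logits on original x, logits on augmented x) pairs.
    With autodiff, softmax(original logits) would be wrapped in a stop-gradient
    so that only the augmented prediction is pulled toward it."""
    supervised = sum(cross_entropy(softmax(z), y) for z, y in labeled) / len(labeled)
    consistency = sum(
        kl_divergence(softmax(z_orig), softmax(z_aug)) for z_orig, z_aug in unlabeled_pairs
    ) / len(unlabeled_pairs)
    return supervised + lam * consistency


if __name__ == "__main__":
    labeled = [([2.0, 0.5, -1.0], 0)]
    unlabeled = [([1.0, 0.0, 0.0], [0.2, 0.9, 0.0])]
    print(uda_loss(labeled, unlabeled, lam=1.0))
```

When the augmented prediction matches the original one, the consistency term vanishes and only the supervised loss remains; the further apart they are, the larger the penalty.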

EfficientNetV2: Smaller Models and Faster Training

google/automl • 1 Apr 2021

By pretraining on the same ImageNet21k, our EfficientNetV2 achieves 87.3% top-1 accuracy on ImageNet ILSVRC2012, outperforming the recent ViT by 2.0% accuracy while training 5x-11x faster using the same computing resources.
