Data Augmentation

1664 papers with code • 1 benchmarks • 62 datasets

Data augmentation comprises techniques that expand a dataset by generating modified copies of its existing examples. It not only grows the dataset but also increases its diversity. When training machine learning models, data augmentation acts as a regularizer and helps prevent overfitting.

Data augmentation has proven useful in domains such as computer vision and NLP. In computer vision, common transformations include cropping, flipping, and rotation. In NLP, techniques include token swapping, deletion, and random insertion, among others.
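The transformations above can be sketched with a few lines of NumPy and the standard library. This is a minimal illustration, not any particular library's API; the function names (`random_crop`, `random_swap`, etc.) are chosen here for clarity.

```python
import random

import numpy as np


def random_horizontal_flip(img: np.ndarray, p: float = 0.5) -> np.ndarray:
    """Flip an (H, W, C) image left-right with probability p."""
    return img[:, ::-1, :] if random.random() < p else img


def random_crop(img: np.ndarray, size: int) -> np.ndarray:
    """Crop a random (size, size) window from an (H, W, C) image."""
    h, w = img.shape[:2]
    top = random.randint(0, h - size)
    left = random.randint(0, w - size)
    return img[top:top + size, left:left + size, :]


def random_deletion(tokens: list[str], p: float = 0.1) -> list[str]:
    """Drop each token with probability p, keeping at least one."""
    kept = [t for t in tokens if random.random() >= p]
    return kept if kept else [random.choice(tokens)]


def random_swap(tokens: list[str], n_swaps: int = 1) -> list[str]:
    """Swap n_swaps random pairs of token positions."""
    out = tokens[:]
    for _ in range(n_swaps):
        i, j = random.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out
```

In practice, such transforms are applied on the fly during training so that each epoch sees a slightly different version of every example; production pipelines typically use a dedicated library (e.g., Albumentations for images) rather than hand-rolled helpers like these.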

(Image credit: Albumentations)

CL4CTR: A Contrastive Learning Framework for CTR Prediction

cl4ctr/cl4ctr 1 Dec 2022

Many Click-Through Rate (CTR) prediction works focused on designing advanced architectures to model complex feature interactions but neglected the importance of feature representation learning, e.g., adopting a plain embedding layer for each feature, which results in sub-optimal feature representations and thus inferior CTR prediction performance.


Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles

pleaseconnectwifi/dance 29 Nov 2022

Through our analysis, we find one important reason is that existing large-scale VL datasets do not contain much commonsense knowledge, which motivates us to improve the commonsense of VL-models from the data perspective.


Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality

owenzx/met-primaug 28 Nov 2022

Recent datasets expose the lack of the systematic generalization ability in standard sequence-to-sequence models.


Improving Low-Resource Question Answering using Active Learning in Multiple Stages

primeqa/primeqa 27 Nov 2022

Furthermore, they often yield very good performance but only in the domain they were trained on.


Rethinking Data Augmentation for Single-source Domain Generalization in Medical Image Segmentation

kaiseem/slaug 27 Nov 2022

Single-source domain generalization (SDG) in medical image segmentation is a challenging yet essential task as domain shifts are quite common among clinical image datasets.


DigGAN: Discriminator gradIent Gap Regularization for GAN Training with Limited Data

ailsaf/diggan 27 Nov 2022

In contrast, we propose a Discriminator gradIent Gap regularized GAN (DigGAN) formulation which can be added to any existing GAN.


Towards Good Practices for Missing Modality Robust Action Recognition

sangminwoo/actionmae 25 Nov 2022

We ask: how can we train a model that is robust to missing modalities?


Breaking the Representation Bottleneck of Chinese Characters: Neural Machine Translation with Stroke Sequence Modeling

zjwang21/strokenet 23 Nov 2022

Existing research generally treats the Chinese character as the minimum unit for representation.


Join the High Accuracy Club on ImageNet with A Binary Neural Network Ticket

hpi-xnor/bnext 23 Nov 2022

In this work, we revisit the potential of binary neural networks and focus on a compelling but unanswered problem: how can a binary neural network achieve the crucial accuracy level (e.g., 80%) on ILSVRC-2012 ImageNet?


ModelDiff: A Framework for Comparing Learning Algorithms

madrylab/modeldiff 22 Nov 2022

We study the problem of (learning) algorithm comparison, where the goal is to find differences between models trained with two different learning algorithms.
