Data Augmentation

2517 papers with code • 2 benchmarks • 63 datasets

Data augmentation refers to techniques that expand a dataset by creating modified copies of its existing examples. It not only grows the dataset but also increases its diversity. When training machine learning models, data augmentation acts as a regularizer and helps to avoid overfitting.

Data augmentation techniques have proven useful in domains such as computer vision and NLP. In computer vision, typical transformations include cropping, flipping, and rotation. In NLP, common techniques include word swapping, random insertion, and deletion, among others.
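
As a rough illustration of the transformations mentioned above, the sketch below applies an Albumentations crop/flip/rotate pipeline to an image (the library credited for the task image) and a simple word-level swap/delete/insert pass to a sentence. The specific transforms, probabilities, and the `augment_text` helper are illustrative choices, not drawn from any of the papers listed on this page.

```python
import random

import numpy as np
import albumentations as A  # image augmentation library credited above

# --- Computer vision: compose crop / flip / rotate transforms ---
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)  # stand-in image
vision_aug = A.Compose([
    A.RandomCrop(height=200, width=200, p=1.0),
    A.HorizontalFlip(p=0.5),
    A.Rotate(limit=15, p=0.5),
])
augmented_image = vision_aug(image=image)["image"]

# --- NLP: word-level swap, deletion, and random insertion (illustrative helper) ---
def augment_text(sentence: str, p: float = 0.1, seed: int = 0) -> str:
    rng = random.Random(seed)
    words = sentence.split()
    # random swap of two word positions
    if len(words) > 1 and rng.random() < p:
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    # random deletion: keep each word with probability 1 - p (never drop everything)
    words = [w for w in words if rng.random() > p] or words
    # random insertion: duplicate an existing word at a random position
    if words and rng.random() < p:
        words.insert(rng.randrange(len(words) + 1), rng.choice(words))
    return " ".join(words)

print(augment_text("data augmentation increases dataset diversity", p=0.3))
```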

( Image credit: Albumentations )

UnibucLLM: Harnessing LLMs for Automated Prediction of Item Difficulty and Response Time for Multiple-Choice Questions

ana-rogoz/bea-2024 20 Apr 2024

This work explores a novel data augmentation method based on Large Language Models (LLMs) for predicting item difficulty and response time of retired USMLE Multiple-Choice Questions (MCQs) in the BEA 2024 Shared Task.

A visualization method for data domain changes in CNN networks and the optimization method for selecting thresholds in classification tasks

searecluse/cvprw2024 19 Apr 2024

In recent years, Face Anti-Spoofing (FAS) has played a crucial role in preserving the security of face recognition technology.

Aligning Actions and Walking to LLM-Generated Textual Descriptions

radu1999/walkandtext 18 Apr 2024

For action recognition, we employ LLMs to generate textual descriptions of actions in the BABEL-60 dataset, facilitating the alignment of motion sequences with linguistic representations.

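The entry above describes aligning motion sequences with LLM-generated textual descriptions. A minimal sketch of one common way to implement such alignment is a CLIP-style symmetric contrastive loss over paired motion and text embeddings; this particular loss is an assumption for illustration, not necessarily the method used in the paper.

```python
import torch
import torch.nn.functional as F

def alignment_loss(motion_emb: torch.Tensor, text_emb: torch.Tensor,
                   temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss aligning paired motion and text embeddings (illustrative)."""
    motion = F.normalize(motion_emb, dim=-1)
    text = F.normalize(text_emb, dim=-1)
    logits = motion @ text.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# motion_emb and text_emb would come from a motion encoder and a text encoder, respectively.
```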

The Victim and The Beneficiary: Exploiting a Poisoned Model to Train a Clean Model on Poisoned Data

zixuan-zhu/vab ICCV 2023

This inspires us to propose a novel dual-network training framework: The Victim and The Beneficiary (V&B), which exploits a poisoned model to train a clean model without extra benign samples.

Consistency Training by Synthetic Question Generation for Conversational Question Answering

hamedhematian/syncqg 17 Apr 2024

In our novel model-agnostic approach, referred to as CoTaH (Consistency-Trained augmented History), we augment the historical information with synthetic questions and subsequently employ consistency training to train a model that utilizes both real and augmented historical data to implicitly make the reasoning robust to irrelevant history.

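A minimal sketch of the consistency-training idea in the CoTaH entry above: penalize disagreement between the model's predictions given the real history and the synthetically augmented history. The symmetric-KL formulation and the `consistency_loss` name are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def consistency_loss(logits_real: torch.Tensor, logits_aug: torch.Tensor) -> torch.Tensor:
    """Symmetric KL between predictions from real vs. augmented history (illustrative)."""
    p = F.log_softmax(logits_real, dim=-1)
    q = F.log_softmax(logits_aug, dim=-1)
    return 0.5 * (
        F.kl_div(q, p, log_target=True, reduction="batchmean")
        + F.kl_div(p, q, log_target=True, reduction="batchmean")
    )

# total_loss = task_loss + lambda_consistency * consistency_loss(logits_real, logits_aug)
```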

Can We Break Free from Strong Data Augmentations in Self-Supervised Learning?

neurai-lab/ssl-prior 15 Apr 2024

Self-supervised learning (SSL) has emerged as a promising solution for addressing the challenge of limited labeled data in deep neural networks (DNNs), offering scalability potential.

RoofDiffusion: Constructing Roofs from Severely Corrupted Point Data via Diffusion

kylelo/roofdiffusion 14 Apr 2024

Accurate completion and denoising of roof height maps are crucial to reconstructing high-quality 3D buildings.

DeDoDe v2: Analyzing and Improving the DeDoDe Keypoint Detector

parskatt/dedode 13 Apr 2024

First, we find that DeDoDe keypoints tend to cluster together, which we fix by performing non-max suppression on the target distribution of the detector during training.

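A minimal sketch of the non-max suppression step mentioned in the DeDoDe v2 entry, keeping only local maxima of a 2D keypoint score map. The max-pooling formulation and the window size are illustrative assumptions rather than the paper's exact training-time procedure.

```python
import torch
import torch.nn.functional as F

def nms_score_map(scores: torch.Tensor, window: int = 5) -> torch.Tensor:
    """Keep only local maxima of an (H, W) keypoint score map within a window."""
    pooled = F.max_pool2d(
        scores[None, None], kernel_size=window, stride=1, padding=window // 2
    )[0, 0]
    return torch.where(scores == pooled, scores, torch.zeros_like(scores))

scores = torch.rand(64, 64)
suppressed = nms_score_map(scores)  # non-maxima zeroed out, reducing clustered detections
```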

MaSkel: A Model for Human Whole-body X-rays Generation from Human Masking Images

2022yingjie/maskel 13 Apr 2024

In our work, we propose a new method to directly generate 2D human whole-body X-rays from human masking images.

An evaluation framework for synthetic data generation models

novelcore/synthetic_data_evaluation_framework 13 Apr 2024

Two use case scenarios demonstrate the applicability of the proposed framework for evaluating the ability of synthetic data generation models to generate high-quality data.
