Data Augmentation

2517 papers with code • 2 benchmarks • 63 datasets

Data augmentation refers to techniques that expand a dataset by creating modified copies of its existing examples. It not only grows the dataset but also increases its diversity. When training machine learning models, data augmentation acts as a regularizer and helps to avoid overfitting.

Data augmentation techniques have proven useful in domains such as computer vision and NLP. In computer vision, typical transformations include cropping, flipping, and rotation. In NLP, common techniques include word swapping, random insertion, and deletion, among others.
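
As a rough illustration of the transformations mentioned above, the sketch below applies an Albumentations crop/flip/rotate pipeline to an image (the library credited for the task image) and a simple word-level swap/delete/insert pass to a sentence. The specific transforms, probabilities, and the `augment_text` helper are illustrative choices, not drawn from any of the papers listed on this page.

```python
import random

import numpy as np
import albumentations as A  # image augmentation library credited above

# --- Computer vision: compose crop / flip / rotate transforms ---
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)  # stand-in image
vision_aug = A.Compose([
    A.RandomCrop(height=200, width=200, p=1.0),
    A.HorizontalFlip(p=0.5),
    A.Rotate(limit=15, p=0.5),
])
augmented_image = vision_aug(image=image)["image"]

# --- NLP: word-level swap, deletion, and random insertion (illustrative helper) ---
def augment_text(sentence: str, p: float = 0.1, seed: int = 0) -> str:
    rng = random.Random(seed)
    words = sentence.split()
    # random swap of two word positions
    if len(words) > 1 and rng.random() < p:
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    # random deletion: keep each word with probability 1 - p (never drop everything)
    words = [w for w in words if rng.random() > p] or words
    # random insertion: duplicate an existing word at a random position
    if words and rng.random() < p:
        words.insert(rng.randrange(len(words) + 1), rng.choice(words))
    return " ".join(words)

print(augment_text("data augmentation increases dataset diversity", p=0.3))
```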

( Image credit: Albumentations )

UnibucLLM: Harnessing LLMs for Automated Prediction of Item Difficulty and Response Time for Multiple-Choice Questions

ana-rogoz/bea-2024 20 Apr 2024

This work explores a novel data augmentation method based on Large Language Models (LLMs) for predicting item difficulty and response time of retired USMLE Multiple-Choice Questions (MCQs) in the BEA 2024 Shared Task.

A visualization method for data domain changes in CNN networks and the optimization method for selecting thresholds in classification tasks

searecluse/cvprw2024 19 Apr 2024

In recent years, Face Anti-Spoofing (FAS) has played a crucial role in preserving the security of face recognition technology.

Aligning Actions and Walking to LLM-Generated Textual Descriptions

radu1999/walkandtext 18 Apr 2024

For action recognition, we employ LLMs to generate textual descriptions of actions in the BABEL-60 dataset, facilitating the alignment of motion sequences with linguistic representations.

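The entry above describes aligning motion sequences with LLM-generated textual descriptions. A minimal sketch of one common way to implement such alignment is a CLIP-style symmetric contrastive loss over paired motion and text embeddings; this particular loss is an assumption for illustration, not necessarily the method used in the paper.

```python
import torch
import torch.nn.functional as F

def alignment_loss(motion_emb: torch.Tensor, text_emb: torch.Tensor,
                   temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss aligning paired motion and text embeddings (illustrative)."""
    motion = F.normalize(motion_emb, dim=-1)
    text = F.normalize(text_emb, dim=-1)
    logits = motion @ text.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# motion_emb and text_emb would come from a motion encoder and a text encoder, respectively.
```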

The Victim and The Beneficiary: Exploiting a Poisoned Model to Train a Clean Model on Poisoned Data

zixuan-zhu/vab ICCV 2023

This inspires us to propose a novel dual-network training framework: The Victim and The Beneficiary (V&B), which exploits a poisoned model to train a clean model without extra benign samples.

Consistency Training by Synthetic Question Generation for Conversational Question Answering

hamedhematian/syncqg 17 Apr 2024

In our novel model-agnostic approach, referred to as CoTaH (Consistency-Trained augmented History), we augment the historical information with synthetic questions and subsequently employ consistency training to train a model that utilizes both real and augmented historical data to implicitly make the reasoning robust to irrelevant history.

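A minimal sketch of the consistency-training idea in the CoTaH entry above: penalize disagreement between the model's predictions given the real history and the synthetically augmented history. The symmetric-KL formulation and the `consistency_loss` name are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def consistency_loss(logits_real: torch.Tensor, logits_aug: torch.Tensor) -> torch.Tensor:
    """Symmetric KL between predictions from real vs. augmented history (illustrative)."""
    p = F.log_softmax(logits_real, dim=-1)
    q = F.log_softmax(logits_aug, dim=-1)
    return 0.5 * (
        F.kl_div(q, p, log_target=True, reduction="batchmean")
        + F.kl_div(p, q, log_target=True, reduction="batchmean")
    )

# total_loss = task_loss + lambda_consistency * consistency_loss(logits_real, logits_aug)
```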

Can We Break Free from Strong Data Augmentations in Self-Supervised Learning?

neurai-lab/ssl-prior 15 Apr 2024

Self-supervised learning (SSL) has emerged as a promising solution for addressing the challenge of limited labeled data in deep neural networks (DNNs), offering scalability potential.

RoofDiffusion: Constructing Roofs from Severely Corrupted Point Data via Diffusion

kylelo/roofdiffusion 14 Apr 2024

Accurate completion and denoising of roof height maps are crucial to reconstructing high-quality 3D buildings.

DeDoDe v2: Analyzing and Improving the DeDoDe Keypoint Detector

parskatt/dedode 13 Apr 2024

First, we find that DeDoDe keypoints tend to cluster together, which we fix by performing non-max suppression on the target distribution of the detector during training.

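A minimal sketch of the non-max suppression step mentioned in the DeDoDe v2 entry, keeping only local maxima of a 2D keypoint score map. The max-pooling formulation and the window size are illustrative assumptions rather than the paper's exact training-time procedure.

```python
import torch
import torch.nn.functional as F

def nms_score_map(scores: torch.Tensor, window: int = 5) -> torch.Tensor:
    """Keep only local maxima of an (H, W) keypoint score map within a window."""
    pooled = F.max_pool2d(
        scores[None, None], kernel_size=window, stride=1, padding=window // 2
    )[0, 0]
    return torch.where(scores == pooled, scores, torch.zeros_like(scores))

scores = torch.rand(64, 64)
suppressed = nms_score_map(scores)  # non-maxima zeroed out, reducing clustered detections
```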

MaSkel: A Model for Human Whole-body X-rays Generation from Human Masking Images

2022yingjie/maskel 13 Apr 2024

In our work, we propose a new method to directly generate 2D human whole-body X-rays from human masking images.

An evaluation framework for synthetic data generation models

novelcore/synthetic_data_evaluation_framework 13 Apr 2024

Two use case scenarios demonstrate the applicability of the proposed framework for evaluating the ability of synthetic data generation models to generate high-quality data.
