Text Augmentation

33 papers with code • 0 benchmarks • 0 datasets

Latest papers with no code

Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation

no code yet • 16 Sep 2023

Because the decoder architecture is the same as an autoregressive LM, it is simple to enhance the model by leveraging external text data with LM training.

Probabilistic Linguistic Knowledge and Token-level Text Augmentation

no code yet • 29 Jun 2023

This paper investigates the effectiveness of token-level text augmentation and the role of probabilistic linguistic knowledge within a linguistically-motivated evaluation context.

Text Generation with Speech Synthesis for ASR Data Augmentation

no code yet • 22 May 2023

In this work, we explore text augmentation for ASR using large-scale pre-trained neural networks, and systematically compare those to traditional text augmentation methods.

Boosting Event Extraction with Denoised Structure-to-Text Augmentation

no code yet • 16 May 2023

Event extraction aims to recognize pre-defined event triggers and arguments from texts, a task that suffers from a lack of high-quality annotations.

Shuffle & Divide: Contrastive Learning for Long Text

no code yet • 19 Apr 2023

We propose a self-supervised learning method for long text documents based on contrastive learning.

Improving Fast-slow Encoder based Transducer with Streaming Deliberation

no code yet • 15 Dec 2022

Experiments on Librispeech and in-house data show relative WER reductions (WERRs) from 3% to 5% with a slight increase in model size and negligible extra token emission latency compared with fast-slow encoder based transducer.

Enabling Classifiers to Make Judgements Explicitly Aligned with Human Values

no code yet • 14 Oct 2022

Therefore, we introduce a framework for value-aligned classification that performs prediction based on explicitly written human values in the command.

Entity Aware Syntax Tree Based Data Augmentation for Natural Language Understanding

no code yet • 6 Sep 2022

One of the main challenges is to collect a sufficient amount of annotated data to train a model.

Data Augmentation for Low-Resource Quechua ASR Improvement

no code yet • 14 Jul 2022

In this paper we describe our data augmentation approach to improve the results of ASR models for low-resource and agglutinative languages.

Textual Data Augmentation for Arabic-English Code-Switching Speech Recognition

no code yet • 7 Jan 2022

The pervasiveness of intra-utterance code-switching (CS) in spoken content requires that automatic speech recognition (ASR) systems handle mixed language.