Text Augmentation
33 papers with code • 0 benchmarks • 0 datasets
Latest papers with no code
Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation
Because the decoder architecture is the same as an autoregressive LM, it is simple to enhance the model by leveraging external text data with LM training.
Probabilistic Linguistic Knowledge and Token-level Text Augmentation
This paper investigates the effectiveness of token-level text augmentation and the role of probabilistic linguistic knowledge within a linguistically-motivated evaluation context.
Text Generation with Speech Synthesis for ASR Data Augmentation
In this work, we explore text augmentation for ASR using large-scale pre-trained neural networks, and systematically compare those to traditional text augmentation methods.
Boosting Event Extraction with Denoised Structure-to-Text Augmentation
Event extraction aims to recognize pre-defined event triggers and arguments from texts, which suffer from the lack of high-quality annotations.
Shuffle & Divide: Contrastive Learning for Long Text
We propose a self-supervised learning method for long text documents based on contrastive learning.
Improving Fast-slow Encoder based Transducer with Streaming Deliberation
Experiments on Librispeech and in-house data show relative WER reductions (WERRs) from 3% to 5% with a slight increase in model size and negligible extra token emission latency compared with fast-slow encoder based transducer.
Enabling Classifiers to Make Judgements Explicitly Aligned with Human Values
Therefore, we introduce a framework for value-aligned classification that performs prediction based on explicitly written human values in the command.
Entity Aware Syntax Tree Based Data Augmentation for Natural Language Understanding
One of the main challenges is to collect a sufficient amount of annotated data to train a model.
Data Augmentation for Low-Resource Quechua ASR Improvement
In this paper we describe our data augmentation approach to improve the results of ASR models for low-resource and agglutinative languages.
Textual Data Augmentation for Arabic-English Code-Switching Speech Recognition
The pervasiveness of intra-utterance code-switching (CS) in spoken content requires that speech recognition (ASR) systems handle mixed language.
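Several entries above refer to token-level text augmentation without showing what such an operation looks like. The following is a minimal sketch of two common token-level operations, random swap and random deletion, and is not taken from any of the listed papers. The function names `random_swap` and `random_delete`, the whitespace tokenization, and the example sentence are all assumptions made for illustration.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps random position pairs exchanged."""
    out = list(tokens)
    if len(out) < 2:
        return out  # nothing to swap
    for _ in range(n_swaps):
        # Pick two distinct positions and exchange their tokens.
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_delete(tokens, p, rng):
    """Drop each token independently with probability p, keeping at least one."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    # Avoid returning an empty sentence: fall back to one random token.
    return kept if kept else [rng.choice(tokens)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    # Hypothetical example sentence, split on whitespace (an assumption).
    sentence = "data augmentation helps low resource models".split()
    print(random_swap(sentence, 1, rng))
    print(random_delete(sentence, 0.2, rng))
```

Both functions leave the input list unchanged and take an explicit `random.Random` instance, so an augmented training set can be regenerated from a seed. Real pipelines typically add label-aware constraints, for example not deleting entity tokens, which this sketch omits.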