Text Normalization
27 papers with code • 0 benchmarks • 0 datasets
Most implemented papers
Applying the Transformer to Character-level Transduction
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
Shallow Fusion of Weighted Finite-State Transducer and Language Model for Text Normalization
First, a non-deterministic WFST outputs all normalization candidates, and then a neural language model picks the best one -- similar to shallow fusion for automatic speech recognition.
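The generate-then-rank idea in this snippet can be illustrated with a minimal sketch. It is not the paper's implementation: the candidate table `CANDIDATES` is a hypothetical stand-in for a non-deterministic WFST, the bigram table `BIGRAM_LOGP` is a hypothetical stand-in for a neural language model, and all names and scores are invented for illustration.

```python
from itertools import product

# Hypothetical candidate table standing in for a non-deterministic WFST:
# each written token maps to every spoken form the grammar would allow.
CANDIDATES = {
    "1/2": ["one half", "january second", "one divided by two"],
}

# Hypothetical bigram log-probabilities standing in for a neural language model.
BIGRAM_LOGP = {
    ("add", "one"): -1.0,
    ("one", "half"): -0.5,
    ("half", "cup"): -1.0,
    ("january", "second"): -0.7,
}
UNSEEN_LOGP = -8.0  # assumed penalty for any bigram missing from the table


def expand(tokens):
    """Enumerate every full-sentence candidate (tokens without an entry pass through)."""
    options = [CANDIDATES.get(token, [token]) for token in tokens]
    return [" ".join(choice) for choice in product(*options)]


def lm_score(sentence):
    """Sum the toy bigram log-probabilities over adjacent word pairs."""
    words = sentence.split()
    return sum(BIGRAM_LOGP.get(pair, UNSEEN_LOGP) for pair in zip(words, words[1:]))


def normalize(text):
    """Return the candidate that the toy language model scores highest."""
    return max(expand(text.split()), key=lm_score)


print(normalize("add 1/2 cup"))
```

Because the score is an unnormalized sum of log-probabilities, longer candidates are penalized more heavily; a real system would likely combine grammar weights with the language model score instead of relying on the language model alone.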
Encoder-Decoder Methods for Text Normalization
Text normalization has been addressed with a variety of methods, most successfully with character-level statistical machine translation (CSMT).
A Large-Scale Comparison of Historical Text Normalization Systems
There is no consensus on the state-of-the-art approach to historical text normalization.
hinglishNorm -- A Corpus of Hindi-English Code Mixed Sentences for Text Normalization
We present hinglishNorm -- a human-annotated corpus of Hindi-English code-mixed sentences for the text normalization task.
Evaluating Informal-Domain Word Representations With UrbanDictionary
Existing corpora for intrinsic evaluation are not targeted towards tasks in informal domains such as Twitter or news comment forums.
RNN Approaches to Text Normalization: A Challenge
Though our conclusions are largely negative on this point, we are actually not arguing that the text normalization problem is intractable using a pure RNN approach, merely that it is not going to be something that can be solved merely by having huge amounts of annotated text data and feeding that to a general RNN model.
Machine Translation of Low-Resource Spoken Dialects: Strategies for Normalizing Swiss German
The goal of this work is to design a machine translation (MT) system for a low-resource family of dialects, collectively known as Swiss German, which are widely spoken in Switzerland but seldom written.
Text normalization using memory augmented neural networks
We perform text normalization, i.e., the transformation of words from the written to the spoken form, using a memory-augmented neural network.
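To make the written-to-spoken transformation concrete, here is a small rule-based sketch that verbalizes integers below 1000. It only illustrates the task's input and output; it is not the memory-augmented network from the paper, and the function names `spell_number` and `to_spoken` are hypothetical.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def spell_number(n):
    """Spell out an integer in the range 0..999 in English words."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else TENS[tens] + " " + ONES[ones]
    hundreds, rest = divmod(n, 100)
    head = ONES[hundreds] + " hundred"
    return head if rest == 0 else head + " " + spell_number(rest)


def to_spoken(text):
    """Replace each all-digit token below 1000 with its spoken form."""
    spoken = []
    for token in text.split():
        if token.isascii() and token.isdigit() and int(token) < 1000:
            spoken.append(spell_number(int(token)))
        else:
            spoken.append(token)  # leave everything else unchanged
    return " ".join(spoken)


print(to_spoken("room 42 has 105 seats"))
```

Hand-written rules like these cover only one narrow token class; dates, currency, abbreviations, and context-dependent readings are what make the full task hard enough to motivate learned models.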
A Character-Level Approach to the Text Normalization Problem Based on a New Causal Encoder
Text normalization is a ubiquitous process that appears as the first step of many Natural Language Processing problems.