Text Normalization

27 papers with code • 0 benchmarks • 0 datasets

Most implemented papers

Applying the Transformer to Character-level Transduction

shijie-wu/neural-transducer EACL 2021

The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.

Shallow Fusion of Weighted Finite-State Transducer and Language Model for Text Normalization

NVIDIA/NeMo 29 Mar 2022

First, a non-deterministic WFST outputs all normalization candidates, and then a neural language model picks the best one -- similar to shallow fusion for automatic speech recognition.
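
As a rough illustration of the candidate-then-rescore idea in this summary, the Python sketch below enumerates spoken-form candidates and lets a language-model score pick among them. Everything in it is a hypothetical stand-in: the `CANDIDATES` table replaces a real WFST, the `BIGRAM_LOGP` table replaces a neural language model, and `normalize` with its `lm_weight` parameter is not part of NeMo's API or the paper's implementation.

```python
from itertools import product

# Hypothetical stand-in for a non-deterministic WFST: each written token maps
# to several spoken-form candidates, each with a cost (lower is better).
CANDIDATES = {
    "1/4": [("one quarter", 1.0), ("january fourth", 1.0), ("one fourth", 1.0)],
}

# Hypothetical stand-in for a neural language model: toy bigram log-scores.
BIGRAM_LOGP = {
    ("add", "one"): -1.0,
    ("one", "quarter"): -1.0,
    ("quarter", "cup"): -1.0,
    ("one", "fourth"): -2.0,
    ("fourth", "cup"): -2.0,
    ("on", "january"): -1.0,
    ("january", "fourth"): -1.0,
}
UNSEEN_LOGP = -10.0  # assumed penalty for bigrams missing from the toy table


def candidates(token):
    """Return (spoken_form, cost) options; unknown tokens pass through unchanged."""
    return CANDIDATES.get(token, [(token, 0.0)])


def lm_logprob(sentence):
    """Score a sentence as the sum of its toy bigram log-scores."""
    words = sentence.split()
    return sum(BIGRAM_LOGP.get(pair, UNSEEN_LOGP) for pair in zip(words, words[1:]))


def normalize(text, lm_weight=1.0):
    """Pick the candidate sentence with the best combined score."""
    options = [candidates(tok) for tok in text.split()]
    best_sentence, best_score = None, float("-inf")
    for combo in product(*options):
        sentence = " ".join(form for form, _ in combo)
        cost = sum(c for _, c in combo)
        # Log-linear combination of candidate cost and language-model score.
        score = -cost + lm_weight * lm_logprob(sentence)
        if score > best_score:
            best_sentence, best_score = sentence, score
    return best_sentence
```

With these toy tables, `normalize("add 1/4 cup")` selects "add one quarter cup", while `normalize("on 1/4")` selects "on january fourth": the candidate set is the same, and only the context-dependent score changes the choice.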

Encoder-Decoder Methods for Text Normalization

tatyana-ruzsics/uzh-corpuslab-normalization COLING 2018

Text normalization has been addressed with a variety of methods, most successfully with character-level statistical machine translation (CSMT).

A Large-Scale Comparison of Historical Text Normalization Systems

coastalcph/histnorm NAACL 2019

There is no consensus on the state-of-the-art approach to historical text normalization.

hinglishNorm -- A Corpus of Hindi-English Code Mixed Sentences for Text Normalization

piyushmakhija5/normalizationdataset 18 Oct 2020

We present hinglishNorm -- a human-annotated corpus of Hindi-English code-mixed sentences for the text normalization task.

Evaluating Informal-Domain Word Representations With UrbanDictionary

nsaphra/urbandic-scraper WS 2016

Existing corpora for intrinsic evaluation are not targeted towards tasks in informal domains such as Twitter or news comment forums.

RNN Approaches to Text Normalization: A Challenge

rwsproat/text-normalization-data 31 Oct 2016

Though our conclusions are largely negative on this point, we are not arguing that the text normalization problem is intractable using a pure RNN approach, merely that it cannot be solved simply by feeding huge amounts of annotated text data to a general RNN model.

Machine Translation of Low-Resource Spoken Dialects: Strategies for Normalizing Swiss German

Kyubyong/quasi-rnn LREC 2018

The goal of this work is to design a machine translation (MT) system for a low-resource family of dialects, collectively known as Swiss German, which are widely spoken in Switzerland but seldom written.

Text normalization using memory augmented neural networks

cognibit/Text-Normalization-Demo 31 May 2018

We perform text normalization, i.e., the transformation of words from the written to the spoken form, using a memory augmented neural network.

A Character-Level Approach to the Text Normalization Problem Based on a New Causal Encoder

adrianjav/text-normalization-preprocess 6 Mar 2019

Text normalization is a ubiquitous process that appears as the first step of many Natural Language Processing problems.