Lexical Normalization

15 papers with code • 1 benchmark • 1 dataset

Lexical normalization is the task of translating (transforming) non-standard text into a standard register.

Example:

new pix comming tomoroe
new pictures coming tomorrow

Datasets usually consist of tweets, since tweets naturally contain a fair amount of non-standard language.

For lexical normalization, only word-level replacements are annotated. Some corpora also annotate 1-N and N-1 replacements (one word replaced by several, or several words merged into one). However, word insertion, deletion, and reordering are not part of the task.
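To make the word-level constraint concrete, here is a minimal sketch of the simplest possible normalizer: a dictionary lookup applied to each token independently. The lexicon `NORM_LEXICON` and the function `normalize` are hypothetical illustrations, not part of any dataset or system listed on this page; practical systems typically generate and rank candidates rather than relying on a fixed table.

```python
# Hypothetical lookup table mapping a non-standard token to its standard form.
# A value containing a space encodes a 1-N replacement (one token -> several words).
NORM_LEXICON = {
    "pix": "pictures",
    "comming": "coming",
    "tomoroe": "tomorrow",
    "gonna": "going to",
}


def normalize(tokens: list[str]) -> list[str]:
    """Replace each token independently.

    Tokens are never inserted, deleted, or reordered, so the output has
    exactly one slot per input token, mirroring the word-level task.
    Unknown tokens are passed through unchanged.
    """
    return [NORM_LEXICON.get(tok.lower(), tok) for tok in tokens]


if __name__ == "__main__":
    source = "new pix comming tomoroe".split()
    print(normalize(source))  # ['new', 'pictures', 'coming', 'tomorrow']
```

Because each input token maps to exactly one output slot, a 1-N replacement such as `gonna` to `going to` stays aligned with its source token, which is what makes word-level annotation and evaluation possible.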

Benchmarks


These leaderboards are used to track progress in Lexical Normalization.

Dataset	Best Model
LexNorm	MoNoise
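This page does not state which metric the LexNorm leaderboard reports. Assuming plain token-level accuracy (the kind of figure quoted by the synthetic-data paper listed further down), a minimal sketch of the computation looks like this; `word_accuracy` is a hypothetical helper name.

```python
def word_accuracy(predicted: list[str], gold: list[str]) -> float:
    """Fraction of token positions where the predicted form equals the gold form.

    Assumes both lists are aligned one-to-one with the raw input tokens,
    which word-level annotation makes possible.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be aligned token by token")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


if __name__ == "__main__":
    gold = ["new", "pictures", "coming", "tomorrow"]
    predicted = ["new", "pix", "coming", "tomorrow"]  # one token left unnormalized
    print(word_accuracy(predicted, gold))  # 0.75
```

Some evaluations score only the tokens that actually need normalization, or report precision and recall instead; this sketch covers only the simplest variant.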

Datasets

MultiSenti

Subtasks

Pronunciation Dictionary Creation

Latest papers with no code


A Character-level Ngram-based MT Approach for Lexical Normalization in Social Media

no code yet • ACL ARR December 2022

This paper presents an ngram-based MT approach that operates at character-level to generate possible canonical forms for lexical variants in social media text.
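The paper's character-level MT system is not reproduced here. As a much simpler illustration of why character-level information helps propose canonical forms, the sketch below ranks words from a hypothetical vocabulary by character-bigram overlap (Dice coefficient) with a lexical variant. This is a swapped-in similarity baseline, not the paper's ngram-based MT approach, and the names `char_ngrams`, `dice`, and `rank_candidates` are illustrative.

```python
def char_ngrams(word: str, n: int = 2) -> set[str]:
    """Character n-grams of a word, padded with '#' boundary markers."""
    padded = f"#{word}#"
    return {padded[i : i + n] for i in range(len(padded) - n + 1)}


def dice(a: set[str], b: set[str]) -> float:
    """Dice coefficient between two n-gram sets (0.0 = disjoint, 1.0 = identical)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def rank_candidates(variant: str, vocabulary: list[str], n: int = 2, k: int = 3) -> list[str]:
    """Return the k vocabulary words whose n-gram sets best overlap the variant's."""
    grams = char_ngrams(variant.lower(), n)
    scored = sorted(vocabulary, key=lambda w: dice(grams, char_ngrams(w, n)), reverse=True)
    return scored[:k]


if __name__ == "__main__":
    vocab = ["tomorrow", "tomato", "today", "coming"]  # hypothetical vocabulary
    print(rank_candidates("tomoroe", vocab, k=2))  # ['tomorrow', 'tomato']
```

A real system would combine such candidates with context (for example a language model) to pick the correct form; similarity alone cannot distinguish between equally close words.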

Contrastive String Representation Learning using Synthetic Data

no code yet • 8 Oct 2021

We demonstrate the effectiveness of our approach by evaluating the learned representation on the task of string similarity matching.

Sequence-to-Sequence Lexical Normalization with Multilingual Transformers

no code yet • WNUT (ACL) 2021

Our results show that while the model's intrinsic, word-level performance lags behind other methods, normalizing with it improves performance on extrinsic, downstream tasks compared to models operating on raw, unprocessed social media text.

Lexical Normalization for Code-switched Data and its Effect on POS-tagging

no code yet • 1 Jun 2020

Lexical normalization, the translation of non-canonical data to standard language, has been shown to improve the performance of many natural language processing tasks on social media.

Norm It! Lexical Normalization for Italian and Its Downstream Effects for Dependency Parsing

no code yet • LREC 2020

However, for Italian, there is no benchmark available for lexical normalization, despite the presence of many benchmarks for other tasks involving social media data.

Synthetic Data for English Lexical Normalization: How Close Can We Get to Manually Annotated Data?

no code yet • LREC 2020

With this system, we score 94.29 accuracy on the test data, compared to 95.22 when it is trained on human-annotated data.

An In-depth Analysis of the Effect of Lexical Normalization on the Dependency Parsing of Social Media

no code yet • WS 2019

Existing natural language processing systems have often been designed with standard texts in mind.

Enhancing BERT for Lexical Normalization

no code yet • WS 2019

In this article, focusing on User Generated Content (UGC), we study the ability of BERT to perform lexical normalisation.

Normalization of Indonesian-English Code-Mixed Twitter Data

no code yet • WS 2019

Twitter is an excellent source of data for NLP research, as it offers a tremendous amount of textual data.

Lexical Normalization of User-Generated Medical Text

no code yet • WS 2019

In the medical domain, user-generated social media text is increasingly used as a valuable complementary knowledge source to scientific medical literature.