41 papers with code • 0 benchmarks • 5 datasets

Transliteration is the task of converting a word from a source (foreign) language into a target language, and it often adopts approaches from machine translation. In machine translation, the objective is to preserve the semantic meaning of the utterance as much as possible while following the syntactic structure of the target language. In transliteration, the objective is to preserve the original pronunciation of the source word as much as possible while following the phonological structure of the target language.
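As an illustration of this objective (a toy sketch, not the method of any paper listed below), a rule-based transliterator simply rewrites source graphemes as target graphemes that approximate the original pronunciation; real systems learn such mappings, for example with sequence-to-sequence models:

```python
# Toy rule-based English -> Russian Cyrillic transliteration sketch.
# The rule table is hand-written and illustrative only; digraphs are
# listed first so "ch" maps to "ч" rather than two separate letters.
RULES = [
    ("ch", "ч"), ("sh", "ш"), ("a", "а"), ("e", "е"),
    ("m", "м"), ("n", "н"), ("s", "с"), ("t", "т"), ("r", "р"),
]

def transliterate(word: str) -> str:
    word = word.lower()
    out, i = [], 0
    while i < len(word):
        for src, tgt in RULES:
            if word.startswith(src, i):
                out.append(tgt)
                i += len(src)
                break
        else:
            out.append(word[i])  # pass unmapped characters through
            i += 1
    return "".join(out)

print(transliterate("Manchester"))  # -> "манчестер"
```

Note that the output follows Russian orthography rather than spelling the English letters one-to-one, which is exactly the pronunciation-preserving goal described above.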

For example, the city name “Manchester” has become well known to speakers of languages other than English through its transliterations. Such words are often named entities that are important in cross-lingual information retrieval, information extraction, and machine translation, and they frequently pose out-of-vocabulary challenges to spoken language technologies such as automatic speech recognition, spoken keyword search, and text-to-speech.

Source: Phonology-Augmented Statistical Framework for Machine Transliteration using Limited Linguistic Resources

Most implemented papers

Universal Dependency Parsing for Hindi-English Code-switching

irshadbhat/nsdp-cs NAACL 2018

We present a treebank of Hindi-English code-switching tweets under the Universal Dependencies scheme and propose a neural stacking model for parsing that efficiently leverages part-of-speech tag and syntactic tree annotations in the code-switching treebank and the preexisting Hindi and English treebanks.

Applying the Transformer to Character-level Transduction

shijie-wu/neural-transducer EACL 2021

The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.

Sub-Character Tokenization for Chinese Pretrained Language Models

thunlp/subchartokenization 1 Jun 2021

Pronunciation-based SubChar tokenizers can encode Chinese homophones into the same transliteration sequences and produce the same tokenization output, and are hence robust to homophone typos.

Bilingual dictionaries for all EU languages

pmarcis/dict-filtering LREC 2014

In this work we present three different methods for cleaning noise from automatically generated bilingual dictionaries: an LLR-based, a pivot-based, and a translation-based approach.
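As one illustration of the LLR idea (a sketch of Dunning's log-likelihood ratio over co-occurrence counts; the variable names and thresholding use are ours, not taken from the paper), a dictionary entry can be scored by how strongly its source word and candidate translation co-occur in a parallel corpus:

```python
import math

def llr(k11: int, k12: int, k21: int, k22: int) -> float:
    """Dunning's log-likelihood ratio for a 2x2 contingency table:
    k11 = sentence pairs containing both source word and candidate,
    k12 = source without candidate, k21 = candidate without source,
    k22 = pairs containing neither."""
    def h(*ks):
        total = sum(ks)
        return sum(k * math.log(k / total) for k in ks if k > 0)
    return 2 * (h(k11, k12, k21, k22)
                - h(k11 + k12, k21 + k22)
                - h(k11 + k21, k12 + k22))

# A strongly associated pair scores much higher than a chance pairing;
# entries below a chosen threshold would be filtered out as noise.
strong = llr(100, 10, 10, 10000)
weak = llr(1, 100, 100, 10000)
print(strong > weak)  # -> True
```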

Sequence-to-sequence neural network models for transliteration

googlei18n/transliteration 29 Oct 2016

Transliteration is a key component of machine translation systems and software internationalization.

How Grammatical is Character-level Neural Machine Translation? Assessing MT Quality with Contrastive Translation Pairs

rsennrich/lingeval97 EACL 2017

Analysing translation quality with respect to specific linguistic phenomena has historically been difficult and time-consuming.