41 papers with code • 0 benchmarks • 5 datasets
Transliteration is the process of converting a word from a source (foreign) language into a target language, and it often adopts approaches from machine translation. In machine translation, the objective is to preserve the semantic meaning of the utterance as far as possible while following the syntactic structure of the target language. In transliteration, the objective is to preserve the original pronunciation of the source word as far as possible while following the phonological structure of the target language.
For example, the city name “Manchester” has become well known to speakers of languages other than English. Such new words are often named entities that are important in cross-lingual information retrieval, information extraction, and machine translation, and they often present out-of-vocabulary challenges to spoken language technologies such as automatic speech recognition, spoken keyword search, and text-to-speech.
Source: Phonology-Augmented Statistical Framework for Machine Transliteration using Limited Linguistic Resources
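To make the description above concrete, here is a minimal rule-based sketch in Python that rewrites a Latin-script word into Cyrillic by greedy longest-match lookup. The `RULES` table and the `transliterate` function are hypothetical and written only for illustration; they are not taken from the cited paper, which describes a statistical framework, and the table covers just enough letters for the “Manchester” example.

```python
# Toy grapheme-to-grapheme table (an illustrative assumption, not a standard).
RULES = {
    "ch": "ч",
    "sh": "ш",
    "a": "а",
    "e": "е",
    "m": "м",
    "n": "н",
    "r": "р",
    "s": "с",
    "t": "т",
}


def transliterate(word, rules=RULES):
    """Rewrite `word` with greedy longest-match lookup in `rules`."""
    word = word.lower()
    max_len = max(len(key) for key in rules)
    out = []
    i = 0
    while i < len(word):
        # Try the longest possible chunk first, so "ch" wins over "c" + "h".
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                out.append(rules[chunk])
                i += size
                break
        else:
            # No rule matched: copy the character through unchanged.
            out.append(word[i])
            i += 1
    return "".join(out)


print(transliterate("Manchester"))  # манчестер
```

The output approximates the pronunciation of the source word in the target script rather than its meaning, which is the distinction from translation drawn above. Real systems typically learn such mappings from data instead of relying on a hand-written table.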
Most implemented papers
Universal Dependency Parsing for Hindi-English Code-switching
We present a treebank of Hindi-English code-switching tweets under the Universal Dependencies scheme and propose a neural stacking model for parsing that efficiently leverages part-of-speech tag and syntactic tree annotations in the code-switching treebank and the preexisting Hindi and English treebanks.
Applying the Transformer to Character-level Transduction
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
Sub-Character Tokenization for Chinese Pretrained Language Models
Pronunciation-based SubChar tokenizers can encode Chinese homophones into the same transliteration sequences and produce the same tokenization output, hence being robust to homophone typos.
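The homophone property can be illustrated with a small Python sketch. It covers only the encoding step: each character is replaced by a pronunciation string, so characters that sound alike collapse to the same sequence. The `PINYIN` table, the `encode` function, and the `#` separator are assumptions made for this example and are not the paper's implementation; a real system would draw on a full pronunciation dictionary and then learn a subword vocabulary over the encoded text.

```python
# Hand-written pronunciation table (illustrative assumption only).
# Trailing digits mark the tone: 他 and 她 are both "ta1", 在 and 再 both "zai4".
PINYIN = {
    "他": "ta1",
    "她": "ta1",
    "在": "zai4",
    "再": "zai4",
    "家": "jia1",
}


def encode(text, table=PINYIN, sep="#"):
    """Map each character to its pronunciation string, joined by `sep`.

    Characters missing from the table are passed through unchanged.
    """
    return sep.join(table.get(ch, ch) for ch in text)


correct = encode("他在家")  # intended sentence
typo = encode("她再家")     # same sounds, two wrong characters

print(correct)          # ta1#zai4#jia1
print(correct == typo)  # True
```

Because both inputs produce the same encoded sequence, any tokenizer applied afterwards sees identical input for the correct sentence and the homophone typo.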
Bilingual dictionaries for all EU languages
In this work we present three different methods for cleaning noise from automatically generated bilingual dictionaries: the LLR, pivot, and translation-based approaches.
Sequence-to-sequence neural network models for transliteration
Transliteration is a key component of machine translation systems and software internationalization.
How Grammatical is Character-level Neural Machine Translation? Assessing MT Quality with Contrastive Translation Pairs
Analysing translation quality with regard to specific linguistic phenomena has historically been difficult and time-consuming.