Neural machine translation (NMT) models typically operate with a fixed vocabulary, but translation is an open-vocabulary problem.
We analyze some of the fundamental design challenges that affect the development of a state-of-the-art multilingual named entity transliteration system, including the curation of bilingual named entity datasets and the evaluation of multiple transliteration methods.
Transliteration is a key component of machine translation systems and software internationalization.
Analysing translation quality with regard to specific linguistic phenomena has historically been difficult and time-consuming.
The ANETAC dataset is aimed mainly at researchers working on Arabic named entity transliteration, but it can also be used for named entity classification.
Generating the English transliteration of a name written in a foreign script is an important and challenging step in multilingual knowledge acquisition and information extraction.