1 code implementation • 14 Aug 2023 • Matt Post, Thamme Gowda, Roman Grundkiewicz, Huda Khayrallah, Rohit Jain, Marcin Junczys-Dowmunt
Many machine translation toolkits make use of a data preparation step wherein raw data is transformed into a tensor format that can be used directly by the trainer.
no code implementations • 11 Oct 2022 • Thamme Gowda, Mozhdeh Gheini, Jonathan May
Code-switching is a common phenomenon among multilingual speakers, where alternation between two or more languages occurs within the context of a single conversation.
1 code implementation • NAACL 2021 • Thamme Gowda, Weiqiu You, Constantine Lignos, Jonathan May
While traditional corpus-level evaluation metrics for machine translation (MT) correlate well with fluency, they struggle to reflect adequacy.
2 code implementations • ACL 2021 • Thamme Gowda, Zhao Zhang, Chris A Mattmann, Jonathan May
While there are more than 7000 languages in the world, most translation research efforts have targeted a few high-resource languages.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Thamme Gowda, Jonathan May
We cast neural machine translation (NMT) as a classification task in an autoregressive setting and analyze the limitations of both classification and autoregression components.
no code implementations • WS 2019 • Xiaoman Pan, Thamme Gowda, Heng Ji, Jonathan May, Scott Miller
Because this multilingual common space directly relates the semantics of contextual words in the source language to that of entities in the target language, we leverage it for unsupervised cross-lingual entity linking.
1 code implementation • 24 Oct 2019 • Ninareh Mehrabi, Thamme Gowda, Fred Morstatter, Nanyun Peng, Aram Galstyan
We study the bias in several state-of-the-art named entity recognition (NER) models, specifically a difference in the ability to recognize male and female names as PERSON entity types.
no code implementations • ACL 2019 • Elizabeth Boschee, Joel Barry, Jayadev Billa, Marjorie Freedman, Thamme Gowda, Constantine Lignos, Chester Palen-Michel, Michael Pust, Banriskhem Kayang Khonglah, Srikanth Madikeri, Jonathan May, Scott Miller
In this paper we present an end-to-end cross-lingual information retrieval (CLIR) and summarization system for low-resource languages that 1) enables English speakers to search foreign language repositories of text and audio using English queries, 2) summarizes the retrieved documents in English with respect to a particular information need, and 3) provides complete transcriptions and translations as needed.
no code implementations • 3 Dec 2017 • Kyle Hundman, Thamme Gowda, Mayank Kejriwal, Benedikt Boecking
Web-based human trafficking activity has increased in recent years, but it remains sparsely dispersed among escort advertisements and difficult to identify due to its often-latent nature.