no code implementations • WMT (EMNLP) 2020 • Philipp Koehn, Vishrav Chaudhary, Ahmed El-Kishky, Naman Goyal, Peng-Jen Chen, Francisco Guzmán
Following two preceding WMT Shared Task on Parallel Corpus Filtering (Koehn et al., 2018, 2019), we posed again the challenge of assigning sentence-level quality scores for very noisy corpora of sentence pairs crawled from the web, with the goal of sub-selecting the highest-quality data to be used to train ma-chine translation systems.
no code implementations • COLING (WNUT) 2022 • Jinning Li, Shubhanshu Mishra, Ahmed El-Kishky, Sneha Mehta, Vivek Kulkarni
We refer to these annotations as Non-Textual Units (NTUs).
no code implementations • 28 Oct 2022 • Frank Portman, Stephen Ragain, Ahmed El-Kishky
Providing personalized recommendations in an environment where items exhibit ephemerality and temporal relevancy (e. g. in social media) presents a few unique challenges: (1) inductively understanding ephemeral appeal for items in a setting where new items are created frequently, (2) adapting to trends within engagement patterns where items may undergo temporal shifts in relevance, (3) accurately modeling user preferences over this item space where users may express multiple interests.
1 code implementation • 15 Sep 2022 • Xinyang Zhang, Yury Malkov, Omar Florez, Serim Park, Brian McWilliams, Jiawei Han, Ahmed El-Kishky
Most existing PLMs are not tailored to the noisy user-generated text on social media, and the pre-training does not factor in the valuable social engagement logs available in a social network.
no code implementations • 13 Sep 2022 • FatemehSadat Mireshghallah, Nikolai Vogler, Junxian He, Omar Florez, Ahmed El-Kishky, Taylor Berg-Kirkpatrick
User-generated social media data is constantly changing as new trends influence online discussion and personal information is deleted due to privacy concerns.
no code implementations • 12 May 2022 • Ahmed El-Kishky, Thomas Markovich, Kenny Leung, Frank Portman, Aria Haghighi, Ying Xiao
To this end, we introduce kNN-Embed, a general approach to improving diversity in dense ANN-based retrieval.
1 code implementation • 27 Jan 2022 • John Pougué-Biyong, Akshay Gupta, Aria Haghighi, Ahmed El-Kishky
We propose the Stance Embeddings Model(SEM), which jointly learns embeddings for each user and topic in signed social graphs with distinct edge types for each topic.
no code implementations • EMNLP 2021 • Shuo Sun, Ahmed El-Kishky, Vishrav Chaudhary, James Cross, Francisco Guzmán, Lucia Specia
Sentence-level Quality estimation (QE) of machine translation is traditionally formulated as a regression task, and the performance of QE models is typically measured by Pearson correlation with human labels.
1 code implementation • Findings (ACL) 2021 • Jun Wang, Chang Xu, Francisco Guzman, Ahmed El-Kishky, Benjamin I. P. Rubinstein, Trevor Cohn
Mistranslated numbers have the potential to cause serious effects, such as financial loss or medical misinformation.
1 code implementation • 12 Jul 2021 • Jun Wang, Chang Xu, Francisco Guzman, Ahmed El-Kishky, Yuqing Tang, Benjamin I. P. Rubinstein, Trevor Cohn
Neural machine translation systems are known to be vulnerable to adversarial test inputs, however, as we show in this paper, these systems are also vulnerable to training attacks.
1 code implementation • ACL 2021 • Wei-Jen Ko, Ahmed El-Kishky, Adithya Renduchintala, Vishrav Chaudhary, Naman Goyal, Francisco Guzmán, Pascale Fung, Philipp Koehn, Mona Diab
The scarcity of parallel data is a major obstacle for training high-quality machine translation systems for low-resource languages.
no code implementations • EMNLP 2021 • Ahmed El-Kishky, Adithya Renduchintala, James Cross, Francisco Guzmán, Philipp Koehn
Cross-lingual named-entity lexica are an important resource to multilingual NLP tasks such as machine translation and cross-lingual wikification.
no code implementations • EACL 2021 • Yi-Lin Tuan, Ahmed El-Kishky, Adithya Renduchintala, Vishrav Chaudhary, Francisco Guzmán, Lucia Specia
Quality estimation aims to measure the quality of translated content without access to a reference translation.
no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Shuo Sun, Marina Fomicheva, Fr{\'e}d{\'e}ric Blain, Vishrav Chaudhary, Ahmed El-Kishky, Adithya Renduchintala, Francisco Guzm{\'a}n, Lucia Specia
Predicting the quality of machine translation has traditionally been addressed with language-specific models, under the assumption that the quality label distribution or linguistic features exhibit traits that are not shared across languages.
7 code implementations • 21 Oct 2020 • Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin
Existing work in translation demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages.
1 code implementation • Asian Chapter of the Association for Computational Linguistics 2020 • Ahmed El-Kishky, Francisco Guzmán
Document alignment aims to identify pairs of documents in two distinct languages that are of comparable content or translations of each other.
no code implementations • 14 Nov 2019 • Hyungsul Kim, Ahmed El-Kishky, Xiang Ren, Jiawei Han
This proximity network captures the corpus-level co-occurence statistics for candidate event descriptors, event attributes, as well as their connections.
no code implementations • EMNLP 2020 • Ahmed El-Kishky, Vishrav Chaudhary, Francisco Guzman, Philipp Koehn
We mine sixty-eight snapshots of the Common Crawl corpus and identify web document pairs that are translations of each other.
no code implementations • 3 Nov 2019 • David Golub, Ahmed El-Kishky, Roberto Martín-Martín
Current semantic segmentation models cannot easily generalize to new object classes unseen during train time: they require additional annotated images and retraining.
no code implementations • WS 2019 • Peng-Jen Chen, Jiajun Shen, Matt Le, Vishrav Chaudhary, Ahmed El-Kishky, Guillaume Wenzek, Myle Ott, Marc'Aurelio Ranzato
This paper describes Facebook AI's submission to the WAT 2019 Myanmar-English translation task.
no code implementations • 18 Aug 2019 • Ahmed El-Kishky, Frank Xu, Aston Zhang, Jiawei Han
However, in many languages and specialized corpora, words are composed by concatenating semantically meaningful subword structures.
no code implementations • WS 2019 • Ahmed El-Kishky, Xingyu Fu, Aseel Addawood, Nahil Sobh, Clare Voss, Jiawei Han
In this paper, we tackle the problem of {``}root extraction{''} from words in the Semitic language family.
no code implementations • WS 2018 • Ahmed El-Kishky, Frank Xu, Aston Zhang, Stephen Macke, Jiawei Han
Recent literature has shown a wide variety of benefits to mapping traditional one-hot representations of words and phrases to lower-dimensional real-valued vectors known as word embeddings.
1 code implementation • 26 Apr 2018 • Qi Zhu, Xiang Ren, Jingbo Shang, Yu Zhang, Ahmed El-Kishky, Jiawei Han
However, current Open IE systems focus on modeling local context information in a sentence to extract relation tuples, while ignoring the fact that global statistics in a large corpus can be collectively leveraged to identify high-quality sentence-level extractions.
no code implementations • 24 Jun 2014 • Ahmed El-Kishky, Yanglei Song, Chi Wang, Clare Voss, Jiawei Han
Our solution combines a novel phrase mining framework to segment a document into single and multi-word phrases, and a new topic model that operates on the induced document partition.