no code implementations • 30 Jan 2025 • Muhammed Yusuf Kocyigit, Eleftheria Briakou, Daniel Deutsch, Jiaming Luo, Colin Cherry, Markus Freitag
Data contamination -- the accidental consumption of evaluation examples within the pre-training data -- can undermine the validity of evaluation benchmarks.
no code implementations • 6 Nov 2024 • Aaditya K. Singh, Muhammed Yusuf Kocyigit, Andrew Poulton, David Esiobu, Maria Lomeli, Gergely Szilvasy, Dieuwke Hupkes
We propose a novel analysis method called ConTAM and show, through a large-scale survey of existing and novel n-gram-based contamination metrics across 13 benchmarks and 7 models from 2 different families, that ConTAM can be used to better understand evaluation data contamination and its effects.
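The ConTAM method itself is not reproduced here; as a rough illustration, the n-gram-based contamination metrics it surveys typically build on an overlap score between an evaluation example and the pre-training data. A minimal sketch of such a score (illustrative names and thresholds, not the paper's definitions):

```python
# Illustrative n-gram overlap contamination score (not the ConTAM method itself).
# An evaluation example is flagged as contaminated when enough of its n-grams
# also appear in the pre-training data.
from typing import Iterable, Set, Tuple


def ngrams(tokens: Iterable[str], n: int = 8) -> Set[Tuple[str, ...]]:
    """Return the set of n-grams in a token sequence."""
    toks = list(tokens)
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def contamination_score(example: str,
                        pretraining_index: Set[Tuple[str, ...]],
                        n: int = 8) -> float:
    """Fraction of the example's n-grams that occur in the pre-training corpus."""
    example_ngrams = ngrams(example.lower().split(), n)
    if not example_ngrams:
        return 0.0
    hits = sum(1 for g in example_ngrams if g in pretraining_index)
    return hits / len(example_ngrams)


# Usage sketch: pretraining_index would be built from all n-grams seen during
# pre-training; examples whose score exceeds a chosen threshold (e.g. 0.5)
# would be treated as contaminated.
```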
1 code implementation • 20 Oct 2022 • Zilu Tang, Muhammed Yusuf Kocyigit, Derry Wijaya
Data augmentation techniques have proven useful in many NLP applications.
1 code implementation • Findings (NAACL) 2022 • Afra Feyza Akyürek, Sejin Paik, Muhammed Yusuf Kocyigit, Seda Akbiyik, Şerife Leman Runyun, Derry Wijaya
Large language models trained on a mixture of NLP tasks converted into a text-to-text format using prompts can generalize to novel forms of language and handle novel tasks.
1 code implementation • NAACL (GeBNLP) 2022 • Afra Feyza Akyürek, Muhammed Yusuf Kocyigit, Sejin Paik, Derry Wijaya
Researchers have devised numerous ways to quantify social biases vested in pretrained language models.
no code implementations • Findings (ACL) 2022 • Muhammed Yusuf Kocyigit, Jiho Lee, Derry Wijaya
We show that state-of-the-art Quality Estimation (QE) models, when tested in a Parallel Corpus Mining (PCM) setting, perform unexpectedly poorly due to a lack of robustness to out-of-domain examples.
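For orientation, the PCM setting amounts to scoring candidate sentence pairs with a QE model and keeping only the best-scoring ones. A minimal sketch of that setup, where `qe_score` is a hypothetical stand-in for any sentence-level QE model (not the paper's specific models):

```python
# Minimal sketch of using a quality-estimation model for parallel corpus mining:
# for each source sentence, score every candidate target and keep the best pair
# if it clears a threshold. `qe_score` is a hypothetical stand-in for any
# sentence-level QE model.
from typing import Callable, List, Tuple


def mine_pairs(sources: List[str],
               candidates: List[str],
               qe_score: Callable[[str, str], float],
               threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    mined = []
    for src in sources:
        scored = [(tgt, qe_score(src, tgt)) for tgt in candidates]
        best_tgt, best_score = max(scored, key=lambda pair: pair[1])
        if best_score >= threshold:
            mined.append((src, best_tgt, best_score))
    return mined
```

In practice the candidate pool contains many out-of-domain, non-parallel sentences, which is exactly where the robustness gap described above shows up.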
1 code implementation • ACL (EvalNLGEval, INLG) 2020 • Hassan Kane, Muhammed Yusuf Kocyigit, Ali Abdalla, Pelkins Ajanoh, Mohamed Coulibali
We present NUBIA, a methodology to build automatic evaluation metrics for text generation using only machine learning models as core components.
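The general pattern behind such learned metrics is to extract features from pretrained neural models for a (reference, candidate) pair and aggregate them with a regressor trained on human quality judgments. A minimal sketch of that pattern, with placeholder feature extractors rather than NUBIA's exact components, assuming scikit-learn is available:

```python
# Sketch of a learned text-generation metric: neural feature scores for a
# (reference, candidate) pair are combined by an aggregator fit on human
# judgments. Feature extractors here are illustrative placeholders.
from typing import Callable, Sequence, Tuple
import numpy as np
from sklearn.neural_network import MLPRegressor


def build_metric(feature_fns: Sequence[Callable[[str, str], float]],
                 train_pairs: Sequence[Tuple[str, str]],
                 human_scores: Sequence[float]) -> Callable[[str, str], float]:
    """Fit an aggregator on human-rated (reference, candidate) pairs."""
    X = np.array([[fn(ref, cand) for fn in feature_fns]
                  for ref, cand in train_pairs])
    aggregator = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000)
    aggregator.fit(X, human_scores)

    def score(reference: str, candidate: str) -> float:
        feats = np.array([[fn(reference, candidate) for fn in feature_fns]])
        return float(aggregator.predict(feats)[0])

    return score


# Usage sketch: feature_fns would typically be semantic-similarity, entailment,
# and fluency scores produced by pretrained models.
```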