Search Results for author: Meriem Boubdir

Found 2 papers, 0 papers with code

Elo Uncovered: Robustness and Best Practices in Language Model Evaluation

no code implementations · 29 Nov 2023 · Meriem Boubdir, Edward Kim, Beyza Ermis, Sara Hooker, Marzieh Fadaee

In Natural Language Processing (NLP), the Elo rating system, originally designed for ranking players in dynamic games such as chess, is increasingly being used to evaluate Large Language Models (LLMs) through "A vs B" paired comparisons.

Language Modelling
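The Elo system mentioned in the abstract updates two ratings after each "A vs B" comparison: the winner gains points in proportion to how unexpected its win was. A minimal sketch of the standard update rule (function name and starting ratings are illustrative, not taken from the paper):

```python
def elo_update(r_a, r_b, score_a, k=32):
    """One Elo update after an 'A vs B' paired comparison.

    r_a, r_b: current ratings of models A and B.
    score_a: 1.0 if A is preferred, 0.0 if B is preferred, 0.5 for a tie.
    k: step size controlling how much a single comparison moves ratings.
    """
    # Expected score of A under the standard logistic Elo curve.
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    # Each rating moves by k times the surprise (actual minus expected).
    new_a = r_a + k * (score_a - expected_a)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b

# Example: two models start level at 1000 and A wins one comparison.
a, b = elo_update(1000.0, 1000.0, 1.0)  # → (1016.0, 984.0)
```

Note that the update is zero-sum (the total rating is conserved), and that the result depends on the order in which comparisons are processed, which is one source of the instability the paper examines.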

Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation

no code implementations · 22 Oct 2023 · Meriem Boubdir, Edward Kim, Beyza Ermis, Marzieh Fadaee, Sara Hooker

Human evaluation is increasingly critical for assessing large language models, capturing linguistic nuances, and reflecting user preferences more accurately than traditional automated metrics.

Language Modelling · Large Language Model
