no code implementations • EMNLP (Eval4NLP) 2020 • Neslihan Iskender, Tim Polzehl, Sebastian Möller
The human assessment of summarization quality by linguistic experts is slow, expensive, and still not a standardized procedure.
1 code implementation • EACL (HumEval) 2021 • Neslihan Iskender, Tim Polzehl, Sebastian Möller
Based on our empirical analysis, we provide guidelines to ensure the reliability of expert and non-expert evaluations, and we identify the factors that can affect the reliability of human evaluation.
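A common way to quantify the reliability this paper studies is an inter-annotator agreement coefficient. Below is a minimal sketch, not the authors' analysis: it assumes invented 5-point Likert ratings and uses the third-party `krippendorff` package (`pip install krippendorff`) to compute Krippendorff's alpha, which handles multiple raters and missing judgments.

```python
# Sketch: estimating inter-annotator reliability for summary quality ratings.
# Assumes 5-point Likert scores; np.nan marks summaries a rater skipped.
import numpy as np
import krippendorff

# Rows: raters (e.g., experts and crowdworkers); columns: summaries.
# These ratings are invented for illustration only.
ratings = np.array([
    [4, 3, 5, 2, np.nan, 4],
    [4, 3, 4, 2, 3,      5],
    [5, 2, 4, 1, 3,      4],
    [3, 3, 5, 2, 2,      np.nan],
    [4, 2, 4, 2, 3,      4],
])

# 'ordinal' fits Likert-style quality scales better than 'nominal'.
alpha = krippendorff.alpha(reliability_data=ratings,
                           level_of_measurement="ordinal")
print(f"Krippendorff's alpha: {alpha:.3f}")
```

Computing alpha separately for the expert and non-expert rating matrices would let one compare the reliability of the two groups directly.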
no code implementations • EACL (HCINLP) 2021 • Neslihan Iskender, Tim Polzehl, Sebastian Möller
In recent years, crowdsourcing has gained much attention from researchers as a way to generate data for Natural Language Generation (NLG) tools or to evaluate them.
1 code implementation • NAACL 2022 • Spencer Braun, Oleg Vasilyev, Neslihan Iskender, John Bohannon
The creation of a quality summarization dataset is an expensive, time-consuming effort, requiring the production and evaluation of summaries by both trained humans and machines.
no code implementations • 13 May 2021 • Neslihan Iskender, Oleg Vasilyev, Tim Polzehl, John Bohannon, Sebastian Möller
Evaluating large summarization corpora with human raters has proven expensive from both an organizational and a financial perspective.
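One reason automatic metrics are attractive here is that they can score a whole corpus at negligible cost before any human study is commissioned. The sketch below is generic and not the authors' method: it scores invented reference/candidate pairs with ROUGE via Google's `rouge_score` package (`pip install rouge-score`).

```python
# Sketch: scoring a summarization corpus with an automatic metric as a
# cheap proxy before committing to full human evaluation.
from rouge_score import rouge_scorer

references = [
    "The committee approved the budget after a short debate.",
    "Heavy rain caused flooding across the northern districts.",
]
candidates = [
    "The budget was approved by the committee following debate.",
    "Flooding hit northern districts after heavy rainfall.",
]

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
for ref, cand in zip(references, candidates):
    scores = scorer.score(ref, cand)  # reference first, candidate second
    print({name: round(s.fmeasure, 3) for name, s in scores.items()})
```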
no code implementations • LREC 2020 • Neslihan Iskender, Tim Polzehl, Sebastian Möller
Intrinsic and extrinsic quality evaluation is an essential part of summary evaluation methodology, usually conducted in a traditional controlled laboratory environment.
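Whether collected in the lab or via crowdsourcing, per-dimension ratings are typically aggregated into a mean opinion score. A minimal sketch of that aggregation, with invented fluency ratings and a textbook t-based confidence interval rather than the authors' protocol:

```python
# Sketch: aggregating ratings of one intrinsic quality dimension
# (e.g., fluency) into a mean opinion score with a 95% confidence interval.
import numpy as np
from scipy import stats

ratings = np.array([4, 5, 3, 4, 4, 2, 5, 4, 3, 4])  # 5-point scale, invented

mos = ratings.mean()
sem = stats.sem(ratings)                         # standard error of the mean
margin = sem * stats.t.ppf(0.975, len(ratings) - 1)
print(f"MOS: {mos:.2f} +/- {margin:.2f} (95% CI)")
```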