no code implementations • EMNLP 2021 • David M. Howcroft, Verena Rieser
Previous work has shown that human evaluations in NLP are notoriously under-powered.
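The claim about statistical power can be made concrete with a standard power analysis. The sketch below is illustrative only (the effect size, alpha, and power targets are assumptions, not values from the paper) and uses statsmodels.

```python
# Minimal power-analysis sketch: how many ratings per system are needed to
# detect a given effect in a two-sided, two-sample t-test comparison.
# effect_size, alpha, and power are illustrative assumptions.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.3, alpha=0.05, power=0.8)
print(f"Ratings needed per system: {n_per_group:.0f}")  # roughly 176 per group
```

Human evaluations that collect far fewer judgments than such an analysis suggests are what "under-powered" refers to here.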
no code implementations • INLG (ACL) 2020 • David M. Howcroft, Anya Belz, Miruna-Adriana Clinciu, Dimitra Gkatzia, Sadid A. Hasan, Saad Mahamood, Simon Mille, Emiel van Miltenburg, Sashank Santhanam, Verena Rieser
Human assessment remains the most trusted form of evaluation in NLG, but highly diverse approaches and a proliferation of different quality criteria used by researchers make it difficult to compare results and draw conclusions across papers, with adverse implications for meta-evaluation and reproducibility.
no code implementations • INLG (ACL) 2020 • Anya Belz, Simon Mille, David M. Howcroft
Current standards for designing and reporting human evaluations in NLP mean it is generally unclear which evaluations are comparable and can be expected to yield similar results when applied to the same system outputs.
1 code implementation • 17 Aug 2024 • Patrícia Schmidtová, Saad Mahamood, Simone Balloccu, Ondřej Dušek, Albert Gatt, Dimitra Gkatzia, David M. Howcroft, Ondřej Plátek, Adarsa Sivaprasad
Automatic metrics are extensively used to evaluate natural language processing systems.
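As a concrete instance of such a metric, the sketch below computes corpus-level BLEU with the sacrebleu library on toy strings; it illustrates the practice the survey examines and is not an example taken from the paper.

```python
# Corpus-level BLEU via sacrebleu; hypotheses and references are toy data.
import sacrebleu

hypotheses = ["The cat sat on the mat."]
references = [["A cat was sitting on the mat."]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}")
```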
1 code implementation • ACL 2021 • Karin Sevegnani, David M. Howcroft, Ioannis Konstas, Verena Rieser
Mixed initiative in open-domain dialogue requires a system to proactively introduce new topics.
1 code implementation • WS 2019 • Ondřej Dušek, David M. Howcroft, Verena Rieser
Neural natural language generation (NNLG) systems are known for their pathological outputs, i.e., generating text that is unrelated to the input specification; a naive slot-checking sketch of this failure mode is given below.
Ranked #3 on Data-to-Text Generation on Cleaned E2E NLG Challenge
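One naive way to detect such pathological outputs is to check whether each input slot value surfaces in the generated text. The sketch below invents a simple dict-based meaning representation and uses substring matching; both are simplifying assumptions for illustration, not the authors' method.

```python
# Naive slot-omission check: report MR slot values absent from the output.
# The dict-based MR format and substring matching are simplifying assumptions.
def slot_errors(mr: dict, text: str) -> list:
    text_lower = text.lower()
    return [f"{slot}={value}" for slot, value in mr.items()
            if value.lower() not in text_lower]

mr = {"name": "The Punter", "food": "Italian", "area": "riverside"}
generated = "The Punter serves Italian food."
print(slot_errors(mr, generated))  # ['area=riverside'] -> one omitted slot
```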
no code implementations • WS 2018 • David M. Howcroft, Dietrich Klakow, Vera Demberg
Developing conventional natural language generation systems requires extensive attention from human experts to craft complex sets of sentence planning rules.
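To make the cost of hand-crafted sentence planning tangible, the toy rule below hard-codes template choices for a restaurant domain; every new attribute combination would need another expert-written rule. The frame and rule format are invented for illustration, not drawn from the paper.

```python
# Toy hand-written sentence-planning rule; the frame format is hypothetical.
def plan_sentence(frame: dict) -> str:
    """Pick a surface template based on which attributes are present."""
    if "price" in frame and "rating" in frame:
        return (f"{frame['name']} is a {frame['rating']} restaurant "
                f"with {frame['price']} prices.")
    if "price" in frame:
        return f"{frame['name']} has {frame['price']} prices."
    return f"{frame['name']} is a restaurant."

print(plan_sentence({"name": "Loch Fyne", "price": "moderate", "rating": "well-rated"}))
```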
no code implementations • EACL 2017 • David M. Howcroft, Vera Demberg
While previous research on readability has typically focused on document-level measures, recent work in areas such as natural language generation has pointed out the need for sentence-level readability measures.
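A simple way to see what a sentence-level measure looks like is to apply the standard Flesch Reading Ease formula, FRE = 206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words), to a single sentence. The sketch below does this with a crude vowel-run syllable counter; it is an assumed baseline for illustration, not the measure proposed in the paper.

```python
# Sentence-level Flesch Reading Ease with a naive syllable counter.
import re

def count_syllables(word: str) -> int:
    # Approximate syllables as runs of vowels (crude but self-contained).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def sentence_flesch(sentence: str) -> float:
    words = re.findall(r"[A-Za-z']+", sentence)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * len(words) - 84.6 * (syllables / len(words))

print(f"{sentence_flesch('The cat sat on the mat.'):.1f}")  # simple sentence scores high
```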
no code implementations • COLING 2016 • Maximilian Schwenger, Álvaro Torralba, Joerg Hoffmann, David M. Howcroft, Vera Demberg
The search space in grammar-based natural language generation tasks can get very large, which is particularly problematic when generating long utterances or paragraphs.