Search Results for author: Emiel van Miltenburg

Found 31 papers, 12 papers with code

Evaluation rules! On the use of grammars and rule-based systems for NLG evaluation

no code implementations • ACL (EvalNLGEval, INLG) 2020 • Emiel van Miltenburg, Chris van der Lee, Thiago Castro-Ferreira, Emiel Krahmer

NLG researchers often use uncontrolled corpora to train and evaluate their systems, using textual similarity metrics, such as BLEU.

NLG evaluation, Position

Gradations of Error Severity in Automatic Image Descriptions

no code implementations • INLG (ACL) 2020 • Emiel van Miltenburg, Wei-Ting Lu, Emiel Krahmer, Albert Gatt, Guanyi Chen, Lin Li, Kees Van Deemter

Because our manipulated descriptions form minimal pairs with the reference descriptions, we are able to assess the impact of different kinds of errors on the perceived quality of the descriptions.

Twenty Years of Confusion in Human Evaluation: NLG Needs Evaluation Sheets and Standardised Definitions

no code implementations • INLG (ACL) 2020 • David M. Howcroft, Anya Belz, Miruna-Adriana Clinciu, Dimitra Gkatzia, Sadid A. Hasan, Saad Mahamood, Simon Mille, Emiel van Miltenburg, Sashank Santhanam, Verena Rieser

Human assessment remains the most trusted form of evaluation in NLG, but highly diverse approaches and a proliferation of different quality criteria used by researchers make it difficult to compare results and draw conclusions across papers, with adverse implications for meta-evaluation and reproducibility.

Experimental Design

How Do Image Description Systems Describe People? A Targeted Assessment of System Competence in the PEOPLE-domain

1 code implementation • LANTERN (COLING) 2020 • Emiel van Miltenburg

While useful, these evaluations do not tell us anything about the kinds of image descriptions that systems are able to produce.

Open Dutch WordNet

1 code implementation • GWC 2016 • Marten Postma, Emiel van Miltenburg, Roxane Segers, Anneleen Schoen, Piek Vossen

We describe Open Dutch WordNet, which has been derived from the Cornetto database, the Princeton WordNet and open source resources.

WordNet-based similarity metrics for adjectives

no code implementations • GWC 2016 • Emiel van Miltenburg

Le and Fokkens (2015) recently showed that taxonomy-based approaches are more reliable than corpus-based approaches in estimating human similarity ratings.

Evaluating Task-oriented Dialogue Systems: A Systematic Review of Measures, Constructs and their Operationalisations

no code implementations • 21 Dec 2023 • Anouck Braggaar, Christine Liebrecht, Emiel van Miltenburg, Emiel Krahmer

This review gives an extensive overview of evaluation methods for task-oriented dialogue systems, paying special attention to practical applications of dialogue systems, for example for customer service.

Task-Oriented Dialogue Systems

Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP

no code implementations • 2 May 2023 • Anya Belz, Craig Thomson, Ehud Reiter, Gavin Abercrombie, Jose M. Alonso-Moral, Mohammad Arvan, Anouck Braggaar, Mark Cieliebak, Elizabeth Clark, Kees Van Deemter, Tanvi Dinkar, Ondřej Dušek, Steffen Eger, Qixiang Fang, Mingqi Gao, Albert Gatt, Dimitra Gkatzia, Javier González-Corbelle, Dirk Hovy, Manuela Hürlimann, Takumi Ito, John D. Kelleher, Filip Klubicka, Emiel Krahmer, Huiyuan Lai, Chris van der Lee, Yiru Li, Saad Mahamood, Margot Mieskes, Emiel van Miltenburg, Pablo Mosteiro, Malvina Nissim, Natalie Parde, Ondřej Plátek, Verena Rieser, Jie Ruan, Joel Tetreault, Antonio Toral, Xiaojun Wan, Leo Wanner, Lewis Watson, Diyi Yang

We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible.

Evaluating NLG systems: A brief introduction

no code implementations • 29 Mar 2023 • Emiel van Miltenburg

This year the International Conference on Natural Language Generation (INLG) will feature an award for the paper with the best evaluation.

Text Generation

Implicit causality in GPT-2: a case study

no code implementations • 8 Dec 2022 • Hien Huynh, Tomas O. Lentz, Emiel van Miltenburg

This case study investigates the extent to which a language model (GPT-2) is able to capture native speakers' intuitions about implicit causality in a sentence completion task.

Language Modelling, Object +2

Underreporting of errors in NLG output, and what to do about it

no code implementations • INLG (ACL) 2021 • Emiel van Miltenburg, Miruna-Adriana Clinciu, Ondřej Dušek, Dimitra Gkatzia, Stephanie Inglis, Leo Leppänen, Saad Mahamood, Emma Manning, Stephanie Schoch, Craig Thomson, Luou Wen

We observe a severe under-reporting of the different kinds of errors that Natural Language Generation systems make.

Position, Text Generation

Automatic Construction of Evaluation Suites for Natural Language Generation Datasets

no code implementations • 16 Jun 2021 • Simon Mille, Kaustubh D. Dhole, Saad Mahamood, Laura Perez-Beltrachini, Varun Gangal, Mihir Kale, Emiel van Miltenburg, Sebastian Gehrmann

By applying this framework to the GEM generation benchmark, we propose an evaluation suite made of 80 challenge sets, demonstrate the kinds of analyses that it enables and shed light onto the limits of current generation models.

Text Generation

Preregistering NLP Research

no code implementations • NAACL 2021 • Emiel van Miltenburg, Chris van der Lee, Emiel Krahmer

Preregistration refers to the practice of specifying what you are going to do, and what you expect to find in your study, before carrying out the study.

The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

no code implementations • ACL (GEM) 2021 • Sebastian Gehrmann, Tosin Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Aremu Anuoluwapo, Antoine Bosselut, Khyathi Raghavi Chandu, Miruna Clinciu, Dipanjan Das, Kaustubh D. Dhole, Wanyu Du, Esin Durmus, Ondřej Dušek, Chris Emezue, Varun Gangal, Cristina Garbacea, Tatsunori Hashimoto, Yufang Hou, Yacine Jernite, Harsh Jhamtani, Yangfeng Ji, Shailza Jolly, Mihir Kale, Dhruv Kumar, Faisal Ladhak, Aman Madaan, Mounica Maddela, Khyati Mahajan, Saad Mahamood, Bodhisattwa Prasad Majumder, Pedro Henrique Martins, Angelina McMillan-Major, Simon Mille, Emiel van Miltenburg, Moin Nadeem, Shashi Narayan, Vitaly Nikolaev, Rubungo Andre Niyongabo, Salomey Osei, Ankur Parikh, Laura Perez-Beltrachini, Niranjan Ramesh Rao, Vikas Raunak, Juan Diego Rodriguez, Sashank Santhanam, João Sedoc, Thibault Sellam, Samira Shaikh, Anastasia Shimorina, Marco Antonio Sobrevilla Cabezudo, Hendrik Strobelt, Nishant Subramani, Wei Xu, Diyi Yang, Akhila Yerukola, Jiawei Zhou

We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics.

Ranked #1 on Extreme Summarization on GEM-XSum

Abstractive Text Summarization, Cross-Lingual Abstractive Summarization +5

On the use of human reference data for evaluating automatic image descriptions

no code implementations • 15 Jun 2020 • Emiel van Miltenburg

Automatic image description systems are commonly trained and evaluated using crowdsourced, human-generated image descriptions.

On task effects in NLG corpus elicitation: a replication study using mixed effects modeling

no code implementations • WS 2019 • Emiel van Miltenburg, Merel van de Kerkhof, Ruud Koolen, Martijn Goudbeek, Emiel Krahmer

Task effects in NLG corpus elicitation recently started to receive more attention, but are usually not modeled statistically.

Best practices for the human evaluation of automatically generated text

no code implementations • WS 2019 • Chris van der Lee, Albert Gatt, Emiel van Miltenburg, Sander Wubben, Emiel Krahmer

Currently, there is little agreement as to how Natural Language Generation (NLG) systems should be evaluated.

Text Generation

Neural data-to-text generation: A comparison between pipeline and end-to-end architectures

1 code implementation • IJCNLP 2019 • Thiago Castro Ferreira, Chris van der Lee, Emiel van Miltenburg, Emiel Krahmer

In contrast, recent neural models for data-to-text generation have been proposed as end-to-end approaches, where the non-linguistic input is rendered in natural language with much less explicit intermediate representations in-between.

Ranked #8 on Data-to-Text Generation on WebNLG Full

Data-to-Text Generation

Talking about other people: an endless range of possibilities

1 code implementation • WS 2018 • Emiel van Miltenburg, Desmond Elliott, Piek Vossen

This taxonomy serves as a reference point to think about how other people should be described, and can be used to classify and compute statistics about labels applied to people.

Text Generation

Measuring the Diversity of Automatic Image Descriptions

1 code implementation • COLING 2018 • Emiel van Miltenburg, Desmond Elliott, Piek Vossen

Automatic image description systems typically produce generic sentences that only make use of a small subset of the vocabulary available to them.

Text Generation

Varying image description tasks: spoken versus written descriptions

1 code implementation • COLING 2018 • Emiel van Miltenburg, Ruud Koolen, Emiel Krahmer

Automatic image description systems are commonly trained and evaluated on written image descriptions.

DIDEC: The Dutch Image Description and Eye-tracking Corpus

no code implementations • COLING 2018 • Emiel van Miltenburg, Ákos Kádár, Ruud Koolen, Emiel Krahmer

We present a corpus of spoken Dutch image descriptions, paired with two sets of eye-tracking data: Free viewing, where participants look at images without any particular purpose, and Description viewing, where we track eye movements while participants produce spoken descriptions of the images they are viewing.

Specificity, Task 2

Cross-linguistic differences and similarities in image descriptions

1 code implementation • WS 2017 • Emiel van Miltenburg, Desmond Elliott, Piek Vossen

Automatic image description systems are commonly trained and evaluated on large image description datasets.

Specificity

Room for improvement in automatic image description: an error analysis

1 code implementation • 13 Apr 2017 • Emiel van Miltenburg, Desmond Elliott

In recent years we have seen rapid and significant progress in automatic image description but what are the open problems in this area?

Pragmatic descriptions of perceptual stimuli

no code implementations • EACL 2017 • Emiel van Miltenburg

This research proposal discusses pragmatic factors in image description, arguing that current automatic image description systems do not take these factors into account.

Object Recognition

Building a Dictionary of Affixal Negations

1 code implementation • WS 2016 • Chantal van Son, Emiel van Miltenburg, Roser Morante

This paper discusses the need for a dictionary of affixal negations and regular antonyms to facilitate their automatic detection in text.

Natural Language Inference, Negation +1

Pragmatic factors in image description: the case of negations

1 code implementation • WS 2016 • Emiel van Miltenburg, Roser Morante, Desmond Elliott

We provide a qualitative analysis of the descriptions containing negations (no, not, n't, nobody, etc) in the Flickr30K corpus, and a categorization of negation uses.

Negation

Stereotyping and Bias in the Flickr30K Dataset

2 code implementations • 19 May 2016 • Emiel van Miltenburg

An untested assumption behind the crowdsourced descriptions of the images in the Flickr30K dataset (Young et al., 2014) is that they "focus only on the information that can be obtained from the image alone" (Hodosh et al., 2013, p. 859).

The VU Sound Corpus: Adding More Fine-grained Annotations to the Freesound Database

no code implementations • LREC 2016 • Emiel van Miltenburg, Benjamin Timmermans, Lora Aroyo

The main goal of this study is to find out (i) whether it is feasible to collect keywords for a large collection of sounds through crowdsourcing, and (ii) how people talk about sounds, and what information they can infer from hearing a sound in isolation.

Detecting and ordering adjectival scalemates

no code implementations • 30 Apr 2015 • Emiel van Miltenburg

This paper presents a pattern-based method that can be used to infer adjectival scales, such as <lukewarm, warm, hot>, from a corpus.

Sound-based distributional models

1 code implementation • WS 2015 • Alessandro Lopopolo, Emiel van Miltenburg
