Search Results for author: Emiel van Miltenburg

Found 31 papers, 12 papers with code

Gradations of Error Severity in Automatic Image Descriptions

no code implementations INLG (ACL) 2020 Emiel van Miltenburg, Wei-Ting Lu, Emiel Krahmer, Albert Gatt, Guanyi Chen, Lin Li, Kees van Deemter

Because our manipulated descriptions form minimal pairs with the reference descriptions, we are able to assess the impact of different kinds of errors on the perceived quality of the descriptions.

Twenty Years of Confusion in Human Evaluation: NLG Needs Evaluation Sheets and Standardised Definitions

no code implementations INLG (ACL) 2020 David M. Howcroft, Anya Belz, Miruna-Adriana Clinciu, Dimitra Gkatzia, Sadid A. Hasan, Saad Mahamood, Simon Mille, Emiel van Miltenburg, Sashank Santhanam, Verena Rieser

Human assessment remains the most trusted form of evaluation in NLG, but highly diverse approaches and a proliferation of different quality criteria used by researchers make it difficult to compare results and draw conclusions across papers, with adverse implications for meta-evaluation and reproducibility.

Experimental Design

How Do Image Description Systems Describe People? A Targeted Assessment of System Competence in the PEOPLE-domain

1 code implementation LANTERN (COLING) 2020 Emiel van Miltenburg

While useful, these evaluations do not tell us anything about the kinds of image descriptions that systems are able to produce.

Open Dutch WordNet

1 code implementation GWC 2016 Marten Postma, Emiel van Miltenburg, Roxane Segers, Anneleen Schoen, Piek Vossen

We describe Open Dutch WordNet, which has been derived from the Cornetto database, the Princeton WordNet and open source resources.

WordNet-based similarity metrics for adjectives

no code implementations GWC 2016 Emiel van Miltenburg

Le and Fokkens (2015) recently showed that taxonomy-based approaches are more reliable than corpus-based approaches in estimating human similarity ratings.

Evaluating Task-oriented Dialogue Systems: A Systematic Review of Measures, Constructs and their Operationalisations

no code implementations 21 Dec 2023 Anouck Braggaar, Christine Liebrecht, Emiel van Miltenburg, Emiel Krahmer

This review gives an extensive overview of evaluation methods for task-oriented dialogue systems, paying special attention to practical applications of dialogue systems, for example for customer service.

Task-Oriented Dialogue Systems

Evaluating NLG systems: A brief introduction

no code implementations 29 Mar 2023 Emiel van Miltenburg

This year the International Conference on Natural Language Generation (INLG) will feature an award for the paper with the best evaluation.

Text Generation

Implicit causality in GPT-2: a case study

no code implementations 8 Dec 2022 Hien Huynh, Tomas O. Lentz, Emiel van Miltenburg

This case study investigates the extent to which a language model (GPT-2) is able to capture native speakers' intuitions about implicit causality in a sentence completion task.

Language Modelling Object +2

Automatic Construction of Evaluation Suites for Natural Language Generation Datasets

no code implementations 16 Jun 2021 Simon Mille, Kaustubh D. Dhole, Saad Mahamood, Laura Perez-Beltrachini, Varun Gangal, Mihir Kale, Emiel van Miltenburg, Sebastian Gehrmann

By applying this framework to the GEM generation benchmark, we propose an evaluation suite made of 80 challenge sets, demonstrate the kinds of analyses that it enables and shed light onto the limits of current generation models.

Text Generation

Preregistering NLP Research

no code implementations NAACL 2021 Emiel van Miltenburg, Chris van der Lee, Emiel Krahmer

Preregistration refers to the practice of specifying what you are going to do, and what you expect to find in your study, before carrying out the study.

On the use of human reference data for evaluating automatic image descriptions

no code implementations 15 Jun 2020 Emiel van Miltenburg

Automatic image description systems are commonly trained and evaluated using crowdsourced, human-generated image descriptions.

Neural data-to-text generation: A comparison between pipeline and end-to-end architectures

1 code implementation IJCNLP 2019 Thiago Castro Ferreira, Chris van der Lee, Emiel van Miltenburg, Emiel Krahmer

In contrast, recent neural models for data-to-text generation have been proposed as end-to-end approaches, where the non-linguistic input is rendered in natural language with much less explicit intermediate representations in-between.

Data-to-Text Generation

Talking about other people: an endless range of possibilities

1 code implementation WS 2018 Emiel van Miltenburg, Desmond Elliott, Piek Vossen

This taxonomy serves as a reference point to think about how other people should be described, and can be used to classify and compute statistics about labels applied to people.

Text Generation

Measuring the Diversity of Automatic Image Descriptions

1 code implementation COLING 2018 Emiel van Miltenburg, Desmond Elliott, Piek Vossen

Automatic image description systems typically produce generic sentences that only make use of a small subset of the vocabulary available to them.

Text Generation

Varying image description tasks: spoken versus written descriptions

1 code implementation COLING 2018 Emiel van Miltenburg, Ruud Koolen, Emiel Krahmer

Automatic image description systems are commonly trained and evaluated on written image descriptions.

DIDEC: The Dutch Image Description and Eye-tracking Corpus

no code implementations COLING 2018 Emiel van Miltenburg, Ákos Kádár, Ruud Koolen, Emiel Krahmer

We present a corpus of spoken Dutch image descriptions, paired with two sets of eye-tracking data: Free viewing, where participants look at images without any particular purpose, and Description viewing, where we track eye movements while participants produce spoken descriptions of the images they are viewing.

Specificity, Task 2

Cross-linguistic differences and similarities in image descriptions

1 code implementation WS 2017 Emiel van Miltenburg, Desmond Elliott, Piek Vossen

Automatic image description systems are commonly trained and evaluated on large image description datasets.

Specificity

Room for improvement in automatic image description: an error analysis

1 code implementation 13 Apr 2017 Emiel van Miltenburg, Desmond Elliott

In recent years we have seen rapid and significant progress in automatic image description but what are the open problems in this area?

Pragmatic descriptions of perceptual stimuli

no code implementations EACL 2017 Emiel van Miltenburg

This research proposal discusses pragmatic factors in image description, arguing that current automatic image description systems do not take these factors into account.

Object Recognition

Building a Dictionary of Affixal Negations

1 code implementation WS 2016 Chantal van Son, Emiel van Miltenburg, Roser Morante

This paper discusses the need for a dictionary of affixal negations and regular antonyms to facilitate their automatic detection in text.

Natural Language Inference, Negation +1

Pragmatic factors in image description: the case of negations

1 code implementation WS 2016 Emiel van Miltenburg, Roser Morante, Desmond Elliott

We provide a qualitative analysis of the descriptions containing negations (no, not, n't, nobody, etc) in the Flickr30K corpus, and a categorization of negation uses.

Negation

Stereotyping and Bias in the Flickr30K Dataset

2 code implementations 19 May 2016 Emiel van Miltenburg

An untested assumption behind the crowdsourced descriptions of the images in the Flickr30K dataset (Young et al., 2014) is that they "focus only on the information that can be obtained from the image alone" (Hodosh et al., 2013, p. 859).

The VU Sound Corpus: Adding More Fine-grained Annotations to the Freesound Database

no code implementations LREC 2016 Emiel van Miltenburg, Benjamin Timmermans, Lora Aroyo

The main goal of this study is to find out (i) whether it is feasible to collect keywords for a large collection of sounds through crowdsourcing, and (ii) how people talk about sounds, and what information they can infer from hearing a sound in isolation.

Detecting and ordering adjectival scalemates

no code implementations 30 Apr 2015 Emiel van Miltenburg

This paper presents a pattern-based method that can be used to infer adjectival scales, such as <lukewarm, warm, hot>, from a corpus.
