Search Results for author: Verena Rieser

Found 54 papers, 22 papers with code

Twenty Years of Confusion in Human Evaluation: NLG Needs Evaluation Sheets and Standardised Definitions

no code implementations INLG (ACL) 2020 David M. Howcroft, Anya Belz, Miruna-Adriana Clinciu, Dimitra Gkatzia, Sadid A. Hasan, Saad Mahamood, Simon Mille, Emiel van Miltenburg, Sashank Santhanam, Verena Rieser

Human assessment remains the most trusted form of evaluation in NLG, but highly diverse approaches and a proliferation of different quality criteria used by researchers make it difficult to compare results and draw conclusions across papers, with adverse implications for meta-evaluation and reproducibility.

Experimental Design

ConvAbuse: Data, Analysis, and Benchmarks for Nuanced Abuse Detection in Conversational AI

no code implementations EMNLP 2021 Amanda Cercas Curry, Gavin Abercrombie, Verena Rieser

We find that the distribution of abuse is vastly different compared to other commonly used datasets, with more sexually tinted aggression towards the virtual persona of these systems.

Abusive Language Chatbot

Conversational Assistants and Gender Stereotypes: Public Perceptions and Desiderata for Voice Personas

no code implementations GeBNLP (COLING) 2020 Amanda Cercas Curry, Judy Robertson, Verena Rieser

We then outline a multi-disciplinary project of how we plan to address the complex question of gender and stereotyping in digital assistants.

Multiple-choice

MiRANews: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization

1 code implementation Findings (EMNLP) 2021 Xinnuo Xu, Ondřej Dušek, Shashi Narayan, Verena Rieser, Ioannis Konstas

We show via data analysis that it's not only the models which are to blame: more than 27% of facts mentioned in the gold summaries of MiRANews are better grounded on assisting documents than in the main source articles.

Document Summarization Multi-Document Summarization +1

ConvAbuse: Data, Analysis, and Benchmarks for Nuanced Abuse Detection in Conversational AI

1 code implementation 20 Sep 2021 Amanda Cercas Curry, Gavin Abercrombie, Verena Rieser

We find that the distribution of abuse is vastly different compared to other commonly used datasets, with more sexually tinted aggression towards the virtual persona of these systems.

Abuse Detection Abusive Language +1

Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling

no code implementations 7 Jul 2021 Emily Dinan, Gavin Abercrombie, A. Stevie Bergman, Shannon Spruit, Dirk Hovy, Y-Lan Boureau, Verena Rieser

Over the last several years, end-to-end neural conversational agents have vastly improved in their ability to carry a chit-chat conversation with humans.

AGGGEN: Ordering and Aggregating while Generating

1 code implementation ACL 2021 Xinnuo Xu, Ondřej Dušek, Verena Rieser, Ioannis Konstas

We present AGGGEN (pronounced 'again'), a data-to-text model which re-introduces two explicit sentence planning stages into neural data-to-text systems: input ordering and input aggregation.

SLURP: A Spoken Language Understanding Resource Package

1 code implementation EMNLP 2020 Emanuele Bastianelli, Andrea Vanzo, Pawel Swietojanski, Verena Rieser

Spoken Language Understanding infers semantic meaning directly from audio data, and thus promises to reduce error propagation and misunderstandings in end-user applications.

Spoken Language Understanding

Fact-based Content Weighting for Evaluating Abstractive Summarisation

no code implementations ACL 2020 Xinnuo Xu, Ondřej Dušek, Jingyi Li, Verena Rieser, Ioannis Konstas

Abstractive summarisation is notoriously hard to evaluate since standard word-overlap-based metrics are insufficient.

History for Visual Dialog: Do we really need it?

2 code implementations ACL 2020 Shubham Agarwal, Trung Bui, Joon-Young Lee, Ioannis Konstas, Verena Rieser

Visual Dialog involves "understanding" the dialog history (what has been discussed previously) and the current question (what is asked), in addition to grounding information in the image, to generate the correct response.

Visual Dialog

Semantic Noise Matters for Neural Natural Language Generation

1 code implementation WS 2019 Ondřej Dušek, David M. Howcroft, Verena Rieser

Neural natural language generation (NNLG) systems are known for their pathological outputs, i.e. generating text which is unrelated to the input specification.

Data-to-Text Generation

Automatic Quality Estimation for Natural Language Generation: Ranting (Jointly Rating and Ranking)

1 code implementation WS 2019 Ondřej Dušek, Karin Sevegnani, Ioannis Konstas, Verena Rieser

We present a recurrent neural network based system for automatic quality estimation of natural language generation (NLG) outputs, which jointly learns to assign numerical ratings to individual outputs and to provide pairwise rankings of two different outputs.

Learning-To-Rank Text Generation

User Evaluation of a Multi-dimensional Statistical Dialogue System

1 code implementation WS 2019 Simon Keizer, Ondřej Dušek, Xingkun Liu, Verena Rieser

We present the first complete spoken dialogue system driven by a multi-dimensional statistical dialogue manager.

Benchmarking Natural Language Understanding Services for building Conversational Agents

5 code implementations 13 Mar 2019 Xingkun Liu, Arash Eshghi, Pawel Swietojanski, Verena Rieser

We have recently seen the emergence of several publicly available Natural Language Understanding (NLU) toolkits, which map user utterances to structured, but more abstract, Dialogue Act (DA) or Intent specifications, while making this process accessible to the lay developer.

General Classification Intent Classification +1

Evaluating the State-of-the-Art of End-to-End Natural Language Generation: The E2E NLG Challenge

no code implementations 23 Jan 2019 Ondřej Dušek, Jekaterina Novikova, Verena Rieser

Introducing novel automatic and human metrics, we compare 62 systems submitted by 17 institutions, covering a wide range of approaches, including machine learning architectures -- with the majority implementing sequence-to-sequence models (seq2seq) -- as well as systems based on grammatical rules and templates.

Text Generation

A Knowledge-Grounded Multimodal Search-Based Conversational Agent

1 code implementation WS 2018 Shubham Agarwal, Ondřej Dušek, Ioannis Konstas, Verena Rieser

Multimodal search-based dialogue is a challenging new task: It extends visually grounded question answering systems into multi-turn conversations with access to an external database.

Question Answering Response Generation

Findings of the E2E NLG Challenge

1 code implementation WS 2018 Ondřej Dušek, Jekaterina Novikova, Verena Rieser

This paper summarises the experimental setup and results of the first shared task on end-to-end (E2E) natural language generation (NLG) in spoken dialogue systems.

Data-to-Text Generation Spoken Dialogue Systems

Better Conversations by Modeling, Filtering, and Optimizing for Coherence and Diversity

1 code implementation EMNLP 2018 Xinnuo Xu, Ondřej Dušek, Ioannis Konstas, Verena Rieser

We present three enhancements to existing encoder-decoder models for open-domain conversational agents, aimed at effectively modeling coherence and promoting output diversity: (1) We introduce a measure of coherence as the GloVe embedding similarity between the dialogue context and the generated response, (2) we filter our training corpora based on the measure of coherence to obtain topically coherent and lexically diverse context-response pairs, (3) we then train a response generator using a conditional variational autoencoder model that incorporates the measure of coherence as a latent variable and uses a context gate to guarantee topical consistency with the context and promote lexical diversity.

Dialogue Generation

Better Conversations by Modeling, Filtering, and Optimizing for Coherence and Diversity

2 code implementations 18 Sep 2018 Xinnuo Xu, Ondřej Dušek, Ioannis Konstas, Verena Rieser

We present three enhancements to existing encoder-decoder models for open-domain conversational agents, aimed at effectively modeling coherence and promoting output diversity: (1) We introduce a measure of coherence as the GloVe embedding similarity between the dialogue context and the generated response, (2) we filter our training corpora based on the measure of coherence to obtain topically coherent and lexically diverse context-response pairs, (3) we then train a response generator using a conditional variational autoencoder model that incorporates the measure of coherence as a latent variable and uses a context gate to guarantee topical consistency with the context and promote lexical diversity.

#MeToo Alexa: How Conversational Systems Respond to Sexual Harassment

no code implementations WS 2018 Amanda Cercas Curry, Verena Rieser

In this article, we establish how current state-of-the-art conversational systems react to inappropriate requests, such as bullying and sexual harassment on the part of the user, by collecting and analysing the novel #MeTooAlexa corpus.

A Review of Evaluation Techniques for Social Dialogue Systems

no code implementations 13 Sep 2017 Amanda Cercas Curry, Helen Hastie, Verena Rieser

In contrast with goal-oriented dialogue, social dialogue has no clear measure of task success.

Referenceless Quality Estimation for Natural Language Generation

1 code implementation 5 Aug 2017 Ondřej Dušek, Jekaterina Novikova, Verena Rieser

Traditional automatic evaluation measures for natural language generation (NLG) use costly human-authored references to estimate the quality of a system output.

Text Generation

The E2E Dataset: New Challenges For End-to-End Generation

1 code implementation WS 2017 Jekaterina Novikova, Ondřej Dušek, Verena Rieser

This paper describes the E2E data, a new dataset for training end-to-end, data-driven natural language generation systems in the restaurant domain, which is ten times bigger than existing, frequently used datasets in this area.

Data-to-Text Generation

Data-driven Natural Language Generation: Paving the Road to Success

no code implementations 28 Jun 2017 Jekaterina Novikova, Ondřej Dušek, Verena Rieser

We argue that there are currently two major bottlenecks to the commercial use of statistical machine learning approaches for natural language generation (NLG): (a) The lack of reliable automatic evaluation metrics for NLG, and (b) The scarcity of high quality in-domain corpora.

Text Generation

Crowd-sourcing NLG Data: Pictures Elicit Better Data

no code implementations 1 Aug 2016 Jekaterina Novikova, Oliver Lemon, Verena Rieser

Recent advances in corpus-based Natural Language Generation (NLG) hold the promise of being easily portable across domains, but require costly training data, consisting of meaning representations (MRs) paired with Natural Language (NL) utterances.

Text Generation

Natural Language Generation as Planning under Uncertainty Using Reinforcement Learning

no code implementations 15 Jun 2016 Verena Rieser, Oliver Lemon

We present and evaluate a new model for Natural Language Generation (NLG) in Spoken Dialogue Systems, based on statistical planning, given noisy feedback from the current generation context (e.g. a user and a surface realiser).

reinforcement-learning Spoken Dialogue Systems +1
