Search Results for author: Ondřej Dušek

Found 50 papers, 33 papers with code

SpeedySpeech: Efficient Neural Speech Synthesis

3 code implementations • 9 Aug 2020 • Jan Vainer, Ondřej Dušek

While recent neural sequence-to-sequence models have greatly improved the quality of speech synthesis, there has not been a system capable of fast training, fast inference and high-quality audio synthesis at the same time.

Audio Synthesis, Speech Synthesis

One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech

1 code implementation • 3 Aug 2020 • Tomáš Nekvinda, Ondřej Dušek

We introduce an approach to multilingual speech synthesis which uses the meta-learning concept of contextual parameter generation and produces natural-sounding multilingual speech using more languages and less training data than previous approaches.

Meta-Learning, Speech Synthesis (+1 more)

The E2E Dataset: New Challenges For End-to-End Generation

2 code implementations • WS 2017 • Jekaterina Novikova, Ondřej Dušek, Verena Rieser

This paper describes the E2E data, a new dataset for training end-to-end, data-driven natural language generation systems in the restaurant domain, which is ten times bigger than existing, frequently used datasets in this area.

Data-to-Text Generation

A Context-aware Natural Language Generator for Dialogue Systems

1 code implementation • 25 Aug 2016 • Ondřej Dušek, Filip Jurčíček

We present a novel natural language generation system for spoken dialogue systems capable of entraining (adapting) to users' way of speaking, providing contextually appropriate responses.

Spoken Dialogue Systems, Text Generation

Sequence-to-Sequence Generation for Spoken Dialogue via Deep Syntax Trees and Strings

1 code implementation • 17 Jun 2016 • Ondřej Dušek, Filip Jurčíček

We present a natural language generator based on the sequence-to-sequence approach that can be trained to produce natural language strings as well as deep syntax dependency trees from input dialogue acts, and we use it to directly compare two-step generation with separate sentence planning and surface realization stages to a joint, one-step approach.

Sentence

Findings of the E2E NLG Challenge

1 code implementation • WS 2018 • Ondřej Dušek, Jekaterina Novikova, Verena Rieser

This paper summarises the experimental setup and results of the first shared task on end-to-end (E2E) natural language generation (NLG) in spoken dialogue systems.

Ranked #5 on Data-to-Text Generation on E2E NLG Challenge

Data-to-Text Generation, Spoken Dialogue Systems

Neural Generation for Czech: Data and Baselines

2 code implementations • 11 Oct 2019 • Ondřej Dušek, Filip Jurčíček

We present the first dataset targeted at end-to-end NLG in Czech in the restaurant domain, along with several strong baseline models using the sequence-to-sequence approach.

Language Modelling

Three Ways of Using Large Language Models to Evaluate Chat

2 code implementations • 12 Aug 2023 • Ondřej Plátek, Vojtěch Hudeček, Patricia Schmidtová, Mateusz Lango, Ondřej Dušek

This paper describes the systems submitted by team6 for ChatEval, the DSTC 11 Track 4 competition.

Chatbot

Better Conversations by Modeling, Filtering, and Optimizing for Coherence and Diversity

2 code implementations • 18 Sep 2018 • Xinnuo Xu, Ondřej Dušek, Ioannis Konstas, Verena Rieser

We present three enhancements to existing encoder-decoder models for open-domain conversational agents, aimed at effectively modeling coherence and promoting output diversity: (1) We introduce a measure of coherence as the GloVe embedding similarity between the dialogue context and the generated response, (2) we filter our training corpora based on the measure of coherence to obtain topically coherent and lexically diverse context-response pairs, (3) we then train a response generator using a conditional variational autoencoder model that incorporates the measure of coherence as a latent variable and uses a context gate to guarantee topical consistency with the context and promote lexical diversity.
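
The first two enhancements (a coherence score and corpus filtering based on it) can be illustrated with a minimal Python sketch. It assumes that "GloVe embedding similarity" means the cosine similarity between the averaged word vectors of the context and of the response. The averaging, the toy three-dimensional `TOY_EMBEDDINGS` table (a hypothetical stand-in for real GloVe vectors), the whitespace tokenizer and the `threshold` value are all illustrative assumptions, not details taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy vectors standing in for pretrained GloVe embeddings.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "pizza": [0.9, 0.1, 0.0],
    "food": [0.8, 0.2, 0.1],
    "hungry": [0.7, 0.3, 0.0],
    "weather": [0.0, 0.9, 0.4],
    "rain": [0.1, 0.8, 0.5],
}


def mean_vector(tokens: Sequence[str], emb: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of known tokens; return a zero vector if none are known."""
    dim = len(next(iter(emb.values())))
    known = [emb[t] for t in tokens if t in emb]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, defined here as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def coherence(context: str, response: str,
              emb: Dict[str, List[float]] = TOY_EMBEDDINGS) -> float:
    """Assumed coherence score: cosine of the averaged context and response vectors."""
    ctx = mean_vector(context.lower().split(), emb)
    rsp = mean_vector(response.lower().split(), emb)
    return cosine(ctx, rsp)


def filter_pairs(pairs: List[Tuple[str, str]], threshold: float,
                 emb: Dict[str, List[float]] = TOY_EMBEDDINGS) -> List[Tuple[str, str]]:
    """Keep context-response pairs whose coherence reaches a (hypothetical) threshold."""
    return [(c, r) for c, r in pairs if coherence(c, r, emb) >= threshold]


pairs = [("I am hungry", "want some pizza"), ("I am hungry", "it will rain")]
print(round(coherence(*pairs[0]), 3), round(coherence(*pairs[1]), 3))
print(filter_pairs(pairs, threshold=0.8))
```

With these toy vectors, the on-topic pair scores clearly higher than the off-topic one, so a filter at `threshold=0.8` keeps only the first pair. A real setup would load pretrained GloVe vectors instead of the toy table.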

Shades of BLEU, Flavours of Success: The Case of MultiWOZ

1 code implementation • ACL (GEM) 2021 • Tomáš Nekvinda, Ondřej Dušek

The MultiWOZ dataset (Budzianowski et al., 2018) is frequently used for benchmarking context-to-response abilities of task-oriented dialogue systems.

Benchmarking, Task-Oriented Dialogue Systems

TabGenie: A Toolkit for Table-to-Text Generation

1 code implementation • 27 Feb 2023 • Zdeněk Kasner, Ekaterina Garanina, Ondřej Plátek, Ondřej Dušek

We present TabGenie, a toolkit which enables researchers to explore, preprocess, and analyze a variety of data-to-text generation datasets through the unified framework of table-to-text generation.

Data-to-Text Generation, Table-to-Text Generation

RankME: Reliable Human Ratings for Natural Language Generation

1 code implementation • NAACL 2018 • Jekaterina Novikova, Ondřej Dušek, Verena Rieser

Human evaluation for natural language generation (NLG) often suffers from inconsistent user ratings.

Attribute, Experimental Design (+1 more)

Semantic Noise Matters for Neural Natural Language Generation

1 code implementation • WS 2019 • Ondřej Dušek, David M. Howcroft, Verena Rieser

Neural natural language generation (NNLG) systems are known for their pathological outputs, i.e., generating text which is unrelated to the input specification.

Ranked #3 on Data-to-Text Generation on Cleaned E2E NLG Challenge

Data-to-Text Generation, Hallucination

Why We Need New Evaluation Metrics for NLG

1 code implementation • EMNLP 2017 • Jekaterina Novikova, Ondřej Dušek, Amanda Cercas Curry, Verena Rieser

The majority of NLG evaluation relies on automatic metrics, such as BLEU.

nlg evaluation

AuGPT: Auxiliary Tasks and Data Augmentation for End-To-End Dialogue with Pre-Trained Language Models

1 code implementation • EMNLP (NLP4ConvAI) 2021 • Jonáš Kulhánek, Vojtěch Hudeček, Tomáš Nekvinda, Ondřej Dušek

Our model substantially outperforms the baseline on the MultiWOZ data and shows competitive performance with state of the art in both automatic and human evaluation.

Ranked #3 on End-To-End Dialogue Modelling on MULTIWOZ 2.0 (using extra training data)

End-To-End Dialogue Modelling, Translation

Neural Pipeline for Zero-Shot Data-to-Text Generation

1 code implementation • ACL 2022 • Zdeněk Kasner, Ondřej Dušek

In data-to-text (D2T) generation, training on in-domain data leads to overfitting to the data representation and repeating training data noise.

Data-to-Text Generation

Data-to-Text Generation with Iterative Text Editing

1 code implementation • INLG (ACL) 2020 • Zdeněk Kasner, Ondřej Dušek

Our approach maximizes the completeness and semantic accuracy of the output text while leveraging the abilities of recent pre-trained models for text editing (LaserTagger) and language modeling (GPT-2) to improve the text fluency.

Data-to-Text Generation, Domain Adaptation (+3 more)

Neural Response Ranking for Social Conversation: A Data-Efficient Approach

1 code implementation • WS 2018 • Igor Shalyminov, Ondřej Dušek, Oliver Lemon

Using a dataset of real conversations collected in the 2017 Alexa Prize challenge, we developed a neural ranker for selecting 'good' system responses to user utterances, i.e., responses which are likely to lead to long and engaging conversations.

AARGH! End-to-end Retrieval-Generation for Task-Oriented Dialog

1 code implementation • SIGDIAL (ACL) 2022 • Tomáš Nekvinda, Ondřej Dušek

We introduce AARGH, an end-to-end task-oriented dialog system combining retrieval and generative approaches in a single model, aiming at improving dialog management and lexical diversity of outputs.

Management, Response Generation (+1 more)

Referenceless Quality Estimation for Natural Language Generation

1 code implementation • 5 Aug 2017 • Ondřej Dušek, Jekaterina Novikova, Verena Rieser

Traditional automatic evaluation measures for natural language generation (NLG) use costly human-authored references to estimate the quality of a system output.

Text Generation

Automatic Quality Estimation for Natural Language Generation: Ranting (Jointly Rating and Ranking)

1 code implementation • WS 2019 • Ondřej Dušek, Karin Sevegnani, Ioannis Konstas, Verena Rieser

We present a recurrent neural network based system for automatic quality estimation of natural language generation (NLG) outputs, which jointly learns to assign numerical ratings to individual outputs and to provide pairwise rankings of two different outputs.

Learning-To-Rank, Text Generation

Evaluating Semantic Accuracy of Data-to-Text Generation with Natural Language Inference

1 code implementation • INLG (ACL) 2020 • Ondřej Dušek, Zdeněk Kasner

A major challenge in evaluating data-to-text (D2T) generation is measuring the semantic accuracy of the generated text, i.e., checking if the output text contains all and only facts supported by the input data.

Data-to-Text Generation, Natural Language Inference
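
The "all and only" criterion can be read as two entailment checks: every input fact should be entailed by the output (no omissions), and the output should be entailed by the input facts taken together (no hallucinations). The minimal Python sketch below assumes the facts are already verbalized as sentences; `toy_entails` is a hypothetical word-overlap stand-in for a real natural language inference model, and the function names are illustrative, not the paper's code.

```python
from typing import Callable, List, Tuple

# An entailment checker maps (premise, hypothesis) to True/False.
Entails = Callable[[str, str], bool]


def _tokens(text: str) -> List[str]:
    """Lowercase and split on whitespace after dropping full stops."""
    return text.lower().replace(".", " ").split()


def toy_entails(premise: str, hypothesis: str) -> bool:
    """Hypothetical stand-in for an NLI model: the hypothesis counts as entailed
    if every one of its tokens also occurs in the premise."""
    premise_tokens = set(_tokens(premise))
    return all(tok in premise_tokens for tok in _tokens(hypothesis))


def check_semantic_accuracy(facts: List[str], output: str,
                            entails: Entails = toy_entails) -> Tuple[List[str], bool]:
    """Return (omitted facts, hallucination flag) for one generated text.

    "All" facts: each input fact must be entailed by the output.
    "Only" facts: the output must be entailed by all facts taken together.
    """
    omitted = [fact for fact in facts if not entails(output, fact)]
    hallucinated = not entails(" ".join(facts), output)
    return omitted, hallucinated


facts = ["Aromi is a coffee shop.", "Aromi is in the city centre."]
print(check_semantic_accuracy(facts, "Aromi is a coffee shop in the city centre."))
# ([], False)
print(check_semantic_accuracy(facts, "Aromi is a cheap coffee shop."))
# (['Aromi is in the city centre.'], True)
```

Replacing `toy_entails` with a real NLI classifier keeps the same two-directional structure; the word-overlap stand-in only makes the sketch runnable without a model.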

MooseNet: A Trainable Metric for Synthesized Speech with a PLDA Module

1 code implementation • 17 Jan 2023 • Ondřej Plátek, Ondřej Dušek

We present MooseNet, a trainable speech metric that predicts the listeners' Mean Opinion Score (MOS).

Self-Supervised Learning

Critic-Driven Decoding for Mitigating Hallucinations in Data-to-text Generation

1 code implementation • 25 Oct 2023 • Mateusz Lango, Ondřej Dušek

Our method does not need any changes to the underlying LM's architecture or training procedure and can thus be combined with any model and any decoding method that operates on word probabilities.

Data-to-Text Generation, Hallucination (+1 more)

AGGGEN: Ordering and Aggregating while Generating

1 code implementation • ACL 2021 • Xinnuo Xu, Ondřej Dušek, Verena Rieser, Ioannis Konstas

We present AGGGEN (pronounced 'again'), a data-to-text model which re-introduces two explicit sentence planning stages into neural data-to-text systems: input ordering and input aggregation.

Sentence

Mind the Labels: Describing Relations in Knowledge Graphs With Pretrained Models

1 code implementation • 13 Oct 2022 • Zdeněk Kasner, Ioannis Konstas, Ondřej Dušek

Pretrained language models (PLMs) for data-to-text (D2T) generation can use human-readable data labels such as column headings, keys, or relation names to generalize to out-of-domain examples.

Knowledge Graphs, Relation

Tackling Hallucinations in Neural Chart Summarization

1 code implementation • 1 Aug 2023 • Saad Obaid ul Islam, Iza Škrjanec, Ondřej Dušek, Vera Demberg

Hallucinations in text generation occur when the system produces text that is not grounded in the input.

Natural Language Inference, Text Generation

Text-in-Context: Token-Level Error Detection for Table-to-Text Generation

1 code implementation • INLG (ACL) 2021 • Zdeněk Kasner, Simon Mille, Ondřej Dušek

Our system can detect the errors automatically using a combination of a rule-based natural language generation (NLG) system and pretrained language models (LMs).

Language Modelling, Semantic Similarity (+3 more)

A Unifying View On Task-oriented Dialogue Annotation

1 code implementation • LREC 2022 • Vojtěch Hudeček, Leon-Paul Schaub, Daniel Stancl, Patrick Paroubek, Ondřej Dušek

In this paper, we present a new dataset, obtained by merging four publicly available annotated corpora for task-oriented dialogues in several domains (MultiWOZ 2.2, CamRest676, DSTC2 and Schema-Guided Dialogue Dataset).

Dialogue Generation, Dialogue State Tracking (+1 more)

An Ensemble Model with Ranking for Social Dialogue

no code implementations • 20 Dec 2017 • Ioannis Papaioannou, Amanda Cercas Curry, Jose L. Part, Igor Shalyminov, Xinnuo Xu, Yanchao Yu, Ondřej Dušek, Verena Rieser, Oliver Lemon

Open-domain social dialogue is one of the long-standing goals of Artificial Intelligence.

Data-driven Natural Language Generation: Paving the Road to Success

no code implementations • 28 Jun 2017 • Jekaterina Novikova, Ondřej Dušek, Verena Rieser

We argue that there are currently two major bottlenecks to the commercial use of statistical machine learning approaches for natural language generation (NLG): (a) the lack of reliable automatic evaluation metrics for NLG, and (b) the scarcity of high-quality in-domain corpora.

BIG-bench Machine Learning, Text Generation

Evaluating the State-of-the-Art of End-to-End Natural Language Generation: The E2E NLG Challenge

no code implementations • 23 Jan 2019 • Ondřej Dušek, Jekaterina Novikova, Verena Rieser

Introducing novel automatic and human metrics, we compare 62 systems submitted by 17 institutions, covering a wide range of approaches, including machine learning architectures (the majority implementing sequence-to-sequence models, seq2seq) as well as systems based on grammatical rules and templates.

Text Generation

User Evaluation of a Multi-dimensional Statistical Dialogue System

1 code implementation • WS 2019 • Simon Keizer, Ondřej Dušek, Xingkun Liu, Verena Rieser

We present the first complete spoken dialogue system driven by a multi-dimensional statistical dialogue manager.

THEaiTRE: Artificial Intelligence to Write a Theatre Play

no code implementations • 25 Jun 2020 • Rudolf Rosa, Ondřej Dušek, Tom Kocmi, David Mareček, Tomáš Musil, Patrícia Schmidtová, Dominik Jurko, Ondřej Bojar, Daniel Hrbek, David Košťák, Martina Kinská, Josef Doležal, Klára Vosecká

We present THEaiTRE, a newly started project aimed at the automatic generation of theatre play scripts.

Machine Translation, Translation

The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

no code implementations • ACL (GEM) 2021 • Sebastian Gehrmann, Tosin Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Aremu Anuoluwapo, Antoine Bosselut, Khyathi Raghavi Chandu, Miruna Clinciu, Dipanjan Das, Kaustubh D. Dhole, Wanyu Du, Esin Durmus, Ondřej Dušek, Chris Emezue, Varun Gangal, Cristina Garbacea, Tatsunori Hashimoto, Yufang Hou, Yacine Jernite, Harsh Jhamtani, Yangfeng Ji, Shailza Jolly, Mihir Kale, Dhruv Kumar, Faisal Ladhak, Aman Madaan, Mounica Maddela, Khyati Mahajan, Saad Mahamood, Bodhisattwa Prasad Majumder, Pedro Henrique Martins, Angelina McMillan-Major, Simon Mille, Emiel van Miltenburg, Moin Nadeem, Shashi Narayan, Vitaly Nikolaev, Rubungo Andre Niyongabo, Salomey Osei, Ankur Parikh, Laura Perez-Beltrachini, Niranjan Ramesh Rao, Vikas Raunak, Juan Diego Rodriguez, Sashank Santhanam, João Sedoc, Thibault Sellam, Samira Shaikh, Anastasia Shimorina, Marco Antonio Sobrevilla Cabezudo, Hendrik Strobelt, Nishant Subramani, Wei Xu, Diyi Yang, Akhila Yerukola, Jiawei Zhou

We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics.

Ranked #1 on Extreme Summarization on GEM-XSum

Abstractive Text Summarization, Cross-Lingual Abstractive Summarization (+5 more)

THEaiTRE 1.0: Interactive generation of theatre play scripts

no code implementations • 17 Feb 2021 • Rudolf Rosa, Tomáš Musil, Ondřej Dušek, Dominik Jurko, Patrícia Schmidtová, David Mareček, Ondřej Bojar, Tom Kocmi, Daniel Hrbek, David Košťák, Martina Kinská, Marie Nováková, Josef Doležal, Klára Vosecká, Tomáš Studeník, Petr Žabka

We present the first version of a system for interactive generation of theatre play scripts.

Underreporting of errors in NLG output, and what to do about it

no code implementations • INLG (ACL) 2021 • Emiel van Miltenburg, Miruna-Adriana Clinciu, Ondřej Dušek, Dimitra Gkatzia, Stephanie Inglis, Leo Leppänen, Saad Mahamood, Emma Manning, Stephanie Schoch, Craig Thomson, Luou Wen

We observe a severe under-reporting of the different kinds of errors that Natural Language Generation systems make.

Position, Text Generation

MiRANews: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization

1 code implementation • Findings (EMNLP) 2021 • Xinnuo Xu, Ondřej Dušek, Shashi Narayan, Verena Rieser, Ioannis Konstas

We show via data analysis that it's not only the models which are to blame: more than 27% of facts mentioned in the gold summaries of MiRANews are better grounded in assisting documents than in the main source articles.

Document Summarization, Multi-Document Summarization (+2 more)

Train Hard, Finetune Easy: Multilingual Denoising for RDF-to-Text Generation

no code implementations • ACL (WebNLG, INLG) 2020 • Zdeněk Kasner, Ondřej Dušek

We describe our system for the RDF-to-text generation task of the WebNLG Challenge 2020.

Denoising, Text Generation

DialogueScript: Using Dialogue Agents to Produce a Script

no code implementations • 16 Jun 2022 • Patrícia Schmidtová, Dávid Javorský, Christián Mikláš, Tomáš Musil, Rudolf Rosa, Ondřej Dušek

We present a novel approach to generating scripts by using agents with different personality types.

Natural Language Inference

GPT-2-based Human-in-the-loop Theatre Play Script Generation

no code implementations • NAACL (WNU) 2022 • Rudolf Rosa, Patrícia Schmidtová, Ondřej Dušek, Tomáš Musil, David Mareček, Saad Obaid, Marie Nováková, Klára Vosecká, Josef Doležal

We experiment with adapting generative language models for the generation of long coherent narratives in the form of theatre plays.

Language Modelling, Text Generation

Learning Interpretable Latent Dialogue Actions With Less Supervision

1 code implementation • 22 Sep 2022 • Vojtěch Hudeček, Ondřej Dušek

We present a novel architecture for explainable modeling of task-oriented dialogues with discrete latent variables to represent dialogue actions.

Are LLMs All You Need for Task-Oriented Dialogue?

no code implementations • 13 Apr 2023 • Vojtěch Hudeček, Ondřej Dušek

Instruction-tuned large language models (LLMs) have recently gained huge popularity thanks to their ability to interact with users through conversation.

Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP

no code implementations • 2 May 2023 • Anya Belz, Craig Thomson, Ehud Reiter, Gavin Abercrombie, Jose M. Alonso-Moral, Mohammad Arvan, Anouck Braggaar, Mark Cieliebak, Elizabeth Clark, Kees Van Deemter, Tanvi Dinkar, Ondřej Dušek, Steffen Eger, Qixiang Fang, Mingqi Gao, Albert Gatt, Dimitra Gkatzia, Javier González-Corbelle, Dirk Hovy, Manuela Hürlimann, Takumi Ito, John D. Kelleher, Filip Klubicka, Emiel Krahmer, Huiyuan Lai, Chris van der Lee, Yiru Li, Saad Mahamood, Margot Mieskes, Emiel van Miltenburg, Pablo Mosteiro, Malvina Nissim, Natalie Parde, Ondřej Plátek, Verena Rieser, Jie Ruan, Joel Tetreault, Antonio Toral, Xiaojun Wan, Leo Wanner, Lewis Watson, Diyi Yang

We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible.

With a Little Help from the Authors: Reproducing Human Evaluation of an MT Error Detector

no code implementations • 12 Aug 2023 • Ondřej Plátek, Mateusz Lango, Ondřej Dušek

This work presents our efforts to reproduce the results of the human evaluation experiment presented in the paper by Vamvas and Sennrich (2022), which evaluated an automatic system detecting over- and undertranslations (translations containing more or less information than the original) in machine translation (MT) outputs.

Machine Translation, Translation

LEEETs-Dial: Linguistic Entrainment in End-to-End Task-oriented Dialogue systems

no code implementations • 15 Nov 2023 • Nalin Kumar, Ondřej Dušek

Linguistic entrainment, or alignment, is a phenomenon in which the linguistic patterns used by conversational participants converge toward one another.

Task-Oriented Dialogue Systems

Balancing the Style-Content Trade-Off in Sentiment Transfer Using Polarity-Aware Denoising

1 code implementation • 22 Dec 2023 • Sourabrata Mukherjee, Zdeněk Kasner, Ondřej Dušek

Text sentiment transfer aims to flip the sentiment polarity of a sentence (positive to negative or vice versa) while preserving its sentiment-independent content.

Denoising, Representation Learning (+2 more)

Beyond Reference-Based Metrics: Analyzing Behaviors of Open LLMs on Data-to-Text Generation

no code implementations • 18 Jan 2024 • Zdeněk Kasner, Ondřej Dušek

We analyze the behaviors of open large language models (LLMs) on the task of data-to-text (D2T) generation, i.e., generating coherent and relevant text from structured data.

Data-to-Text Generation, In-Context Learning

Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs

no code implementations • 6 Feb 2024 • Simone Balloccu, Patrícia Schmidtová, Mateusz Lango, Ondřej Dušek

Natural Language Processing (NLP) research is increasingly focusing on the use of Large Language Models (LLMs), with some of the most popular ones being either fully or partially closed-source.

Text Detoxification as Style Transfer in English and Hindi

no code implementations • 12 Feb 2024 • Sourabrata Mukherjee, Akanksha Bansal, Atul Kr. Ojha, John P. McCrae, Ondřej Dušek

This task contributes to safer and more respectful online communication and can be considered a Text Style Transfer (TST) task, where the text style changes while its content is preserved.

Multi-Task Learning, Sentence (+2 more)
