1 code implementation • INLG (ACL) 2021 • Zdeněk Kasner, Simon Mille, Ondřej Dušek
Our system can detect the errors automatically using a combination of a rule-based natural language generation (NLG) system and pretrained language models (LMs).
1 code implementation • LREC 2022 • Vojtěch Hudeček, Leon-paul Schaub, Daniel Stancl, Patrick Paroubek, Ondřej Dušek
In this paper, we present a new dataset, obtained by merging four publicly available annotated corpora for task-oriented dialogues in several domains (MultiWOZ 2.2, CamRest676, DSTC2 and Schema-Guided Dialogue Dataset).
no code implementations • NAACL (WNU) 2022 • Rudolf Rosa, Patrícia Schmidtová, Ondřej Dušek, Tomáš Musil, David Mareček, Saad Obaid, Marie Nováková, Klára Vosecká, Josef Doležal
We experiment with adapting generative language models for the generation of long coherent narratives in the form of theatre plays.
no code implementations • ACL (WebNLG, INLG) 2020 • Zdeněk Kasner, Ondřej Dušek
We describe our system for the RDF-to-text generation task of the WebNLG Challenge 2020.
no code implementations • 2 May 2023 • Anya Belz, Craig Thomson, Ehud Reiter, Gavin Abercrombie, Jose M. Alonso-Moral, Mohammad Arvan, Jackie Cheung, Mark Cieliebak, Elizabeth Clark, Kees Van Deemter, Tanvi Dinkar, Ondřej Dušek, Steffen Eger, Qixiang Fang, Albert Gatt, Dimitra Gkatzia, Javier González-Corbelle, Dirk Hovy, Manuela Hürlimann, Takumi Ito, John D. Kelleher, Filip Klubicka, Huiyuan Lai, Chris van der Lee, Emiel van Miltenburg, Yiru Li, Saad Mahamood, Margot Mieskes, Malvina Nissim, Natalie Parde, Ondřej Plátek, Verena Rieser, Pablo Mosteiro Romero, Joel Tetreault, Antonio Toral, Xiaojun Wan, Leo Wanner, Lewis Watson, Diyi Yang
We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible.
no code implementations • 13 Apr 2023 • Vojtěch Hudeček, Ondřej Dušek
Instruction-tuned Large Language Models (LLMs) have recently gained huge popularity thanks to their ability to interact with users through conversation.
1 code implementation • 27 Feb 2023 • Zdeněk Kasner, Ekaterina Garanina, Ondřej Plátek, Ondřej Dušek
We present TabGenie, a toolkit which enables researchers to explore, preprocess, and analyze a variety of data-to-text generation datasets through the unified framework of table-to-text generation.
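The unified table-to-text framing can be illustrated with a minimal linearization sketch. Note that the `[TITLE]`/`[CELL]` separator tokens and the function below are illustrative assumptions, not TabGenie's actual API:

```python
def linearize_table(title, header, rows):
    """Flatten a table into a single string suitable as input
    to a sequence-to-sequence data-to-text model (hypothetical format)."""
    parts = [f"[TITLE] {title}"]
    for row in rows:
        # pair each cell value with its column heading
        cells = [f"[CELL] {h}: {v}" for h, v in zip(header, row)]
        parts.append(" ".join(cells))
    return " ".join(parts)

table = linearize_table(
    "Medal count",
    ["Country", "Gold", "Silver"],
    [["Norway", "16", "8"], ["Germany", "12", "10"]],
)
print(table)
```

A model then maps such flat strings to descriptive sentences, which lets one toolkit treat heterogeneous data-to-text datasets uniformly.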
no code implementations • 17 Jan 2023 • Ondřej Plátek, Ondřej Dušek
We present two models.
1 code implementation • 13 Oct 2022 • Zdeněk Kasner, Ioannis Konstas, Ondřej Dušek
Pretrained language models (PLMs) for data-to-text (D2T) generation can use human-readable data labels such as column headings, keys, or relation names to generalize to out-of-domain examples.
1 code implementation • 22 Sep 2022 • Vojtěch Hudeček, Ondřej Dušek
We present a novel architecture for explainable modeling of task-oriented dialogues with discrete latent variables to represent dialogue actions.
1 code implementation • SIGDIAL (ACL) 2022 • Tomáš Nekvinda, Ondřej Dušek
We introduce AARGH, an end-to-end task-oriented dialog system combining retrieval and generative approaches in a single model, aiming at improving dialog management and lexical diversity of outputs.
no code implementations • 16 Jun 2022 • Patrícia Schmidtová, Dávid Javorský, Christián Mikláš, Tomáš Musil, Rudolf Rosa, Ondřej Dušek
We present a novel approach to generating scripts by using agents with different personality types.
1 code implementation • ACL 2022 • Zdeněk Kasner, Ondřej Dušek
In data-to-text (D2T) generation, training on in-domain data leads to overfitting to the data representation and repeating training data noise.
1 code implementation • Findings (EMNLP) 2021 • Xinnuo Xu, Ondřej Dušek, Shashi Narayan, Verena Rieser, Ioannis Konstas
We show via data analysis that it is not only the models that are to blame: more than 27% of facts mentioned in the gold summaries of MiRANews are better grounded in the assisting documents than in the main source articles.
no code implementations • INLG (ACL) 2021 • Emiel van Miltenburg, Miruna-Adriana Clinciu, Ondřej Dušek, Dimitra Gkatzia, Stephanie Inglis, Leo Leppänen, Saad Mahamood, Emma Manning, Stephanie Schoch, Craig Thomson, Luou Wen
We observe a severe under-reporting of the different kinds of errors that Natural Language Generation systems make.
1 code implementation • ACL (GEM) 2021 • Tomáš Nekvinda, Ondřej Dušek
The MultiWOZ dataset (Budzianowski et al., 2018) is frequently used for benchmarking context-to-response abilities of task-oriented dialogue systems.
1 code implementation • ACL 2021 • Xinnuo Xu, Ondřej Dušek, Verena Rieser, Ioannis Konstas
We present AGGGEN (pronounced 'again'), a data-to-text model which re-introduces two explicit sentence planning stages into neural data-to-text systems: input ordering and input aggregation.
no code implementations • 17 Feb 2021 • Rudolf Rosa, Tomáš Musil, Ondřej Dušek, Dominik Jurko, Patrícia Schmidtová, David Mareček, Ondřej Bojar, Tom Kocmi, Daniel Hrbek, David Košťák, Martina Kinská, Marie Nováková, Josef Doležal, Klára Vosecká, Tomáš Studeník, Petr Žabka
We present the first version of a system for interactive generation of theatre play scripts.
1 code implementation • EMNLP (NLP4ConvAI) 2021 • Jonáš Kulhánek, Vojtěch Hudeček, Tomáš Nekvinda, Ondřej Dušek
Our model substantially outperforms the baseline on the MultiWOZ data and shows competitive performance with the state of the art in both automatic and human evaluation.
Ranked #3 on End-To-End Dialogue Modelling on MULTIWOZ 2.0 (using extra training data)
no code implementations • ACL (GEM) 2021 • Sebastian Gehrmann, Tosin Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Aremu Anuoluwapo, Antoine Bosselut, Khyathi Raghavi Chandu, Miruna Clinciu, Dipanjan Das, Kaustubh D. Dhole, Wanyu Du, Esin Durmus, Ondřej Dušek, Chris Emezue, Varun Gangal, Cristina Garbacea, Tatsunori Hashimoto, Yufang Hou, Yacine Jernite, Harsh Jhamtani, Yangfeng Ji, Shailza Jolly, Mihir Kale, Dhruv Kumar, Faisal Ladhak, Aman Madaan, Mounica Maddela, Khyati Mahajan, Saad Mahamood, Bodhisattwa Prasad Majumder, Pedro Henrique Martins, Angelina McMillan-Major, Simon Mille, Emiel van Miltenburg, Moin Nadeem, Shashi Narayan, Vitaly Nikolaev, Rubungo Andre Niyongabo, Salomey Osei, Ankur Parikh, Laura Perez-Beltrachini, Niranjan Ramesh Rao, Vikas Raunak, Juan Diego Rodriguez, Sashank Santhanam, João Sedoc, Thibault Sellam, Samira Shaikh, Anastasia Shimorina, Marco Antonio Sobrevilla Cabezudo, Hendrik Strobelt, Nishant Subramani, Wei Xu, Diyi Yang, Akhila Yerukola, Jiawei Zhou
We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics.
Ranked #1 on Extreme Summarization on GEM-XSum
1 code implementation • INLG (ACL) 2020 • Ondřej Dušek, Zdeněk Kasner
A major challenge in evaluating data-to-text (D2T) generation is measuring the semantic accuracy of the generated text, i.e., checking if the output text contains all and only facts supported by the input data.
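As a toy illustration of the "all and only" criterion described above (the string-matching check below is a deliberately naive sketch, not the paper's actual method), omissions can be flagged by testing whether each input fact surfaces in the output:

```python
def find_omissions(facts, output_text):
    """Naive semantic accuracy check: return input facts whose
    value does not appear anywhere in the generated text.
    Real systems need semantic matching, not substring search."""
    text = output_text.lower()
    return [f for f in facts if f.lower() not in text]

facts = ["Blue Spice", "riverside", "cheap"]
output = "Blue Spice is a cheap restaurant."
missing = find_omissions(facts, output)
print(missing)  # ['riverside']
```

A symmetric check over facts *not* in the input would flag hallucinations; doing both robustly is what makes the evaluation problem hard.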
1 code implementation • INLG (ACL) 2020 • Zdeněk Kasner, Ondřej Dušek
Our approach maximizes the completeness and semantic accuracy of the output text while leveraging the abilities of recent pre-trained models for text editing (LaserTagger) and language modeling (GPT-2) to improve the text fluency.
3 code implementations • 9 Aug 2020 • Jan Vainer, Ondřej Dušek
While recent neural sequence-to-sequence models have greatly improved the quality of speech synthesis, there has not been a system capable of fast training, fast inference and high-quality audio synthesis at the same time.
1 code implementation • 3 Aug 2020 • Tomáš Nekvinda, Ondřej Dušek
We introduce an approach to multilingual speech synthesis which uses the meta-learning concept of contextual parameter generation and produces natural-sounding multilingual speech using more languages and less training data than previous approaches.
no code implementations • 25 Jun 2020 • Rudolf Rosa, Ondřej Dušek, Tom Kocmi, David Mareček, Tomáš Musil, Patrícia Schmidtová, Dominik Jurko, Ondřej Bojar, Daniel Hrbek, David Košťák, Martina Kinská, Josef Doležal, Klára Vosecká
We present THEaiTRE, a starting project aimed at automatic generation of theatre play scripts.
1 code implementation • WS 2019 • Ondřej Dušek, David M. Howcroft, Verena Rieser
Neural natural language generation (NNLG) systems are known for their pathological outputs, i.e., generating text which is unrelated to the input specification.
Ranked #3 on Data-to-Text Generation on Cleaned E2E NLG Challenge
2 code implementations • 11 Oct 2019 • Ondřej Dušek, Filip Jurčíček
We present the first dataset targeted at end-to-end NLG in Czech in the restaurant domain, along with several strong baseline models using the sequence-to-sequence approach.
1 code implementation • WS 2019 • Ondřej Dušek, Karin Sevegnani, Ioannis Konstas, Verena Rieser
We present a recurrent neural network based system for automatic quality estimation of natural language generation (NLG) outputs, which jointly learns to assign numerical ratings to individual outputs and to provide pairwise rankings of two different outputs.
1 code implementation • WS 2019 • Simon Keizer, Ondřej Dušek, Xingkun Liu, Verena Rieser
We present the first complete spoken dialogue system driven by a multi-dimensional statistical dialogue manager.
no code implementations • 23 Jan 2019 • Ondřej Dušek, Jekaterina Novikova, Verena Rieser
Introducing novel automatic and human metrics, we compare 62 systems submitted by 17 institutions, covering a wide range of approaches, including machine learning architectures -- with the majority implementing sequence-to-sequence models (seq2seq) -- as well as systems based on grammatical rules and templates.
1 code implementation • WS 2018 • Igor Shalyminov, Ondřej Dušek, Oliver Lemon
Using a dataset of real conversations collected in the 2017 Alexa Prize challenge, we developed a neural ranker for selecting 'good' system responses to user utterances, i.e., responses which are likely to lead to long and engaging conversations.
1 code implementation • WS 2018 • Ondřej Dušek, Jekaterina Novikova, Verena Rieser
This paper summarises the experimental setup and results of the first shared task on end-to-end (E2E) natural language generation (NLG) in spoken dialogue systems.
Ranked #4 on Data-to-Text Generation on E2E NLG Challenge
2 code implementations • 18 Sep 2018 • Xinnuo Xu, Ondřej Dušek, Ioannis Konstas, Verena Rieser
We present three enhancements to existing encoder-decoder models for open-domain conversational agents, aimed at effectively modeling coherence and promoting output diversity: (1) We introduce a measure of coherence as the GloVe embedding similarity between the dialogue context and the generated response, (2) we filter our training corpora based on the measure of coherence to obtain topically coherent and lexically diverse context-response pairs, (3) we then train a response generator using a conditional variational autoencoder model that incorporates the measure of coherence as a latent variable and uses a context gate to guarantee topical consistency with the context and promote lexical diversity.
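The coherence measure in (1), cosine similarity between embeddings of the dialogue context and the response, can be sketched as follows. The tiny three-dimensional embedding table here is made up for illustration; the paper uses pretrained GloVe vectors and this averaging scheme is a simplifying assumption:

```python
import math

# toy 3-d "embeddings" standing in for pretrained GloVe vectors (hypothetical values)
EMB = {
    "pizza":  [0.9, 0.1, 0.0],
    "food":   [0.8, 0.2, 0.1],
    "like":   [0.1, 0.9, 0.2],
    "i":      [0.2, 0.5, 0.5],
    "trains": [0.0, 0.1, 0.9],
}

def avg_embedding(tokens):
    # average the word vectors of all in-vocabulary tokens
    vecs = [EMB[t] for t in tokens if t in EMB]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(3)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def coherence(context, response):
    return cosine(avg_embedding(context.split()), avg_embedding(response.split()))

# a topically related response scores higher than an unrelated one
print(coherence("i like pizza", "food") > coherence("i like pizza", "trains"))  # True
```

Filtering training pairs by such a score (enhancement 2) keeps context-response pairs that are topically coherent, and conditioning generation on it (enhancement 3) lets the model trade off topicality against diversity.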
1 code implementation • NAACL 2018 • Jekaterina Novikova, Ondřej Dušek, Verena Rieser
Human evaluation for natural language generation (NLG) often suffers from inconsistent user ratings.
no code implementations • 20 Dec 2017 • Ioannis Papaioannou, Amanda Cercas Curry, Jose L. Part, Igor Shalyminov, Xinnuo Xu, Yanchao Yu, Ondřej Dušek, Verena Rieser, Oliver Lemon
Open-domain social dialogue is one of the long-standing goals of Artificial Intelligence.
1 code implementation • 5 Aug 2017 • Ondřej Dušek, Jekaterina Novikova, Verena Rieser
Traditional automatic evaluation measures for natural language generation (NLG) use costly human-authored references to estimate the quality of a system output.
1 code implementation • EMNLP 2017 • Jekaterina Novikova, Ondřej Dušek, Amanda Cercas Curry, Verena Rieser
The majority of NLG evaluation relies on automatic metrics, such as BLEU.
2 code implementations • WS 2017 • Jekaterina Novikova, Ondřej Dušek, Verena Rieser
This paper describes the E2E data, a new dataset for training end-to-end, data-driven natural language generation systems in the restaurant domain, which is ten times bigger than existing, frequently used datasets in this area.
no code implementations • 28 Jun 2017 • Jekaterina Novikova, Ondřej Dušek, Verena Rieser
We argue that there are currently two major bottlenecks to the commercial use of statistical machine learning approaches for natural language generation (NLG): (a) The lack of reliable automatic evaluation metrics for NLG, and (b) The scarcity of high quality in-domain corpora.
1 code implementation • 25 Aug 2016 • Ondřej Dušek, Filip Jurčíček
We present a novel natural language generation system for spoken dialogue systems capable of entraining (adapting) to users' way of speaking, providing contextually appropriate responses.
1 code implementation • 17 Jun 2016 • Ondřej Dušek, Filip Jurčíček
We present a natural language generator based on the sequence-to-sequence approach that can be trained to produce natural language strings as well as deep syntax dependency trees from input dialogue acts, and we use it to directly compare two-step generation with separate sentence planning and surface realization stages to a joint, one-step approach.
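To illustrate the kind of input such a seq2seq generator consumes, a dialogue act can be flattened into a token sequence. The format below is a hypothetical sketch, not the system's actual serialization:

```python
def linearize_dialogue_act(act_type, slots):
    """Turn a dialogue act like inform(name=X, food=Y) into a flat
    token sequence usable as seq2seq input (illustrative format)."""
    tokens = [act_type]
    for slot, value in slots:
        tokens += [slot, value]
    return " ".join(tokens)

da = linearize_dialogue_act("inform", [("name", "X-name"), ("food", "Chinese")])
print(da)  # inform name X-name food Chinese
```

In the joint one-step setup, the decoder maps this sequence directly to a sentence; in the two-step setup, it first produces a deep syntax dependency tree that a surface realizer then turns into text.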