Search Results for author: Ondřej Dušek

Found 50 papers, 33 papers with code

SpeedySpeech: Efficient Neural Speech Synthesis

3 code implementations · 9 Aug 2020 · Jan Vainer, Ondřej Dušek

While recent neural sequence-to-sequence models have greatly improved the quality of speech synthesis, there has not been a system capable of fast training, fast inference and high-quality audio synthesis at the same time.

Audio Synthesis, Speech Synthesis

One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech

1 code implementation · 3 Aug 2020 · Tomáš Nekvinda, Ondřej Dušek

We introduce an approach to multilingual speech synthesis which uses the meta-learning concept of contextual parameter generation and produces natural-sounding multilingual speech using more languages and less training data than previous approaches.

Meta-Learning, Speech Synthesis, +1 more

The E2E Dataset: New Challenges For End-to-End Generation

2 code implementations · WS 2017 · Jekaterina Novikova, Ondřej Dušek, Verena Rieser

This paper describes the E2E data, a new dataset for training end-to-end, data-driven natural language generation systems in the restaurant domain, which is ten times bigger than existing, frequently used datasets in this area.

Data-to-Text Generation

A Context-aware Natural Language Generator for Dialogue Systems

1 code implementation · 25 Aug 2016 · Ondřej Dušek, Filip Jurčíček

We present a novel natural language generation system for spoken dialogue systems capable of entraining (adapting) to users' way of speaking, providing contextually appropriate responses.

Spoken Dialogue Systems, Text Generation

Sequence-to-Sequence Generation for Spoken Dialogue via Deep Syntax Trees and Strings

1 code implementation · 17 Jun 2016 · Ondřej Dušek, Filip Jurčíček

We present a natural language generator based on the sequence-to-sequence approach that can be trained to produce natural language strings as well as deep syntax dependency trees from input dialogue acts, and we use it to directly compare two-step generation with separate sentence planning and surface realization stages to a joint, one-step approach.

Sentence

Findings of the E2E NLG Challenge

1 code implementation · WS 2018 · Ondřej Dušek, Jekaterina Novikova, Verena Rieser

This paper summarises the experimental setup and results of the first shared task on end-to-end (E2E) natural language generation (NLG) in spoken dialogue systems.

Data-to-Text Generation, Spoken Dialogue Systems

Neural Generation for Czech: Data and Baselines

2 code implementations · 11 Oct 2019 · Ondřej Dušek, Filip Jurčíček

We present the first dataset targeted at end-to-end NLG in Czech in the restaurant domain, along with several strong baseline models using the sequence-to-sequence approach.

Language Modelling

Three Ways of Using Large Language Models to Evaluate Chat

2 code implementations · 12 Aug 2023 · Ondřej Plátek, Vojtěch Hudeček, Patricia Schmidtová, Mateusz Lango, Ondřej Dušek

This paper describes the systems submitted by team6 for ChatEval, the DSTC 11 Track 4 competition.

Chatbot

Better Conversations by Modeling, Filtering, and Optimizing for Coherence and Diversity

2 code implementations · 18 Sep 2018 · Xinnuo Xu, Ondřej Dušek, Ioannis Konstas, Verena Rieser

We present three enhancements to existing encoder-decoder models for open-domain conversational agents, aimed at effectively modeling coherence and promoting output diversity: (1) We introduce a measure of coherence as the GloVe embedding similarity between the dialogue context and the generated response, (2) we filter our training corpora based on the measure of coherence to obtain topically coherent and lexically diverse context-response pairs, (3) we then train a response generator using a conditional variational autoencoder model that incorporates the measure of coherence as a latent variable and uses a context gate to guarantee topical consistency with the context and promote lexical diversity.
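The coherence measure described above compares embeddings of the dialogue context and the generated response. A minimal sketch, assuming each utterance is represented by the average of its word vectors and compared by cosine similarity; the tiny `TOY_VECTORS` table is hypothetical and stands in for real GloVe vectors, and the paper's exact formulation may differ.

```python
import math
from typing import Dict, List

# Hypothetical 3-dimensional vectors standing in for real GloVe embeddings.
TOY_VECTORS: Dict[str, List[float]] = {
    "pizza": [0.9, 0.1, 0.0],
    "pasta": [0.8, 0.2, 0.1],
    "food": [0.7, 0.3, 0.0],
    "football": [0.0, 0.2, 0.9],
    "match": [0.1, 0.1, 0.8],
}


def sentence_vector(text: str, vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of in-vocabulary words; unknown words are skipped."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return []
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def coherence(context: str, response: str,
              vectors: Dict[str, List[float]] = TOY_VECTORS) -> float:
    """Cosine similarity between the averaged context and response embeddings."""
    c = sentence_vector(context, vectors)
    r = sentence_vector(response, vectors)
    if not c or not r:
        return 0.0  # no in-vocabulary words on one side: treat as incoherent
    return cosine(c, r)


if __name__ == "__main__":
    ctx = "do you like pizza"
    print(coherence(ctx, "pasta is my favourite food"))        # on-topic reply
    print(coherence(ctx, "did you watch the football match"))  # off-topic reply
```

A score like this can then serve as a filtering threshold for context-response pairs, in the spirit of step (2) in the abstract; the threshold itself would be a further assumption.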

Shades of BLEU, Flavours of Success: The Case of MultiWOZ

1 code implementation · ACL (GEM) 2021 · Tomáš Nekvinda, Ondřej Dušek

The MultiWOZ dataset (Budzianowski et al., 2018) is frequently used for benchmarking context-to-response abilities of task-oriented dialogue systems.

Benchmarking, Task-Oriented Dialogue Systems

TabGenie: A Toolkit for Table-to-Text Generation

1 code implementation · 27 Feb 2023 · Zdeněk Kasner, Ekaterina Garanina, Ondřej Plátek, Ondřej Dušek

We present TabGenie, a toolkit that enables researchers to explore, preprocess, and analyze a variety of data-to-text generation datasets through the unified framework of table-to-text generation.

Data-to-Text Generation, Table-to-Text Generation

Semantic Noise Matters for Neural Natural Language Generation

1 code implementation · WS 2019 · Ondřej Dušek, David M. Howcroft, Verena Rieser

Neural natural language generation (NNLG) systems are known for their pathological outputs, i.e., generating text which is unrelated to the input specification.

Data-to-Text Generation, Hallucination

AuGPT: Auxiliary Tasks and Data Augmentation for End-To-End Dialogue with Pre-Trained Language Models

1 code implementation · EMNLP (NLP4ConvAI) 2021 · Jonáš Kulhánek, Vojtěch Hudeček, Tomáš Nekvinda, Ondřej Dušek

Our model substantially outperforms the baseline on the MultiWOZ data and shows competitive performance with state of the art in both automatic and human evaluation.

Ranked #3 on End-To-End Dialogue Modelling on MULTIWOZ 2.0 (using extra training data)

End-To-End Dialogue Modelling, Translation

Neural Pipeline for Zero-Shot Data-to-Text Generation

1 code implementation · ACL 2022 · Zdeněk Kasner, Ondřej Dušek

In data-to-text (D2T) generation, training on in-domain data leads to overfitting to the data representation and repeating training data noise.

Data-to-Text Generation

Data-to-Text Generation with Iterative Text Editing

1 code implementation · INLG (ACL) 2020 · Zdeněk Kasner, Ondřej Dušek

Our approach maximizes the completeness and semantic accuracy of the output text while leveraging the abilities of recent pre-trained models for text editing (LaserTagger) and language modeling (GPT-2) to improve the text fluency.

Data-to-Text Generation, Domain Adaptation, +3 more

Neural Response Ranking for Social Conversation: A Data-Efficient Approach

1 code implementation · WS 2018 · Igor Shalyminov, Ondřej Dušek, Oliver Lemon

Using a dataset of real conversations collected in the 2017 Alexa Prize challenge, we developed a neural ranker for selecting 'good' system responses to user utterances, i.e., responses which are likely to lead to long and engaging conversations.

AARGH! End-to-end Retrieval-Generation for Task-Oriented Dialog

1 code implementation · SIGDIAL (ACL) 2022 · Tomáš Nekvinda, Ondřej Dušek

We introduce AARGH, an end-to-end task-oriented dialog system combining retrieval and generative approaches in a single model, aiming at improving dialog management and lexical diversity of outputs.

Management, Response Generation, +1 more

Referenceless Quality Estimation for Natural Language Generation

1 code implementation · 5 Aug 2017 · Ondřej Dušek, Jekaterina Novikova, Verena Rieser

Traditional automatic evaluation measures for natural language generation (NLG) use costly human-authored references to estimate the quality of a system output.

Text Generation

Automatic Quality Estimation for Natural Language Generation: Ranting (Jointly Rating and Ranking)

1 code implementation · WS 2019 · Ondřej Dušek, Karin Sevegnani, Ioannis Konstas, Verena Rieser

We present a recurrent neural network-based system for automatic quality estimation of natural language generation (NLG) outputs, which jointly learns to assign numerical ratings to individual outputs and to provide pairwise rankings of two different outputs.

Learning-To-Rank, Text Generation
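The Ranting abstract describes one model trained for two objectives at once: absolute ratings and pairwise rankings. A minimal sketch of what such a joint objective could look like, assuming squared error for the ratings and a margin-based hinge loss for the rankings; the function names, the `margin`, and the weight `alpha` are hypothetical and not taken from the paper.

```python
from typing import List, Tuple


def rating_loss(predicted: List[float], gold: List[float]) -> float:
    """Mean squared error between predicted and human ratings."""
    return sum((p - g) ** 2 for p, g in zip(predicted, gold)) / len(gold)


def ranking_loss(pairs: List[Tuple[float, float]], margin: float = 1.0) -> float:
    """Hinge loss over (score_of_preferred, score_of_other) pairs.

    A pair costs nothing once the human-preferred output outscores the
    other output by at least `margin`.
    """
    return sum(max(0.0, margin - (better - worse)) for better, worse in pairs) / len(pairs)


def joint_loss(predicted: List[float], gold: List[float],
               pairs: List[Tuple[float, float]],
               alpha: float = 0.5, margin: float = 1.0) -> float:
    """Weighted sum of both objectives; `alpha` is a hypothetical mixing weight."""
    return alpha * rating_loss(predicted, gold) + (1 - alpha) * ranking_loss(pairs, margin)


if __name__ == "__main__":
    # Two outputs scored by the same model; humans preferred the first one.
    print(joint_loss([4.0, 2.5], [4.5, 2.0], [(4.0, 2.5)]))
```

Sharing one scorer between both terms is what lets rating data and ranking data train the same network; how the two are actually combined in the paper is not stated in the abstract.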

Evaluating Semantic Accuracy of Data-to-Text Generation with Natural Language Inference

1 code implementation · INLG (ACL) 2020 · Ondřej Dušek, Zdeněk Kasner

A major challenge in evaluating data-to-text (D2T) generation is measuring the semantic accuracy of the generated text, i.e., checking if the output text contains all and only facts supported by the input data.

Data-to-Text Generation, Natural Language Inference
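The "all and only" criterion above maps naturally onto entailment checks in two directions. A minimal sketch, assuming an entailment function `entails(premise, hypothesis)` is available: each input fact should be entailed by the output (otherwise it is omitted), and the output should be entailed by all facts together (otherwise it hallucinates). The `toy_entails` word-overlap function and the example facts are hypothetical placeholders for a real NLI model and real data.

```python
from typing import Callable, Dict, List

# (premise, hypothesis) -> True if the premise entails the hypothesis
EntailFn = Callable[[str, str], bool]


def semantic_accuracy(facts: List[str], output: str, entails: EntailFn) -> Dict[str, object]:
    """Check that the output states all facts ("all") and nothing beyond them ("only")."""
    # "All": a fact not entailed by the output text counts as an omission.
    omissions = [f for f in facts if not entails(output, f)]
    # "Only": if the facts taken together do not entail the output, it adds unsupported content.
    hallucination = not entails(" ".join(facts), output)
    return {
        "omissions": omissions,
        "hallucination": hallucination,
        "ok": not omissions and not hallucination,
    }


def toy_entails(premise: str, hypothesis: str) -> bool:
    """Hypothetical stand-in for an NLI model: every hypothesis word occurs in the premise."""
    premise_words = set(premise.lower().replace(".", "").split())
    return all(w in premise_words for w in hypothesis.lower().replace(".", "").split())


if __name__ == "__main__":
    facts = ["Aromi is a pub.", "Aromi is in the city centre."]  # toy input data
    print(semantic_accuracy(facts, "Aromi is a pub in the city centre.", toy_entails))
    print(semantic_accuracy(facts, "Aromi is a cheap pub.", toy_entails))
```

Word overlap is far weaker than a trained NLI model (it ignores negation and paraphrase); it only makes the two-direction logic runnable.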

MooseNet: A Trainable Metric for Synthesized Speech with a PLDA Module

1 code implementation · 17 Jan 2023 · Ondřej Plátek, Ondřej Dušek

We present MooseNet, a trainable speech metric that predicts the listeners' Mean Opinion Score (MOS).

Self-Supervised Learning

Critic-Driven Decoding for Mitigating Hallucinations in Data-to-text Generation

1 code implementation · 25 Oct 2023 · Mateusz Lango, Ondřej Dušek

Our method does not need any changes to the underlying LM's architecture or training procedure and can thus be combined with any model and any decoding method that operates on word probabilities.

Data-to-Text Generation, Hallucination, +1 more

AGGGEN: Ordering and Aggregating while Generating

1 code implementation · ACL 2021 · Xinnuo Xu, Ondřej Dušek, Verena Rieser, Ioannis Konstas

We present AGGGEN (pronounced 'again'), a data-to-text model which re-introduces two explicit sentence planning stages into neural data-to-text systems: input ordering and input aggregation.

Sentence

Mind the Labels: Describing Relations in Knowledge Graphs With Pretrained Models

1 code implementation · 13 Oct 2022 · Zdeněk Kasner, Ioannis Konstas, Ondřej Dušek

Pretrained language models (PLMs) for data-to-text (D2T) generation can use human-readable data labels such as column headings, keys, or relation names to generalize to out-of-domain examples.

Knowledge Graphs, Relation

Tackling Hallucinations in Neural Chart Summarization

1 code implementation · 1 Aug 2023 · Saad Obaid ul Islam, Iza Škrjanec, Ondřej Dušek, Vera Demberg

Hallucinations in text generation occur when the system produces text that is not grounded in the input.

Natural Language Inference, Text Generation

Text-in-Context: Token-Level Error Detection for Table-to-Text Generation

1 code implementation · INLG (ACL) 2021 · Zdeněk Kasner, Simon Mille, Ondřej Dušek

Our system can detect the errors automatically using a combination of a rule-based natural language generation (NLG) system and pretrained language models (LMs).

Language Modelling, Semantic Similarity, +3 more

A Unifying View On Task-oriented Dialogue Annotation

1 code implementation · LREC 2022 · Vojtěch Hudeček, Leon-paul Schaub, Daniel Stancl, Patrick Paroubek, Ondřej Dušek

In this paper, we present a new dataset, obtained by merging four publicly available annotated corpora for task-oriented dialogues in several domains (MultiWOZ 2.2, CamRest676, DSTC2 and Schema-Guided Dialogue Dataset).

Dialogue Generation, Dialogue State Tracking, +1 more

Data-driven Natural Language Generation: Paving the Road to Success

no code implementations · 28 Jun 2017 · Jekaterina Novikova, Ondřej Dušek, Verena Rieser

We argue that there are currently two major bottlenecks to the commercial use of statistical machine learning approaches for natural language generation (NLG): (a) the lack of reliable automatic evaluation metrics for NLG, and (b) the scarcity of high-quality in-domain corpora.

BIG-bench Machine Learning, Text Generation

Evaluating the State-of-the-Art of End-to-End Natural Language Generation: The E2E NLG Challenge

no code implementations · 23 Jan 2019 · Ondřej Dušek, Jekaterina Novikova, Verena Rieser

Introducing novel automatic and human metrics, we compare 62 systems submitted by 17 institutions, covering a wide range of approaches, including machine learning architectures -- with the majority implementing sequence-to-sequence models (seq2seq) -- as well as systems based on grammatical rules and templates.

Text Generation

User Evaluation of a Multi-dimensional Statistical Dialogue System

1 code implementation · WS 2019 · Simon Keizer, Ondřej Dušek, Xingkun Liu, Verena Rieser

We present the first complete spoken dialogue system driven by a multi-dimensional statistical dialogue manager.

MiRANews: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization

1 code implementation · Findings (EMNLP) 2021 · Xinnuo Xu, Ondřej Dušek, Shashi Narayan, Verena Rieser, Ioannis Konstas

We show via data analysis that it's not only the models which are to blame: more than 27% of facts mentioned in the gold summaries of MiRANews are better grounded in assisting documents than in the main source articles.

Document Summarization, Multi-Document Summarization, +2 more

Learning Interpretable Latent Dialogue Actions With Less Supervision

1 code implementation · 22 Sep 2022 · Vojtěch Hudeček, Ondřej Dušek

We present a novel architecture for explainable modeling of task-oriented dialogues with discrete latent variables to represent dialogue actions.

Are LLMs All You Need for Task-Oriented Dialogue?

no code implementations · 13 Apr 2023 · Vojtěch Hudeček, Ondřej Dušek

Instruction-tuned large language models (LLMs) have recently gained huge popularity thanks to their ability to interact with users through conversation.

With a Little Help from the Authors: Reproducing Human Evaluation of an MT Error Detector

no code implementations · 12 Aug 2023 · Ondřej Plátek, Mateusz Lango, Ondřej Dušek

This work presents our efforts to reproduce the results of the human evaluation experiment presented in the paper of Vamvas and Sennrich (2022), which evaluated an automatic system detecting over- and undertranslations (translations containing more or less information than the original) in machine translation (MT) outputs.

Machine Translation, Translation

LEEETs-Dial: Linguistic Entrainment in End-to-End Task-oriented Dialogue systems

no code implementations · 15 Nov 2023 · Nalin Kumar, Ondřej Dušek

Linguistic entrainment, or alignment, represents a phenomenon where linguistic patterns employed by conversational participants converge to one another.

Task-Oriented Dialogue Systems

Balancing the Style-Content Trade-Off in Sentiment Transfer Using Polarity-Aware Denoising

1 code implementation · 22 Dec 2023 · Sourabrata Mukherjee, Zdeněk Kasner, Ondřej Dušek

Text sentiment transfer aims to flip the sentiment polarity of a sentence (positive to negative or vice versa) while preserving its sentiment-independent content.

Denoising, Representation Learning, +2 more

Beyond Reference-Based Metrics: Analyzing Behaviors of Open LLMs on Data-to-Text Generation

no code implementations · 18 Jan 2024 · Zdeněk Kasner, Ondřej Dušek

We analyze the behaviors of open large language models (LLMs) on the task of data-to-text (D2T) generation, i.e., generating coherent and relevant text from structured data.

Data-to-Text Generation, In-Context Learning

Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs

no code implementations · 6 Feb 2024 · Simone Balloccu, Patrícia Schmidtová, Mateusz Lango, Ondřej Dušek

Natural Language Processing (NLP) research is increasingly focusing on the use of Large Language Models (LLMs), with some of the most popular ones being either fully or partially closed-source.

Text Detoxification as Style Transfer in English and Hindi

no code implementations · 12 Feb 2024 · Sourabrata Mukherjee, Akanksha Bansal, Atul Kr. Ojha, John P. McCrae, Ondřej Dušek

Text detoxification contributes to safer and more respectful online communication and can be considered a Text Style Transfer (TST) task, where the text style changes while its content is preserved.

Multi-Task Learning, Sentence, +2 more
