Search Results for author: Simon Mille

Found 36 papers, 5 papers with code

Disentangling the Properties of Human Evaluation Methods: A Classification System to Support Comparability, Meta-Evaluation and Reproducibility Testing

no code implementations INLG (ACL) 2020 Anya Belz, Simon Mille, David M. Howcroft

Current standards for designing and reporting human evaluations in NLP mean it is generally unclear which evaluations are comparable and can be expected to yield similar results when applied to the same system outputs.

Twenty Years of Confusion in Human Evaluation: NLG Needs Evaluation Sheets and Standardised Definitions

no code implementations INLG (ACL) 2020 David M. Howcroft, Anya Belz, Miruna-Adriana Clinciu, Dimitra Gkatzia, Sadid A. Hasan, Saad Mahamood, Simon Mille, Emiel van Miltenburg, Sashank Santhanam, Verena Rieser

Human assessment remains the most trusted form of evaluation in NLG, but highly diverse approaches and a proliferation of different quality criteria used by researchers make it difficult to compare results and draw conclusions across papers, with adverse implications for meta-evaluation and reproducibility.

Experimental Design

Text-in-Context: Token-Level Error Detection for Table-to-Text Generation

1 code implementation INLG (ACL) 2021 Zdeněk Kasner, Simon Mille, Ondřej Dušek

Our system can detect the errors automatically using a combination of a rule-based natural language generation (NLG) system and pretrained language models (LMs).

Language Modelling · Semantic Similarity +3

The Third Multilingual Surface Realisation Shared Task (SR’20): Overview and Evaluation Results

1 code implementation MSR (COLING) 2020 Simon Mille, Anya Belz, Bernd Bohnet, Thiago Castro Ferreira, Yvette Graham, Leo Wanner

As in SR’18 and SR’19, the shared task comprised two tracks: (1) a Shallow Track where the inputs were full UD structures with word order information removed and tokens lemmatised; and (2) a Deep Track where additionally, functional words and morphological information were removed.
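The Shallow Track input format can be illustrated with a minimal sketch. This is a simplification, not the organisers' actual pipeline: it keeps only lemma and POS per token (CoNLL-U-style fields), drops the inflected form, and simulates the removal of word-order information by shuffling.

```python
import random

# Toy CoNLL-U-style sentence: ID, FORM, LEMMA, UPOS (other fields omitted).
conllu = "1\tThe\tthe\tDET\n2\tcats\tcat\tNOUN\n3\tslept\tsleep\tVERB"

def to_shallow_input(sentence, seed=0):
    """Approximate a Shallow Track input: keep lemma and POS for each token,
    drop the inflected form, and discard word order (simulated by shuffling)."""
    rows = [line.split("\t") for line in sentence.splitlines()]
    tokens = [(lemma, upos) for _, _, lemma, upos in rows]
    random.Random(seed).shuffle(tokens)
    return tokens

print(to_shallow_input(conllu))
```

A Deep Track input would additionally strip functional words such as "the" and remove morphological features.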

Needle in a Haystack: An Analysis of High-Agreement Workers on MTurk for Summarization

1 code implementation 20 Dec 2022 Lining Zhang, Simon Mille, Yufang Hou, Daniel Deutsch, Elizabeth Clark, Yixin Liu, Saad Mahamood, Sebastian Gehrmann, Miruna Clinciu, Khyathi Chandu, João Sedoc

To prevent the costly and inefficient use of resources on low-quality annotations, we want a method for creating a pool of dependable annotators who can effectively complete difficult tasks, such as evaluating automatic summarization.

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

no code implementations 22 Jun 2022 Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, Genta Indra Winata, Hendrik Strobelt, Hiroaki Hayashi, Jekaterina Novikova, Jenna Kanerva, Jenny Chim, Jiawei Zhou, Jordan Clive, Joshua Maynez, João Sedoc, Juraj Juraska, Kaustubh Dhole, Khyathi Raghavi Chandu, Laura Perez-Beltrachini, Leonardo F. R. Ribeiro, Lewis Tunstall, Li Zhang, Mahima Pushkarna, Mathias Creutz, Michael White, Mihir Sanjay Kale, Moussa Kamal Eddine, Nico Daheim, Nishant Subramani, Ondrej Dusek, Paul Pu Liang, Pawan Sasanka Ammanamanchi, Qi Zhu, Ratish Puduppully, Reno Kriz, Rifat Shahriyar, Ronald Cardenas, Saad Mahamood, Salomey Osei, Samuel Cahyawijaya, Sanja Štajner, Sebastien Montella, Shailza, Shailza Jolly, Simon Mille, Tahmid Hasan, Tianhao Shen, Tosin Adewumi, Vikas Raunak, Vipul Raheja, Vitaly Nikolaev, Vivian Tsai, Yacine Jernite, Ying Xu, Yisi Sang, Yixin Liu, Yufang Hou

This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims.

Benchmarking · Text Generation

Quantified Reproducibility Assessment of NLP Results

no code implementations ACL 2022 Anya Belz, Maja Popović, Simon Mille

This paper describes and tests a method for carrying out quantified reproducibility assessment (QRA) that is based on concepts and definitions from metrology.

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

2 code implementations 6 Dec 2021 Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Shrivastava, Samson Tan, Tongshuang Wu, Jascha Sohl-Dickstein, Jinho D. Choi, Eduard Hovy, Ondrej Dusek, Sebastian Ruder, Sajant Anand, Nagender Aneja, Rabin Banjade, Lisa Barthe, Hanna Behnke, Ian Berlot-Attwell, Connor Boyle, Caroline Brun, Marco Antonio Sobrevilla Cabezudo, Samuel Cahyawijaya, Emile Chapuis, Wanxiang Che, Mukund Choudhary, Christian Clauss, Pierre Colombo, Filip Cornell, Gautier Dagan, Mayukh Das, Tanay Dixit, Thomas Dopierre, Paul-Alexis Dray, Suchitra Dubey, Tatiana Ekeinhor, Marco Di Giovanni, Tanya Goyal, Rishabh Gupta, Louanes Hamla, Sang Han, Fabrice Harel-Canada, Antoine Honore, Ishan Jindal, Przemyslaw K. Joniak, Denis Kleyko, Venelin Kovatchev, Kalpesh Krishna, Ashutosh Kumar, Stefan Langer, Seungjae Ryan Lee, Corey James Levinson, Hualou Liang, Kaizhao Liang, Zhexiong Liu, Andrey Lukyanenko, Vukosi Marivate, Gerard de Melo, Simon Meoni, Maxime Meyer, Afnan Mir, Nafise Sadat Moosavi, Niklas Muennighoff, Timothy Sum Hon Mun, Kenton Murray, Marcin Namysl, Maria Obedkova, Priti Oli, Nivranshu Pasricha, Jan Pfister, Richard Plant, Vinay Prabhu, Vasile Pais, Libo Qin, Shahab Raji, Pawan Kumar Rajpoot, Vikas Raunak, Roy Rinberg, Nicolas Roberts, Juan Diego Rodriguez, Claude Roux, Vasconcellos P. H. S., Ananya B. Sai, Robin M. Schmidt, Thomas Scialom, Tshephisho Sefara, Saqib N. Shamsi, Xudong Shen, Haoyue Shi, Yiwen Shi, Anna Shvets, Nick Siegel, Damien Sileo, Jamie Simon, Chandan Singh, Roman Sitelew, Priyank Soni, Taylor Sorensen, William Soto, Aman Srivastava, KV Aditya Srivatsa, Tony Sun, Mukund Varma T, A Tabassum, Fiona Anting Tan, Ryan Teehan, Mo Tiwari, Marie Tolkiehn, Athena Wang, Zijian Wang, Gloria Wang, Zijie J. Wang, Fuxuan Wei, Bryan Wilie, Genta Indra Winata, Xinyi Wu, Witold Wydmański, Tianbao Xie, Usama Yaseen, Michael A. Yee, Jing Zhang, Yue Zhang

Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on.

Data Augmentation

Automatic Construction of Evaluation Suites for Natural Language Generation Datasets

no code implementations 16 Jun 2021 Simon Mille, Kaustubh D. Dhole, Saad Mahamood, Laura Perez-Beltrachini, Varun Gangal, Mihir Kale, Emiel van Miltenburg, Sebastian Gehrmann

By applying this framework to the GEM generation benchmark, we propose an evaluation suite made of 80 challenge sets, demonstrate the kinds of analyses that it enables and shed light on the limits of current generation models.

Text Generation

Assessing the Syntactic Capabilities of Transformer-based Multilingual Language Models

no code implementations Findings (ACL) 2021 Laura Pérez-Mayos, Alba Táboas García, Simon Mille, Leo Wanner

More specifically, we evaluate the syntactic generalization potential of the models on English and Spanish tests, comparing the syntactic abilities of monolingual and multilingual models on the same language (English), and of multilingual models on two different languages (English and Spanish).

Cross-Lingual Transfer

The Second Multilingual Surface Realisation Shared Task (SR'19): Overview and Evaluation Results

no code implementations WS 2019 Simon Mille, Anja Belz, Bernd Bohnet, Yvette Graham, Leo Wanner

We report results from the SR'19 Shared Task, the second edition of a multilingual surface realisation task organised as part of the EMNLP'19 Workshop on Multilingual Surface Realisation.

Back-Translation as Strategy to Tackle the Lack of Corpus in Natural Language Generation from Semantic Representations

no code implementations WS 2019 Marco Antonio Sobrevilla Cabezudo, Simon Mille, Thiago Pardo

This paper presents an exploratory study that aims to evaluate the usefulness of back-translation in Natural Language Generation (NLG) from semantic representations for non-English languages.
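The back-translation idea can be sketched in miniature. Everything below is a hypothetical stand-in (the toy English generator, the dictionary "MT engine", and the Portuguese target are all illustrative assumptions, not the paper's actual systems): text is first generated in English from a semantic input, then machine-translated to obtain synthetic training pairs for the target language.

```python
# Hypothetical stand-ins: `generate_en` mimics a rule-based English generator,
# `translate_en_pt` mimics a machine-translation engine.
def generate_en(sem):
    subject, verb = sem
    return f"the {subject} {verb}s"

def translate_en_pt(text):
    lexicon = {"the": "o", "cat": "gato", "sleep": "dorme"}
    return " ".join(lexicon.get(word.rstrip("s"), word) for word in text.split())

def back_translate(semantic_inputs):
    """Build synthetic (semantic input, target-language text) pairs by
    generating English text first, then machine-translating the result."""
    return [(sem, translate_en_pt(generate_en(sem))) for sem in semantic_inputs]

print(back_translate([("cat", "sleep")]))  # [(('cat', 'sleep'), 'o gato dorme')]
```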

Machine Translation · Text Generation +1

Underspecified Universal Dependency Structures as Inputs for Multilingual Surface Realisation

no code implementations WS 2018 Simon Mille, Anja Belz, Bernd Bohnet, Leo Wanner

In this paper, we present the datasets used in the Shallow and Deep Tracks of the First Multilingual Surface Realisation Shared Task (SR'18).

Natural Language Understanding · Text Generation

Sentence Packaging in Text Generation from Semantic Graphs as a Community Detection Problem

no code implementations WS 2018 Alexander Shvets, Simon Mille, Leo Wanner

An increasing amount of research tackles the challenge of text generation from abstract ontological or semantic structures, which are in their very nature potentially large connected graphs.
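Casting sentence packaging as community detection can be illustrated on a toy graph. The snippet above does not name the paper's specific algorithm, so the sketch below uses plain label propagation as a generic stand-in: densely connected clusters of semantic nodes are grouped together, and each detected community would then be verbalised as one sentence.

```python
from collections import Counter

# Toy semantic graph: two dense clusters of concepts joined by a single edge.
graph = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
    "D": ["C", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
}

def label_propagation(graph, max_iters=20):
    """Asynchronous label propagation: each node repeatedly adopts the most
    frequent label among its neighbours (ties broken by the largest label)."""
    labels = {node: node for node in graph}
    for _ in range(max_iters):
        changed = False
        for node in sorted(graph):
            counts = Counter(labels[nb] for nb in graph[node])
            best = max(counts.values())
            new = max(lab for lab, c in counts.items() if c == best)
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:
            break
    return labels

def communities(labels):
    """Group nodes by final label; each group becomes one sentence package."""
    groups = {}
    for node, lab in labels.items():
        groups.setdefault(lab, set()).add(node)
    return sorted(sorted(g) for g in groups.values())

print(communities(label_propagation(graph)))
```

On this graph the two triangles end up in separate communities, i.e. two sentence packages.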

Community Detection · Sentence +2

The First Multilingual Surface Realisation Shared Task (SR'18): Overview and Evaluation Results

no code implementations WS 2018 Simon Mille, Anja Belz, Bernd Bohnet, Yvette Graham, Emily Pitler, Leo Wanner

We report results from the SR'18 Shared Task, a new multilingual surface realisation task organised as part of the ACL'18 Workshop on Multilingual Surface Realisation.

Shared Task Proposal: Multilingual Surface Realization Using Universal Dependency Trees

no code implementations WS 2017 Simon Mille, Bernd Bohnet, Leo Wanner, Anja Belz

We propose a shared task on multilingual Surface Realization, i.e., on mapping unordered and uninflected universal dependency trees to correctly ordered and inflected sentences in a number of languages.
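The core mapping can be sketched with a toy linearizer. The `BEFORE` precedence set and the example tree are illustrative assumptions, not the task's actual specification; a real realiser would also inflect the lemmas, which this sketch skips.

```python
# Hypothetical ordering rule: dependency relations in BEFORE precede their
# head, all others follow it.
BEFORE = {"det", "nsubj", "amod"}

def linearize(tree, node):
    """Recursively order an unordered dependency tree into a token sequence."""
    deps = tree.get(node, [])
    pre = [d for d, rel in deps if rel in BEFORE]
    post = [d for d, rel in deps if rel not in BEFORE]
    out = []
    for d in pre:
        out.extend(linearize(tree, d))
    out.append(node)
    for d in post:
        out.extend(linearize(tree, d))
    return out

# Unordered, uninflected input tree for "the cat sleeps" (lemmas only).
tree = {"sleep": [("cat", "nsubj")], "cat": [("the", "det")]}
print(" ".join(linearize(tree, "sleep")))  # the cat sleep
```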

Machine Translation · POS +1

A demo of FORGe: the Pompeu Fabra Open Rule-based Generator

no code implementations WS 2017 Simon Mille, Leo Wanner

This demo paper presents the multilingual deep sentence generator developed by the TALN group at Universitat Pompeu Fabra, implemented as a series of rule-based graph-transducers for the syntacticization of the input graphs, the resolution of morphological agreements, and the linearization of the trees.

Sentence · Text Generation

FORGe at SemEval-2017 Task 9: Deep sentence generation based on a sequence of graph transducers

no code implementations SEMEVAL 2017 Simon Mille, Roberto Carlini, Alicia Burga, Leo Wanner

We present the contribution of Universitat Pompeu Fabra's NLP group to SemEval-2017 Task 9.2 (AMR-to-English Generation).

Sentence
