no code implementations • INLG (ACL) 2021 • Christian Richter, Yanran Chen, Steffen Eger
This paper describes our contribution to the Shared Task ReproGen by Belz et al. (2021), which investigates the reproducibility of human evaluations in the context of Natural Language Generation.
1 code implementation • 18 Feb 2024 • Yanran Chen, Wei Zhao, Anne Breitbarth, Manuel Stoeckel, Alexander Mehler, Steffen Eger
Even though we have evidence that recent parsers trained on modern treebanks are not heavily affected by data 'noise' such as spelling changes and OCR errors in our historic data, we find that results of syntactic language change are sensitive to the parsers involved, which is a caution against using a single parser for evaluating syntactic language change as done in previous work.
1 code implementation • 9 Dec 2023 • Ran Zhang, Aida Kostikova, Christoph Leiter, Jonas Belouadi, Daniil Larionov, Yanran Chen, Vivian Fresen, Steffen Eger
Artificial Intelligence (AI) has witnessed rapid growth, especially in the subfields Natural Language Processing (NLP), Machine Learning (ML) and Computer Vision (CV).
1 code implementation • 31 Jul 2023 • Steffen Eger, Christoph Leiter, Jonas Belouadi, Ran Zhang, Aida Kostikova, Daniil Larionov, Yanran Chen, Vivian Fresen
In particular, we compile a list of the 40 most popular papers based on normalized citation counts from the first half of 2023.
no code implementations • 20 Feb 2023 • Christoph Leiter, Ran Zhang, Yanran Chen, Jonas Belouadi, Daniil Larionov, Vivian Fresen, Steffen Eger
ChatGPT, a chatbot developed by OpenAI, has gained widespread popularity and media attention since its release in November 2022.
1 code implementation • 20 Dec 2022 • Yanran Chen, Steffen Eger
Our human evaluation suggests that our best end-to-end system performs similarly to human authors (but arguably slightly worse).
1 code implementation • 15 Aug 2022 • Yanran Chen, Steffen Eger
Recently proposed BERT-based evaluation metrics for text generation perform well on standard benchmarks but are vulnerable to adversarial attacks, e.g., relating to information correctness.
1 code implementation • 30 Mar 2022 • Yanran Chen, Jonas Belouadi, Steffen Eger
We find that reproduction of claims and results often fails because of (i) heavy undocumented preprocessing involved in the metrics, (ii) missing code and (iii) reporting weaker results for the baseline metrics.