no code implementations • CRAC (ACL) 2021 • Yilun Zhu, Sameer Pradhan, Amir Zeldes
SOTA coreference resolution produces increasingly impressive scores on the OntoNotes benchmark.
no code implementations • GWC 2019 • Laura Slaughter, Luis Morgado Da Costa, So Miyagawa, Marco Büchler, Amir Zeldes, Heike Behlmer
With the increasing availability of wordnets for ancient languages, such as Ancient Greek and Latin, gaps remain in the coverage of less studied languages of antiquity.
no code implementations • EMNLP (DISRPT) 2021 • Amir Zeldes, Yang Janet Liu, Mikel Iruskieta, Philippe Muller, Chloé Braud, Sonia Badene
In 2021, we organized the second iteration of a shared task dedicated to the underlying units used in discourse parsing across formalisms: the DISRPT Shared Task (Discourse Relation Parsing and Treebanking).
no code implementations • LREC 2022 • Anna Nedoluzhko, Michal Novák, Martin Popel, Zdeněk Žabokrtský, Amir Zeldes, Daniel Zeman
Recent advances in standardization for annotated language resources have led to successful large scale efforts, such as the Universal Dependencies (UD) project for multilingual syntactically annotated data.
no code implementations • LREC (LAW) 2022 • Luke Gessler, Lauren Levine, Amir Zeldes
Large scale annotation of rich multilayer corpus data is expensive and time consuming, motivating approaches that integrate high quality automatic tools with active learning in order to prioritize human labeling of hard cases.
no code implementations • NAACL (TeachingNLP) 2021 • Emma Manning, Nathan Schneider, Amir Zeldes
This paper describes the primarily-graduate computational linguistics and NLP curriculum at Georgetown University, a U.S. university that has seen significant growth in these areas in recent years.
1 code implementation • 1 Nov 2024 • Yang Janet Liu, Tatsuya Aoyama, Wesley Scivetti, Yilun Zhu, Shabnam Behzad, Lauren Elizabeth Levine, Jessica Lin, Devika Tiwari, Amir Zeldes
Work on shallow discourse parsing in English has focused on the Wall Street Journal corpus, the only large-scale dataset for the language in the PDTB framework.
no code implementations • 2 Oct 2024 • Lauren Levine, Amir Zeldes
Comparing bridging annotations across coreference resources is difficult, largely due to a lack of standardization across definitions and annotation schemas and narrow coverage of disparate text domains across resources.
1 code implementation • 17 Jul 2024 • Lauren Levine, Cindy Tung Li, Lydia Bremer-McCollum, Nicholas Wagner, Amir Zeldes
While not suitable for definitive manuscript reconstruction, we argue that our RNN model can help scholars rank the likelihood of textual reconstructions.
1 code implementation • 26 Mar 2024 • Leonie Weissweiler, Nina Böbel, Kirian Guiller, Santiago Herrera, Wesley Scivetti, Arthur Lorenzi, Nurit Melnik, Archna Bhatia, Hinrich Schütze, Lori Levin, Amir Zeldes, Joakim Nivre, William Croft, Nathan Schneider
The Universal Dependencies (UD) project has created an invaluable collection of treebanks with contributions in over 140 languages.
1 code implementation • 25 Mar 2024 • Yilun Zhu, Siyao Peng, Sameer Pradhan, Amir Zeldes
We then propose a two-step neural mention and coreference resolution system, named SPLICE, and compare its performance to the end-to-end approach in two scenarios: the OntoNotes test set and the out-of-domain (OOD) OntoGUM corpus.
2 code implementations • 20 Mar 2024 • Amir Zeldes, Tatsuya Aoyama, Yang Janet Liu, Siyao Peng, Debopam Das, Luke Gessler
In this article we present Enhanced Rhetorical Structure Theory (eRST), a new theoretical framework for computational discourse analysis, based on an expansion of Rhetorical Structure Theory (RST).
no code implementations • 31 Jan 2024 • Jessica Lin, Amir Zeldes
As NLP models become increasingly capable of understanding documents in terms of coherent entities rather than strings, obtaining the most salient entities for each document is not only an important end task in itself but also vital for Information Retrieval (IR) and other downstream applications such as controllable summarization.
1 code implementation • 20 Sep 2023 • Yilun Zhu, Siyao Peng, Sameer Pradhan, Amir Zeldes
Previous attempts to incorporate a mention detection step into end-to-end neural coreference resolution for English have been hampered by the lack of singleton mention span data as well as other entity information.
Ranked #1 on Coreference Resolution on OntoGUM
1 code implementation • 10 Sep 2023 • Yang Janet Liu, Tatsuya Aoyama, Amir Zeldes
Despite recent advances in Natural Language Processing (NLP), hierarchical discourse parsing in the framework of Rhetorical Structure Theory remains challenging, and our understanding of the reasons for this is as yet limited.
1 code implementation • 20 Jun 2023 • Yang Janet Liu, Amir Zeldes
Automatic summarization with pre-trained language models has led to impressively fluent results, but is prone to 'hallucinations', low performance on non-news genres, and outputs which are not exactly summaries.
1 code implementation • 3 Jun 2023 • Tatsuya Aoyama, Shabnam Behzad, Luke Gessler, Lauren Levine, Jessica Lin, Yang Janet Liu, Siyao Peng, Yilun Zhu, Amir Zeldes
We evaluate state-of-the-art NLP systems on GENTLE and find severe degradation for at least some genres in their performance on all tasks, which indicates GENTLE's utility as an evaluation dataset for NLP systems.
1 code implementation • 13 Feb 2023 • Yang Janet Liu, Amir Zeldes
To our knowledge, this study is the first to fully evaluate cross-corpus RST parsing generalizability on complete trees, examine between-genre degradation within an RST corpus, and investigate the impact of genre diversity in training data composition.
no code implementations • 1 Feb 2023 • Amir Zeldes, Nathan Schneider
Recent efforts to consolidate guidelines and treebanks in the Universal Dependencies project raise the expectation that joint training and dataset comparison is increasingly possible for high-resource languages such as English, which have multiple corpora.
1 code implementation • 23 Dec 2022 • Luke Gessler, Amir Zeldes
Transformer language models (TLMs) are critical for most NLP tasks, but they are difficult to create for low-resource languages because of how much pretraining data they require.
no code implementations • 18 Dec 2022 • Shabnam Behzad, Amir Zeldes, Nathan Schneider
In this paper, we present strong baselines for the task of Feedback Comment Generation for Writing Learning.
1 code implementation • 19 Oct 2022 • Siyao Peng, Yang Janet Liu, Amir Zeldes
A lack of large-scale human-annotated data has hampered the hierarchical discourse parsing of Chinese.
2 code implementations • 14 Oct 2022 • Amir Zeldes, Nick Howell, Noam Ordan, Yifat Ben Moshe
Foundational Hebrew NLP tasks such as segmentation, tagging and parsing have relied to date on various versions of the Hebrew Treebank (HTB, Sima'an et al. 2001).
no code implementations • 11 Oct 2022 • Siyao Peng, Yang Janet Liu, Amir Zeldes
This document provides extensive guidelines and examples for Rhetorical Structure Theory (RST) annotation in Mandarin Chinese.
1 code implementation • 1 May 2022 • Shabnam Behzad, Keisuke Sakaguchi, Nathan Schneider, Amir Zeldes
We present ELQA, a corpus of questions and answers in and about the English language.
no code implementations • 17 Dec 2021 • Amir Zeldes
Current work on automatic coreference resolution has focused on the OntoNotes benchmark dataset, due to both its size and consistency.
no code implementations • 12 Oct 2021 • Yilun Zhu, Sameer Pradhan, Amir Zeldes
SOTA coreference resolution produces increasingly impressive scores on the OntoNotes benchmark.
1 code implementation • EMNLP (DISRPT) 2021 • Luke Gessler, Shabnam Behzad, Yang Janet Liu, Siyao Peng, Yilun Zhu, Amir Zeldes
This paper describes our submission to the DISRPT2021 Shared Task on Discourse Unit Segmentation, Connective Detection, and Relation Classification.
no code implementations • EMNLP (LAW, DMR) 2021 • Jessica Lin, Amir Zeldes
Previous work on Entity Linking has focused on resources targeting non-nested proper named entity mentions, often in data from Wikipedia, i.e. Wikification.
Ranked #1 on Entity Linking on GUM
no code implementations • UDW (SyntaxFest) 2021 • Nathan Schneider, Amir Zeldes
While the highly multilingual Universal Dependencies (UD) project provides extensive guidelines for clausal structure as well as structure within canonical nominal phrases, a standard treatment is lacking for many "mischievous" nominal phenomena that break the mold.
1 code implementation • ACL 2021 • Yilun Zhu, Sameer Pradhan, Amir Zeldes
SOTA coreference resolution produces increasingly impressive scores on the OntoNotes benchmark.
Ranked #2 on Coreference Resolution on OntoGUM
no code implementations • 3 Nov 2020 • Manuela Sanguinetti, Lauren Cassidy, Cristina Bosco, Özlem Çetinoğlu, Alessandra Teresa Cignarella, Teresa Lynn, Ines Rehbein, Josef Ruppenhofer, Djamé Seddah, Amir Zeldes
This article presents a discussion on the main linguistic phenomena which cause difficulties in the analysis of user-generated texts found on the web and in social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework of syntactic analysis.
no code implementations • COLING (LaTeCHCLfL, CLFL, LaTeCH) 2020 • Amir Zeldes, Lance Martin, Sichang Tu
Entity recognition provides semantic access to ancient materials in the Digital Humanities: it exposes people and places of interest in texts that cannot be read exhaustively, facilitates linking resources and can provide a window into text contents, even for texts with no translations.
1 code implementation • LREC 2020 • Luke Gessler, Siyao Peng, Yang Liu, Yilun Zhu, Shabnam Behzad, Amir Zeldes
We present a freely available, genre-balanced English web corpus totaling 4M tokens and featuring a large number of high-quality automatic annotation layers, including dependency trees, non-named entity annotations, coreference resolution, and discourse trees in Rhetorical Structure Theory.
no code implementations • LREC 2020 • Manuela Sanguinetti, Cristina Bosco, Lauren Cassidy, Özlem Çetinoğlu, Alessandra Teresa Cignarella, Teresa Lynn, Ines Rehbein, Josef Ruppenhofer, Djamé Seddah, Amir Zeldes
The paper presents a discussion on the main linguistic phenomena of user-generated texts found in web and social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework.
1 code implementation • LREC 2020 • Shabnam Behzad, Amir Zeldes
However, when these models are applied to other corpora with different genres, and especially user-generated data from the Web, we see substantial drops in performance.
no code implementations • 8 Jan 2020 • Amir Zeldes, Yang Liu
Previous data-driven work investigating the types and distributions of discourse relation signals, including discourse markers such as 'however' or phrases such as 'as a result', has focused on the relative frequencies of signal words within and outside text from each discourse relation.
no code implementations • 11 Dec 2019 • Caroline T. Schroeder, Amir Zeldes
Scholarship on underresourced languages brings with it a variety of challenges which make access to the full spectrum of source materials and their evaluation difficult.
no code implementations • COLING 2018 • Siyao Peng, Amir Zeldes
We describe and evaluate different approaches to the conversion of gold standard corpus data from Stanford Typed Dependencies (SD) and Penn-style constituent trees to the latest English Universal Dependencies representation (UD 2.2).
no code implementations • WS 2019 • Amir Zeldes, Debopam Das, Erick Galani Maziero, Juliano Antonio, Mikel Iruskieta
In 2019, we organized the first iteration of a shared task dedicated to the underlying units used in discourse parsing across formalisms: the DISRPT Shared Task on Elementary Discourse Unit Segmentation and Connective Detection.
no code implementations • WS 2019 • Amir Zeldes, Debopam Das, Erick Galani Maziero, Juliano Antonio, Mikel Iruskieta
This overview summarizes the main contributions of the accepted papers at the 2019 workshop on Discourse Relation Parsing and Treebanking (DISRPT 2019).
1 code implementation • WS 2019 • Luke Gessler, Yang Liu, Amir Zeldes
This paper presents a new system for open-ended discourse relation signal annotation in the framework of Rhetorical Structure Theory (RST), implemented on top of an online tool for RST annotation.
1 code implementation • WS 2019 • Yue Yu, Yilun Zhu, Yang Liu, Yan Liu, Siyao Peng, Mackenzie Gong, Amir Zeldes
In this paper we present GumDrop, Georgetown University's entry at the DISRPT 2019 Shared Task on automatic discourse unit segmentation and connective detection.
no code implementations • WS 2018 • Amir Zeldes, Mitchell Abrams
This paper presents the Coptic Universal Dependency Treebank, the first dependency treebank within the Egyptian subfamily of the Afro-Asiatic languages.
1 code implementation • WS 2018 • Amir Zeldes
This paper presents a novel approach to the segmentation of orthographic word forms in contemporary Hebrew, focusing purely on splitting without carrying out morphological analysis or disambiguation.
Ranked #1 on Text Segmentation on Wiki5K Hebrew segmentation
no code implementations • COLING 2018 • Frank Feder, Maxim Kupreyev, Emma Manning, Caroline T. Schroeder, Amir Zeldes
We describe a new project publishing a freely available online dictionary for Coptic.
no code implementations • WS 2018 • Amir Zeldes
Notional anaphors are pronouns which disagree with their antecedents' grammatical categories for notional reasons, such as plural to singular agreement in: 'the government ... they'.
no code implementations • NAACL 2018 • Sean MacAvaney, Amir Zeldes
We investigate the effect of various dependency-based word embeddings on distinguishing between functional and domain similarity, word similarity rankings, and two downstream tasks in English.
no code implementations • 2 Aug 2011 • Laurent Romary, Amir Zeldes, Florian Zipser
This paper introduces an XML format developed to serialise the object model defined by the ISO Syntactic Annotation Framework SynAF.