Search Results for author: Amir Zeldes

We investigate the effect of various dependency-based word embeddings on distinguishing between functional and domain similarity, word similarity rankings, and two downstream tasks in English.

Word Embeddings Word Similarity

A Predictive Model for Notional Anaphora in English

no code implementations • WS 2018 • Amir Zeldes

Notional anaphors are pronouns that disagree with the grammatical categories of their antecedents for notional reasons, such as a plural pronoun referring to a singular antecedent, as in 'the government ... they'.

coreference-resolution Referring Expression +1

A Linked Coptic Dictionary Online

no code implementations • COLING 2018 • Frank Feder, Maxim Kupreyev, Emma Manning, Caroline T. Schroeder, Amir Zeldes

We describe a new project publishing a freely available online dictionary for Coptic.

A Characterwise Windowed Approach to Hebrew Morphological Segmentation

1 code implementation • WS 2018 • Amir Zeldes

This paper presents a novel approach to the segmentation of orthographic word forms in contemporary Hebrew, focusing purely on splitting without carrying out morphological analysis or disambiguation.

Ranked #1 on Text Segmentation on Wiki5K Hebrew segmentation

Binary Classification General Classification +2

The Coptic Universal Dependency Treebank

no code implementations • WS 2018 • Amir Zeldes, Mitchell Abrams

This paper presents the Coptic Universal Dependency Treebank, the first dependency treebank within the Egyptian subfamily of the Afro-Asiatic languages.

GumDrop at the DISRPT2019 Shared Task: A Model Stacking Approach to Discourse Unit Segmentation and Connective Detection

1 code implementation • WS 2019 • Yue Yu, Yilun Zhu, Yang Liu, Yan Liu, Siyao Peng, Mackenzie Gong, Amir Zeldes

In this paper we present GumDrop, Georgetown University's entry at the DISRPT 2019 Shared Task on automatic discourse unit segmentation and connective detection.

Connective Detection Discourse Segmentation +2

A Discourse Signal Annotation System for RST Trees

1 code implementation • WS 2019 • Luke Gessler, Yang Liu, Amir Zeldes

This paper presents a new system for open-ended discourse relation signal annotation in the framework of Rhetorical Structure Theory (RST), implemented on top of an online tool for RST annotation.

Introduction to Discourse Relation Parsing and Treebanking (DISRPT): 7th Workshop on Rhetorical Structure Theory and Related Formalisms

no code implementations • WS 2019 • Amir Zeldes, Debopam Das, Erick Galani Maziero, Juliano Antonio, Mikel Iruskieta

This overview summarizes the main contributions of the accepted papers at the 2019 workshop on Discourse Relation Parsing and Treebanking (DISRPT 2019).

Connective Detection

The DISRPT 2019 Shared Task on Elementary Discourse Unit Segmentation and Connective Detection

no code implementations • WS 2019 • Amir Zeldes, Debopam Das, Erick Galani Maziero, Juliano Antonio, Mikel Iruskieta

In 2019, we organized the first iteration of a shared task dedicated to the underlying units used in discourse parsing across formalisms: the DISRPT Shared Task on Elementary Discourse Unit Segmentation and Connective Detection.

Connective Detection

All Roads Lead to UD: Converting Stanford and Penn Parses to English Universal Dependencies with Multilayer Annotations

no code implementations • COLING 2018 • Siyao Peng, Amir Zeldes

We describe and evaluate different approaches to the conversion of gold standard corpus data from Stanford Typed Dependencies (SD) and Penn-style constituent trees to the latest English Universal Dependencies representation (UD 2.2).

coreference-resolution NER

A Collaborative Ecosystem for Digital Coptic Studies

no code implementations • 11 Dec 2019 • Caroline T. Schroeder, Amir Zeldes

Scholarship on underresourced languages brings with it a variety of challenges that make access to the full spectrum of source materials, and their evaluation, difficult.

A Neural Approach to Discourse Relation Signal Detection

no code implementations • 8 Jan 2020 • Amir Zeldes, Yang Liu

Previous data-driven work investigating the types and distributions of discourse relation signals, including discourse markers such as 'however' or phrases such as 'as a result', has focused on the relative frequencies of signal words within and outside text from each discourse relation.

Relation Relation Classification

A Cross-Genre Ensemble Approach to Robust Reddit Part of Speech Tagging

1 code implementation • LREC 2020 • Shabnam Behzad, Amir Zeldes

However, when these models are applied to other corpora with different genres, and especially user-generated data from the Web, we see substantial drops in performance.

Part-Of-Speech Tagging

Treebanking User-Generated Content: A Proposal for a Unified Representation in Universal Dependencies

no code implementations • LREC 2020 • Manuela Sanguinetti, Cristina Bosco, Lauren Cassidy, Özlem Çetinoğlu, Alessandra Teresa Cignarella, Teresa Lynn, Ines Rehbein, Josef Ruppenhofer, Djamé Seddah, Amir Zeldes

The paper presents a discussion on the main linguistic phenomena of user-generated texts found in web and social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework.

AMALGUM – A Free, Balanced, Multilayer English Web Corpus

1 code implementation • LREC 2020 • Luke Gessler, Siyao Peng, Yang Liu, Yilun Zhu, Shabnam Behzad, Amir Zeldes

We present a freely available, genre-balanced English web corpus totaling 4M tokens and featuring a large number of high-quality automatic annotation layers, including dependency trees, non-named entity annotations, coreference resolution, and discourse trees in Rhetorical Structure Theory.

coreference-resolution

Exhaustive Entity Recognition for Coptic: Challenges and Solutions

no code implementations • COLING (LaTeCHCLfL, CLFL, LaTeCH) 2020 • Amir Zeldes, Lance Martin, Sichang Tu

Entity recognition provides semantic access to ancient materials in the Digital Humanities: it exposes people and places of interest in texts that cannot be read exhaustively, facilitates linking resources, and can provide a window into text contents, even for texts with no translations.

Entity Linking NER

Treebanking User-Generated Content: a UD Based Overview of Guidelines, Corpora and Unified Recommendations

no code implementations • 3 Nov 2020 • Manuela Sanguinetti, Lauren Cassidy, Cristina Bosco, Özlem Çetinoğlu, Alessandra Teresa Cignarella, Teresa Lynn, Ines Rehbein, Josef Ruppenhofer, Djamé Seddah, Amir Zeldes

This article presents a discussion on the main linguistic phenomena which cause difficulties in the analysis of user-generated texts found on the web and in social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework of syntactic analysis.

OntoGUM: Evaluating Contextualized SOTA Coreference Resolution on 12 More Genres

1 code implementation • ACL 2021 • Yilun Zhu, Sameer Pradhan, Amir Zeldes

SOTA coreference resolution produces increasingly impressive scores on the OntoNotes benchmark.

Ranked #2 on Coreference Resolution on OntoGUM

coreference-resolution

Mischievous Nominal Constructions in Universal Dependencies

no code implementations • UDW (SyntaxFest) 2021 • Nathan Schneider, Amir Zeldes

While the highly multilingual Universal Dependencies (UD) project provides extensive guidelines for clausal structure as well as structure within canonical nominal phrases, a standard treatment is lacking for many "mischievous" nominal phenomena that break the mold.

WikiGUM: Exhaustive Entity Linking for Wikification in 12 Genres

no code implementations • EMNLP (LAW, DMR) 2021 • Jessica Lin, Amir Zeldes

Previous work on Entity Linking has focused on resources targeting non-nested proper named entity mentions, often in data from Wikipedia, i.e., Wikification.

Ranked #1 on Entity Linking on GUM

Entity Linking

DisCoDisCo at the DISRPT2021 Shared Task: A System for Discourse Segmentation, Classification, and Connective Detection

1 code implementation • EMNLP (DISRPT) 2021 • Luke Gessler, Shabnam Behzad, Yang Janet Liu, Siyao Peng, Yilun Zhu, Amir Zeldes

This paper describes our submission to the DISRPT2021 Shared Task on Discourse Unit Segmentation, Connective Detection, and Relation Classification.

Classification Connective Detection +6

Anatomy of OntoGUM--Adapting GUM to the OntoNotes Scheme to Evaluate Robustness of SOTA Coreference Algorithms

no code implementations • 12 Oct 2021 • Yilun Zhu, Sameer Pradhan, Amir Zeldes

SOTA coreference resolution produces increasingly impressive scores on the OntoNotes benchmark.

Anatomy coreference-resolution

Can we Fix the Scope for Coreference? Problems and Solutions for Benchmarks beyond OntoNotes

no code implementations • 17 Dec 2021 • Amir Zeldes

Current work on automatic coreference resolution has focused on the OntoNotes benchmark dataset, due to both its size and consistency.

coreference-resolution

ELQA: A Corpus of Metalinguistic Questions and Answers about English

1 code implementation • 1 May 2022 • Shabnam Behzad, Keisuke Sakaguchi, Nathan Schneider, Amir Zeldes

We present ELQA, a corpus of questions and answers in and about the English language.

Answer Generation Question Answering

Chinese Discourse Annotation Reference Manual

no code implementations • 11 Oct 2022 • Siyao Peng, Yang Janet Liu, Amir Zeldes

This document provides extensive guidelines and examples for Rhetorical Structure Theory (RST) annotation in Mandarin Chinese.

A Second Wave of UD Hebrew Treebanking and Cross-Domain Parsing

2 code implementations • 14 Oct 2022 • Amir Zeldes, Nick Howell, Noam Ordan, Yifat Ben Moshe

Foundational Hebrew NLP tasks such as segmentation, tagging and parsing, have relied to date on various versions of the Hebrew Treebank (HTB, Sima'an et al. 2001).

Language Modelling

GCDT: A Chinese RST Treebank for Multigenre and Multilingual Discourse Parsing

1 code implementation • 19 Oct 2022 • Siyao Peng, Yang Janet Liu, Amir Zeldes

A lack of large-scale human-annotated data has hampered the hierarchical discourse parsing of Chinese.

Discourse Parsing

Sentence-level Feedback Generation for English Language Learners: Does Data Augmentation Help?

no code implementations • 18 Dec 2022 • Shabnam Behzad, Amir Zeldes, Nathan Schneider

In this paper, we present strong baselines for the task of Feedback Comment Generation for Writing Learning.

Comment Generation Data Augmentation +1

MicroBERT: Effective Training of Low-resource Monolingual BERTs through Parameter Reduction and Multitask Learning

1 code implementation • 23 Dec 2022 • Luke Gessler, Amir Zeldes

Transformer language models (TLMs) are critical for most NLP tasks, but they are difficult to create for low-resource languages because of how much pretraining data they require.

Dependency Parsing Language Modelling +3

Are UD Treebanks Getting More Consistent? A Report Card for English UD

no code implementations • 1 Feb 2023 • Amir Zeldes, Nathan Schneider

Recent efforts to consolidate guidelines and treebanks in the Universal Dependencies project raise the expectation that joint training and dataset comparison are increasingly possible for high-resource languages such as English, which have multiple corpora.

Why Can't Discourse Parsing Generalize? A Thorough Investigation of the Impact of Data Diversity

1 code implementation • 13 Feb 2023 • Yang Janet Liu, Amir Zeldes

To our knowledge, this study is the first to fully evaluate cross-corpus RST parsing generalizability on complete trees, examine between-genre degradation within an RST corpus, and investigate the impact of genre diversity in training data composition.

Cross-corpus Discourse Parsing

GENTLE: A Genre-Diverse Multilayer Challenge Set for English NLP and Linguistic Evaluation

1 code implementation • 3 Jun 2023 • Tatsuya Aoyama, Shabnam Behzad, Luke Gessler, Lauren Levine, Jessica Lin, Yang Janet Liu, Siyao Peng, Yilun Zhu, Amir Zeldes

We evaluate state-of-the-art NLP systems on GENTLE and find severe performance degradation on all tasks for at least some genres, which indicates GENTLE's utility as an evaluation dataset for NLP systems.

coreference-resolution Dependency Parsing +2

GUMSum: Multi-Genre Data and Evaluation for English Abstractive Summarization

1 code implementation • 20 Jun 2023 • Yang Janet Liu, Amir Zeldes

Automatic summarization with pre-trained language models has led to impressively fluent results, but is prone to 'hallucinations', low performance on non-news genres, and outputs which are not exactly summaries.

Abstractive Text Summarization

What's Hard in English RST Parsing? Predictive Models for Error Analysis

1 code implementation • 10 Sep 2023 • Yang Janet Liu, Tatsuya Aoyama, Amir Zeldes

Despite recent advances in Natural Language Processing (NLP), hierarchical discourse parsing in the framework of Rhetorical Structure Theory remains challenging, and our understanding of the reasons for this is as yet limited.

Discourse Parsing

Incorporating Singletons and Mention-based Features in Coreference Resolution via Multi-task Learning for Better Generalization

1 code implementation • 20 Sep 2023 • Yilun Zhu, Siyao Peng, Sameer Pradhan, Amir Zeldes

Previous attempts to incorporate a mention detection step into end-to-end neural coreference resolution for English have been hampered by the lack of singleton mention span data as well as other entity information.

Ranked #1 on Coreference Resolution on OntoGUM

coreference-resolution Multi-Task Learning

GUMsley: Evaluating Entity Salience in Summarization for 12 English Genres

no code implementations • 31 Jan 2024 • Jessica Lin, Amir Zeldes

As NLP models become increasingly capable of understanding documents in terms of coherent entities rather than strings, obtaining the most salient entities for each document is not only an important end task in itself but also vital for Information Retrieval (IR) and other downstream applications such as controllable summarization.

Abstractive Text Summarization coreference-resolution +3

eRST: A Signaled Graph Theory of Discourse Relations and Organization

no code implementations • 20 Mar 2024 • Amir Zeldes, Tatsuya Aoyama, Yang Janet Liu, Siyao Peng, Debopam Das, Luke Gessler

In this article we present Enhanced Rhetorical Structure Theory (eRST), a new theoretical framework for computational discourse analysis, based on an expansion of Rhetorical Structure Theory (RST).

SPLICE: A Singleton-Enhanced PipeLIne for Coreference REsolution

1 code implementation • 25 Mar 2024 • Yilun Zhu, Siyao Peng, Sameer Pradhan, Amir Zeldes

We then propose a two-step neural mention and coreference resolution system, named SPLICE, and compare its performance to the end-to-end approach in two scenarios: the OntoNotes test set and the out-of-domain (OOD) OntoGUM corpus.

Avg coreference-resolution +1

UCxn: Typologically Informed Annotation of Constructions Atop Universal Dependencies

1 code implementation • 26 Mar 2024 • Leonie Weissweiler, Nina Böbel, Kirian Guiller, Santiago Herrera, Wesley Scivetti, Arthur Lorenzi, Nurit Melnik, Archna Bhatia, Hinrich Schütze, Lori Levin, Amir Zeldes, Joakim Nivre, William Croft, Nathan Schneider

The Universal Dependencies (UD) project has created an invaluable collection of treebanks with contributions in over 140 languages.

Midas Loop: A Prioritized Human-in-the-Loop Annotation for Large Scale Multilayer Data

no code implementations • LREC (LAW) 2022 • Luke Gessler, Lauren Levine, Amir Zeldes

Large-scale annotation of rich multilayer corpus data is expensive and time-consuming, motivating approaches that integrate high-quality automatic tools with active learning in order to prioritize human labeling of hard cases.

Active Learning Management +3

Overview of AMALGUM – Large Silver Quality Annotations across English Genres

no code implementations • SCiL 2021 • Luke Gessler, Siyao Peng, Yang Liu, Yilun Zhu, Shabnam Behzad, Amir Zeldes

A Balanced and Broadly Targeted Computational Linguistics Curriculum

no code implementations • NAACL (TeachingNLP) 2021 • Emma Manning, Nathan Schneider, Amir Zeldes

This paper describes the primarily graduate computational linguistics and NLP curriculum at Georgetown University, a U.S. university that has seen significant growth in these areas in recent years.

The Making of Coptic Wordnet

no code implementations • GWC 2019 • Laura Slaughter, Luis Morgado Da Costa, So Miyagawa, Marco Büchler, Amir Zeldes, Heike Behlmer

While wordnets are increasingly available for ancient languages such as Ancient Greek and Latin, gaps remain in the coverage of less-studied languages of antiquity.

Anatomy of OntoGUM—Adapting GUM to the OntoNotes Scheme to Evaluate Robustness of SOTA Coreference Algorithms

no code implementations • CRAC (ACL) 2021 • Yilun Zhu, Sameer Pradhan, Amir Zeldes

SOTA coreference resolution produces increasingly impressive scores on the OntoNotes benchmark.

Anatomy coreference-resolution

The DISRPT 2021 Shared Task on Elementary Discourse Unit Segmentation, Connective Detection, and Relation Classification

no code implementations • EMNLP (DISRPT) 2021 • Amir Zeldes, Yang Janet Liu, Mikel Iruskieta, Philippe Muller, Chloé Braud, Sonia Badene

In 2021, we organized the second iteration of a shared task dedicated to the underlying units used in discourse parsing across formalisms: the DISRPT Shared Task (Discourse Relation Parsing and Treebanking).

Connective Detection Relation +1

CorefUD 1.0: Coreference Meets Universal Dependencies

no code implementations • LREC 2022 • Anna Nedoluzhko, Michal Novák, Martin Popel, Zdeněk Žabokrtský, Amir Zeldes, Daniel Zeman

Recent advances in standardization for annotated language resources have led to successful large scale efforts, such as the Universal Dependencies (UD) project for multilingual syntactically annotated data.

coreference-resolution named-entity-recognition +2
