1 code implementation • EMNLP 2021 • Taelin Karidi, Yichu Zhou, Nathan Schneider, Omri Abend, Vivek Srikumar
We present a method for exploring regions around individual points in a contextualized vector space (particularly, BERT space), as a way to investigate how these regions correspond to word senses.
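To make the setup concrete, here is a minimal sketch of probing neighborhoods in BERT's contextualized space, assuming the HuggingFace transformers library; the sentences, model checkpoint, and similarity measure are illustrative choices, not the paper's exact protocol.

```python
# Minimal sketch: embed occurrences of a word with BERT and compare one
# occurrence's contextual vector to the others. Illustrative only; the
# sentences, model, and similarity measure are assumptions, not the paper's setup.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

sentences = [
    "She sat on the bank of the river.",
    "He deposited the check at the bank.",
    "The bank approved the loan.",
]
target = "bank"

# Collect one contextual vector per occurrence of the target word.
vectors, contexts = [], []
with torch.no_grad():
    for sent in sentences:
        enc = tokenizer(sent, return_tensors="pt")
        hidden = model(**enc).last_hidden_state[0]          # (seq_len, dim)
        tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
        for i, tok in enumerate(tokens):
            if tok == target:
                vectors.append(hidden[i])
                contexts.append(sent)

# Cosine similarity of the first occurrence (river sense) to the others:
# occurrences sharing a sense should land in closer regions of the space.
query = vectors[0]
for vec, sent in zip(vectors[1:], contexts[1:]):
    sim = torch.cosine_similarity(query, vec, dim=0).item()
    print(f"{sim:.3f}  {sent}")
```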
no code implementations • NAACL (TeachingNLP) 2021 • Emma Manning, Nathan Schneider, Amir Zeldes
This paper describes the primarily graduate computational linguistics and NLP curriculum at Georgetown University, a U.S. university that has seen significant growth in these areas in recent years.
no code implementations • NAACL (ACL) 2022 • Tatsuya Aoyama, Nathan Schneider
The current study quantitatively (and, for illustration, qualitatively) analyzes BERT’s layer-wise masked word prediction on an English corpus, and finds that (1) the layer-wise localization of linguistic knowledge shown primarily in probing studies is replicated in a behavior-based design, and (2) syntactic and semantic information is encoded at different layers for words of different syntactic categories.
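A rough sketch of what layer-wise masked word prediction can look like in practice, assuming the HuggingFace transformers library; reusing the pretrained MLM head on intermediate layers is a simplification, and the example sentence and model are illustrative, not the paper's design.

```python
# Sketch of layer-wise masked word prediction with BERT: run a masked input,
# keep every layer's hidden states, and apply the pretrained MLM head to each
# layer to see how the top prediction changes with depth.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

text = "The doctor prescribed a new [MASK] for the patient."
enc = tokenizer(text, return_tensors="pt")
mask_pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

with torch.no_grad():
    hidden_states = model(**enc).hidden_states   # embeddings + 12 layers

for layer, hs in enumerate(hidden_states[1:], start=1):
    logits = model.cls(hs)                       # reuse the MLM head per layer
    top_id = logits[0, mask_pos].argmax().item()
    print(f"layer {layer:2d}: {tokenizer.convert_ids_to_tokens([top_id])[0]}")
```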
no code implementations • DMR (COLING) 2020 • Jena D. Hwang, Hanwool Choe, Na-Rae Han, Nathan Schneider
While many languages use adpositions to encode semantic relationships between content words in a sentence (e.g., agentivity or temporality), the details of how adpositions work vary widely across languages with respect to both form and meaning.
1 code implementation • EMNLP (LAW, DMR) 2021 • Shira Wein, Nathan Schneider
Translation divergences are varied and widespread, challenging approaches that rely on parallel text.
no code implementations • EMNLP (LAW, DMR) 2021 • Zhuxin Wang, Jakob Prange, Nathan Schneider
Universal Conceptual Cognitive Annotation (UCCA) is a semantic annotation scheme that organizes texts into coarse predicate-argument structure, offering broad coverage of semantic phenomena.
no code implementations • COLING (LAW) 2020 • Jena D. Hwang, Nathan Schneider, Vivek Srikumar
We reevaluate an existing adpositional annotation scheme with respect to two thorny semantic domains: accompaniment and purpose.
1 code implementation • COLING 2022 • Shira Wein, Nathan Schneider
Cross-lingual Abstract Meaning Representation (AMR) parsers are currently evaluated in comparison to gold English AMRs, despite parsing a language other than English, due to the lack of multilingual AMR evaluation metrics.
no code implementations • LREC (LAW) 2022 • Yang Janet Liu, Jena D. Hwang, Nathan Schneider, Vivek Srikumar
The SNACS framework provides a network of semantic labels called supersenses for annotating adpositional semantics in corpora.
1 code implementation • LREC (LAW) 2022 • Shira Wein, Wai Ching Leung, Yifu Mu, Nathan Schneider
In this work, we investigate the similarity of AMR annotations in parallel data and how much the language matters in terms of the graph structure.
no code implementations • LREC 2022 • Luke Gessler, Nathan Schneider, Joseph C. Ledford, Austin Blodgett
We present Xposition, an online platform for documenting adpositional semantics across languages in terms of supersenses (Schneider et al., 2018).
no code implementations • 9 May 2024 • Juri Opitz, Shira Wein, Nathan Schneider
Large Language Models (LLMs) have become capable of generating highly fluent text in certain languages, without modules specially designed to capture grammar or semantic coherence.
1 code implementation • 26 Mar 2024 • Leonie Weissweiler, Nina Böbel, Kirian Guiller, Santiago Herrera, Wesley Scivetti, Arthur Lorenzi, Nurit Melnik, Archna Bhatia, Hinrich Schütze, Lori Levin, Amir Zeldes, Joakim Nivre, William Croft, Nathan Schneider
The Universal Dependencies (UD) project has created an invaluable collection of treebanks with contributions in over 140 languages.
1 code implementation • 1 Nov 2023 • Luke Gessler, Nathan Schneider
A line of work on Transformer-based language models such as BERT has attempted to use syntactic inductive bias to enhance the pretraining process, on the theory that building syntactic structure into the training process should reduce the amount of data needed for training.
1 code implementation • 1 Jun 2023 • Juri Opitz, Shira Wein, Julius Steen, Anette Frank, Nathan Schneider
The task of natural language inference (NLI) asks whether a given premise (expressed in NL) entails a given NL hypothesis.
1 code implementation • 27 May 2023 • Brett Reynolds, Nathan Schneider, Aryaman Arora
CGELBank is a treebank and associated tools based on a syntactic formalism for English derived from the Cambridge Grammar of the English Language.
no code implementations • 24 May 2023 • Michael Kranzlein, Nathan Schneider, Kevin Tobia
Most judicial decisions involve the interpretation of legal texts; as such, judicial opinion requires the use of language as a medium to comment on or draw attention to other language.
1 code implementation • 23 Apr 2023 • Shira Wein, Nathan Schneider
Though individual translated texts are often fluent and preserve meaning, at a large scale, translated texts have statistical tendencies which distinguish them from text originally written in the language ("translationese") and can affect model performance.
no code implementations • 1 Feb 2023 • Amir Zeldes, Nathan Schneider
Recent efforts to consolidate guidelines and treebanks in the Universal Dependencies project raise the expectation that joint training and dataset comparison is increasingly possible for high-resource languages such as English, which have multiple corpora.
no code implementations • 18 Dec 2022 • Shabnam Behzad, Amir Zeldes, Nathan Schneider
In this paper, we present strong baselines for the task of Feedback Comment Generation for Writing Learning.
no code implementations • 6 Oct 2022 • Shira Wein, Zhuxin Wang, Nathan Schneider
Identifying semantically equivalent sentences is important for many cross-lingual and monolingual NLP tasks.
1 code implementation • 1 Oct 2022 • Brett Reynolds, Aryaman Arora, Nathan Schneider
We introduce the syntactic formalism of the Cambridge Grammar of the English Language (CGEL) to the world of treebanking through the CGELBank project.
no code implementations • LREC 2022 • Aryaman Arora, Nitin Venkateswaran, Nathan Schneider
We present a completed, publicly available corpus of annotated semantic relations of adpositions and case markers in Hindi.
1 code implementation • 1 May 2022 • Shabnam Behzad, Keisuke Sakaguchi, Nathan Schneider, Amir Zeldes
We present ELQA, a corpus of questions and answers in and about the English language.
no code implementations • 15 Apr 2022 • Shira Wein, Lucia Donatelli, Ethan Ricker, Calvin Engstrom, Alex Nelson, Nathan Schneider
The Abstract Meaning Representation (AMR) formalism, designed originally for English, has been adapted to a number of languages.
1 code implementation • NAACL 2022 • Tahira Naseem, Austin Blodgett, Sadhana Kumaravel, Tim O'Gorman, Young-suk Lee, Jeffrey Flanigan, Ramón Fernandez Astudillo, Radu Florian, Salim Roukos, Nathan Schneider
Despite extensive research on parsing of English sentences into Abstract Meaning Representation (AMR) graphs, which are compared to gold graphs via the Smatch metric, full-document parsing into a unified graph representation lacks a well-defined representation and evaluation.
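For readers unfamiliar with Smatch, the snippet below illustrates the underlying idea, F1 over AMR triples, under one fixed variable mapping; real Smatch additionally searches over variable alignments, and the toy graphs here are invented for illustration.

```python
# Simplified illustration of Smatch-style scoring: F1 over (source, relation,
# target) triples under one fixed variable mapping. Real Smatch additionally
# searches for the variable alignment that maximizes this overlap.
def triple_f1(gold, pred):
    gold, pred = set(gold), set(pred)
    overlap = len(gold & pred)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy triples (instance and relation edges) for "The boy wants to go."
gold = [("w", "instance", "want-01"), ("b", "instance", "boy"),
        ("g", "instance", "go-02"), ("w", "ARG0", "b"),
        ("w", "ARG1", "g"), ("g", "ARG0", "b")]
pred = [("w", "instance", "want-01"), ("b", "instance", "boy"),
        ("g", "instance", "go-02"), ("w", "ARG0", "b"),
        ("w", "ARG1", "g")]  # missing the reentrant ARG0 edge

print(f"F1 = {triple_f1(gold, pred):.3f}")
```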
1 code implementation • NAACL 2022 • Jakob Prange, Nathan Schneider, Lingpeng Kong
We examine the extent to which, in principle, linguistic graph representations can complement and improve neural language modeling.
1 code implementation • COLING (LAW) 2020 • Michael Kranzlein, Emma Manning, Siyao Peng, Shira Wein, Aryaman Arora, Bradford Salen, Nathan Schneider
We present the Prepositions Annotated with Supersense Tags in Reddit International English ("PASTRIE") corpus, a new dataset containing manually annotated preposition supersenses of English data from presumed speakers of four L1s: English, French, German, and Spanish.
1 code implementation • 23 Sep 2021 • Taelin Karidi, Yichu Zhou, Nathan Schneider, Omri Abend, Vivek Srikumar
We present a method for exploring regions around individual points in a contextualized vector space (particularly, BERT space), as a way to investigate how these regions correspond to word senses.
2 code implementations • 22 Sep 2021 • Ida Szubert, Omri Abend, Nathan Schneider, Samuel Gibbon, Louis Mahon, Sharon Goldwater, Mark Steedman
We then demonstrate the utility of the compiled corpora through (1) a longitudinal corpus study of the prevalence of different syntactic and semantic phenomena in the CDS, and (2) applying an existing computational model of language acquisition to the two corpora and briefly comparing the results across languages.
1 code implementation • EMNLP (BlackboxNLP) 2021 • Luke Gessler, Nathan Schneider
An important question concerning contextualized word embedding (CWE) models like BERT is how well they can represent different word senses, especially those in the long tail of uncommon senses.
1 code implementation • Findings (EMNLP) 2021 • Michael Kranzlein, Nelson F. Liu, Nathan Schneider
For interpreting the behavior of a probabilistic model, it is useful to measure a model's calibration: the extent to which it produces reliable confidence scores.
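As a concrete, hypothetical illustration of measuring calibration, the sketch below computes expected calibration error from predicted confidences and correctness indicators; the binning scheme and toy numbers are assumptions, not the paper's protocol.

```python
# Illustrative expected calibration error (ECE): bin predictions by confidence
# and compare each bin's average confidence to its empirical accuracy.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return ece

# A model that is overconfident on these five toy predictions.
conf = [0.95, 0.90, 0.85, 0.80, 0.60]
hits = [1,    0,    1,    0,    1]
print(f"ECE = {expected_calibration_error(conf, hits):.3f}")
```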
no code implementations • UDW (SyntaxFest) 2021 • Nathan Schneider, Amir Zeldes
While the highly multilingual Universal Dependencies (UD) project provides extensive guidelines for clausal structure as well as structure within canonical nominal phrases, a standard treatment is lacking for many "mischievous" nominal phenomena that break the mold.
1 code implementation • ACL 2021 • Austin Blodgett, Nathan Schneider
We present algorithms for aligning components of Abstract Meaning Representation (AMR) graphs to spans in English sentences.
no code implementations • COLING (LAW) 2020 • Luke Gessler, Shira Wein, Nathan Schneider
Prepositional supersense annotation is time-consuming and requires expert training.
no code implementations • 2 Mar 2021 • Aryaman Arora, Nitin Venkateswaran, Nathan Schneider
These are the guidelines for the application of SNACS (Semantic Network of Adposition and Case Supersenses; Schneider et al. 2018) to Modern Standard Hindi of Delhi.
1 code implementation • 31 Dec 2020 • Omri Abend, Nathan Schneider, Dotan Dvir, Jakob Prange, Ari Rappoport
This is the annotation manual for Universal Conceptual Cognitive Annotation (UCCA; Abend and Rappoport, 2013), specifically the Foundational Layer.
1 code implementation • 2 Dec 2020 • Jakob Prange, Nathan Schneider, Vivek Srikumar
Our best tagger is capable of recovering a sizeable fraction of the long-tail supertags and even generates CCG categories that have never been seen in training, while approximating the prior state of the art in overall tag accuracy with fewer parameters.
no code implementations • COLING 2020 • Omri Abend, Dotan Dvir, Daniel Hershcovich, Jakob Prange, Nathan Schneider
This is an introductory tutorial to UCCA (Universal Conceptual Cognitive Annotation), a cross-linguistically applicable framework for semantic representation, with corpora annotated in English, German and French, and ongoing annotation in Russian and Hebrew.
2 code implementations • COLING 2020 • Daniel Hershcovich, Nathan Schneider, Dotan Dvir, Jakob Prange, Miryam de Lhoneux, Omri Abend
Building robust natural language understanding systems will require a clear characterization of whether and how various linguistic meaning representations complement each other.
no code implementations • ACL 2020 • Sean Trott, Tiago Timponi Torrent, Nancy Chang, Nathan Schneider
Human speakers have an extensive toolkit of ways to express themselves.
2 code implementations • ACL (MWE) 2021 • Nelson F. Liu, Daniel Hershcovich, Michael Kranzlein, Nathan Schneider
In lexical semantics, full-sentence segmentation and segment labeling of various phenomena are generally treated separately, despite their interdependence.
Ranked #1 on Natural Language Understanding on STREUSLE
1 code implementation • ACL 2020 • Aryaman Arora, Luke Gessler, Nathan Schneider
Hindi grapheme-to-phoneme (G2P) conversion is mostly trivial, with one exception: whether a schwa represented in the orthography is pronounced or unpronounced (deleted).
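The toy rule below illustrates the schwa-deletion question (the inherent schwa of a final Devanagari consonant is usually not pronounced); it is a deliberately naive sketch with a tiny hand-picked character mapping, not the paper's learned model.

```python
# Toy illustration of schwa deletion in Hindi G2P: Devanagari consonant letters
# carry an inherent schwa /a/ that is sometimes unpronounced, most reliably
# word-finally. This naive rule drops only the final schwa; the character
# mapping is a tiny illustrative subset.
CONSONANTS = {"क": "k", "म": "m", "ल": "l", "न": "n", "र": "r"}
VOWEL_SIGNS = {"ा": "aa", "ि": "i", "ी": "ii", "ु": "u", "ू": "uu"}
VIRAMA = "्"  # explicitly suppresses the inherent schwa

def naive_g2p(word):
    phones = []
    chars = list(word)
    for i, ch in enumerate(chars):
        if ch in CONSONANTS:
            phones.append(CONSONANTS[ch])
            nxt = chars[i + 1] if i + 1 < len(chars) else None
            # Inherent schwa unless suppressed by a vowel sign or virama,
            # or (naive rule) unless this is the word-final character.
            if nxt not in VOWEL_SIGNS and nxt != VIRAMA and i != len(chars) - 1:
                phones.append("a")
        elif ch in VOWEL_SIGNS:
            phones.append(VOWEL_SIGNS[ch])
    return " ".join(phones)

print(naive_g2p("कमल"))   # -> k a m a l (final schwa deleted)
print(naive_g2p("नर"))    # -> n a r
```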
no code implementations • COLING 2020 • Emma Manning, Shira Wein, Nathan Schneider
Most current state-of-the-art systems for generating English text from Abstract Meaning Representation (AMR) have been evaluated only using automated metrics, such as BLEU, which are known to be problematic for natural language generation.
no code implementations • LREC 2020 • Siyao Peng, Yang Liu, Yilun Zhu, Austin Blodgett, Yushi Zhao, Nathan Schneider
Adpositions are frequent markers of semantic relations, but they are highly ambiguous and vary significantly from language to language.
1 code implementation • CONLL 2019 • Jakob Prange, Nathan Schneider, Omri Abend
Universal Conceptual Cognitive Annotation (UCCA; Abend and Rappoport, 2013) is a typologically-informed, broad-coverage semantic annotation scheme that describes coarse-grained predicate-argument structure but currently lacks semantic roles.
1 code implementation • WS 2019 • Adi Shalev, Jena D. Hwang, Nathan Schneider, Vivek Srikumar, Omri Abend, Ari Rappoport
Research on adpositions and possessives in multiple languages has led to a small inventory of general-purpose meaning classes that disambiguate tokens.
no code implementations • WS 2019 • Jakob Prange, Nathan Schneider, Omri Abend
We propose a coreference annotation scheme as a layer on top of the Universal Conceptual Cognitive Annotation foundational layer, treating units in predicate-argument structure as a basis for entity and event mentions.
no code implementations • WS 2019 • Austin Blodgett, Nathan Schneider
We define new semantics for the CCG combinators that is better suited to deriving AMR graphs.
no code implementations • 6 Dec 2018 • Yilun Zhu, Yang Liu, Siyao Peng, Austin Blodgett, Yushi Zhao, Nathan Schneider
This study adapts Semantic Network of Adposition and Case Supersenses (SNACS) annotation to Mandarin Chinese and demonstrates that the same supersense categories are appropriate for Chinese adposition semantics.
no code implementations • COLING 2018 • Lucia Donatelli, Michael Regan, William Croft, Nathan Schneider
Although English grammar encodes a number of semantic contrasts with tense and aspect marking, these semantics are currently ignored by Abstract Meaning Representation (AMR) annotations.
no code implementations • COLING 2018 • Abigail Walsh, Claire Bonial, Kristina Geeraert, John P. McCrae, Nathan Schneider, Clarissa Somers
This paper describes the construction and annotation of a corpus of verbal MWEs for English, as part of the PARSEME Shared Task 1.1 on automatic identification of verbal MWEs.
no code implementations • COLING 2018 • Carlos Ramisch, Silvio Ricardo Cordeiro, Agata Savary, Veronika Vincze, Verginica Barbu Mititelu, Archna Bhatia, Maja Buljan, Marie Candito, Polona Gantar, Voula Giouli, Tunga Güngör, Abdelati Hawwari, Uxoa Iñurrieta, Jolanta Kovalevskaitė, Simon Krek, Timm Lichte, Chaya Liebeskind, Johanna Monti, Carla Parra Escartín, Behrang Qasemizadeh, Renata Ramisch, Nathan Schneider, Ivelina Stoyanova, Ashwini Vaidya, Abigail Walsh
Corpora were created for 20 languages, which are also briefly discussed.
no code implementations • COLING 2018 • Nathan Schneider
I will describe an unorthodox approach to lexical semantic annotation that prioritizes corpus coverage, democratizing analysis of a wide range of expression types.
no code implementations • ACL 2018 • Hannah Rohde, Alexander Johnson, Nathan Schneider, Bonnie Webber
Theories of discourse coherence posit relations between discourse segments as a key feature of coherent text.
1 code implementation • NAACL 2018 • Ida Szubert, Adam Lopez, Nathan Schneider
Abstract Meaning Representation (AMR) annotations are often assumed to closely mirror dependency syntax, but AMR explicitly does not require this, and the assumption has never been tested.
1 code implementation • ACL 2018 • Nathan Schneider, Jena D. Hwang, Vivek Srikumar, Jakob Prange, Austin Blodgett, Sarah R. Moeller, Aviram Stern, Adi Bitan, Omri Abend
Semantic relations are often signaled with prepositional or possessive marking, but extreme polysemy bedevils their analysis and automatic interpretation.
Ranked #4 on Natural Language Understanding on STREUSLE (Role F1 (Preps) metric)
1 code implementation • NAACL 2018 • Yijia Liu, Yi Zhu, Wanxiang Che, Bing Qin, Nathan Schneider, Noah A. Smith
Nonetheless, using the new treebank, we build a pipeline system to parse raw tweets into UD.
Ranked #2 on Dependency Parsing on Tweebank
no code implementations • SEMEVAL 2017 • Jena D. Hwang, Archna Bhatia, Na-Rae Han, Tim O'Gorman, Vivek Srikumar, Nathan Schneider
We consider the semantics of prepositions, revisiting a broad-coverage annotation scheme used for annotating all 4,250 preposition tokens in a 55,000-word corpus of English.
4 code implementations • 7 Apr 2017 • Nathan Schneider, Jena D. Hwang, Vivek Srikumar, Archna Bhatia, Na-Rae Han, Tim O'Gorman, Sarah R. Moeller, Omri Abend, Adi Shalev, Austin Blodgett, Jakob Prange
This document offers a detailed linguistic description of SNACS (Semantic Network of Adposition and Case Supersenses; Schneider et al., 2018), an inventory of 52 semantic labels ("supersenses") that characterize the use of adpositions and case markers at a somewhat coarse level of granularity, as demonstrated in the STREUSLE corpus (https://github.com/nert-nlp/streusle/; version 4.5 tracks guidelines version 2.6).
no code implementations • EMNLP 2017 • Nathan Schneider, Chuck Wooters
A new Python API, integrated within the NLTK suite, offers access to the FrameNet 1.7 lexical database.
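That interface can be exercised roughly as follows; the particular frame name and query pattern are illustrative choices.

```python
# Querying FrameNet 1.7 through the NLTK corpus reader described above.
# Requires: pip install nltk, then downloading the framenet_v17 data package.
import nltk
nltk.download("framenet_v17", quiet=True)
from nltk.corpus import framenet as fn

# Look up frames whose names match a pattern.
for frame in fn.frames(r"(?i)motion"):
    print(frame.ID, frame.name)

# Inspect one frame: its definition and the lexical units that evoke it.
f = fn.frame("Motion")
print(f.definition[:100], "...")
print(sorted(lu.name for lu in f.lexUnit.values())[:5])
```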
no code implementations • 10 Mar 2017 • Jena D. Hwang, Archna Bhatia, Na-Rae Han, Tim O'Gorman, Vivek Srikumar, Nathan Schneider
We consider the semantics of prepositions, revisiting a broad-coverage annotation scheme used for annotating all 4,250 preposition tokens in a 55,000-word corpus of English.
no code implementations • 8 May 2016 • Nathan Schneider, Jena D. Hwang, Vivek Srikumar, Meredith Green, Kathryn Conger, Tim O'Gorman, Martha Palmer
We present the first corpus annotated with preposition supersenses, unlexicalized categories for semantic functions that can be marked by English prepositions (Schneider et al., 2015).
1 code implementation • LREC 2016 • Nora Hollenstein, Nathan Schneider, Bonnie Webber
Automatically finding these inconsistencies and correcting them (even manually) can increase the quality of the data.
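One simple way to surface candidate inconsistencies, sketched below on invented data, is to flag word forms that receive more than one label; this heuristic is illustrative and much cruder than the paper's method.

```python
# Toy illustration: group annotations by word form and flag forms that were
# given more than one label, as candidates for manual review. Some variation
# is genuine polysemy, so these are only candidates, not errors.
from collections import defaultdict

annotations = [
    ("for",  "Purpose"),
    ("for",  "Beneficiary"),
    ("for",  "Purpose"),
    ("with", "Accompanier"),
    ("with", "Accompanier"),
]

labels_by_form = defaultdict(set)
for form, label in annotations:
    labels_by_form[form].add(label)

for form, labels in labels_by_form.items():
    if len(labels) > 1:
        print(f"check '{form}': annotated as {sorted(labels)}")
```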
no code implementations • 21 Apr 2015 • Lizhen Qu, Gabriela Ferraro, Liyuan Zhou, Weiwei Hou, Nathan Schneider, Timothy Baldwin
Word embeddings -- distributed word representations that can be learned from unlabelled data -- have been shown to have high utility in many natural language processing applications.
1 code implementation • LREC 2014 • Yulia Tsvetkov, Nathan Schneider, Dirk Hovy, Archna Bhatia, Manaal Faruqui, Chris Dyer
We develop a supersense taxonomy for adjectives, based on that of GermaNet, and apply it to English adjectives in WordNet using human annotation and supervised classification.
no code implementations • LREC 2014 • Nathan Schneider, Spencer Onuffer, Nora Kazour, Emily Danchik, Michael T. Mordowanec, Henrietta Conrad, Noah A. Smith
Multiword expressions (MWEs) are quite frequent in languages such as English, but their diversity, the scarcity of individual MWE types, and contextual ambiguity have presented obstacles to corpus-based studies and NLP systems addressing them as a class.
no code implementations • TACL 2014 • Nathan Schneider, Emily Danchik, Chris Dyer, Noah A. Smith
We present a novel representation, evaluation measure, and supervised models for the task of identifying the multiword expressions (MWEs) in a sentence, resulting in a lexical semantic segmentation.
no code implementations • WS 2013 • Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, Nathan Schneider
1 code implementation • WS 2013 • Nathan Schneider, Brendan O'Connor, Naomi Saphra, David Bamman, Manaal Faruqui, Noah A. Smith, Chris Dyer, Jason Baldridge
We introduce a framework for lightweight dependency syntax annotation.