Search Results for author: Nathan Schneider

Found 98 papers, 34 papers with code

Putting Words in BERT’s Mouth: Navigating Contextualized Vector Spaces with Pseudowords

1 code implementation · EMNLP 2021 · Taelin Karidi, Yichu Zhou, Nathan Schneider, Omri Abend, Vivek Srikumar

We present a method for exploring regions around individual points in a contextualized vector space (particularly, BERT space), as a way to investigate how these regions correspond to word senses.

Tasks: Sentence

A Balanced and Broadly Targeted Computational Linguistics Curriculum

no code implementations · NAACL (TeachingNLP) 2021 · Emma Manning, Nathan Schneider, Amir Zeldes

This paper describes the primarily-graduate computational linguistics and NLP curriculum at Georgetown University, a U.S. university that has seen significant growth in these areas in recent years.

Probe-Less Probing of BERT’s Layer-Wise Linguistic Knowledge with Masked Word Prediction

no code implementations · NAACL (ACL) 2022 · Tatsuya Aoyama, Nathan Schneider

The current study quantitatively (and, for illustrative purposes, qualitatively) analyzes BERT’s layer-wise masked word prediction on an English corpus, and finds (1) that the layer-wise localization of linguistic knowledge primarily shown in probing studies is replicated in a behavior-based design, and (2) that syntactic and semantic information is encoded at different layers for words of different syntactic categories.

K-SNACS: Annotating Korean Adposition Semantics

no code implementations · DMR (COLING) 2020 · Jena D. Hwang, Hanwool Choe, Na-Rae Han, Nathan Schneider

While many languages use adpositions to encode semantic relationships between content words in a sentence (e.g., agentivity or temporality), the details of how adpositions work vary widely across languages with respect to both form and meaning.

Tasks: Sentence

Subcategorizing Adverbials in Universal Conceptual Cognitive Annotation

no code implementations · EMNLP (LAW, DMR) 2021 · Zhuxin Wang, Jakob Prange, Nathan Schneider

Universal Conceptual Cognitive Annotation (UCCA) is a semantic annotation scheme that organizes texts into coarse predicate-argument structure, offering broad coverage of semantic phenomena.

Tasks: Negation

Sprucing up Supersenses: Untangling the Semantic Clusters of Accompaniment and Purpose

no code implementations · COLING (LAW) 2020 · Jena D. Hwang, Nathan Schneider, Vivek Srikumar

We reevaluate an existing adpositional annotation scheme with respect to two thorny semantic domains: accompaniment and purpose.

Accounting for Language Effect in the Evaluation of Cross-lingual AMR Parsers

1 code implementation · COLING 2022 · Shira Wein, Nathan Schneider

Cross-lingual Abstract Meaning Representation (AMR) parsers are currently evaluated in comparison to gold English AMRs, despite parsing a language other than English, due to the lack of multilingual AMR evaluation metrics.

Tasks: Abstract Meaning Representation, Sentence

Putting Context in SNACS: A 5-Way Classification of Adpositional Pragmatic Markers

no code implementations · LREC (LAW) 2022 · Yang Janet Liu, Jena D. Hwang, Nathan Schneider, Vivek Srikumar

The SNACS framework provides a network of semantic labels called supersenses for annotating adpositional semantics in corpora.

Effect of Source Language on AMR Structure

1 code implementation · LREC (LAW) 2022 · Shira Wein, Wai Ching Leung, Yifu Mu, Nathan Schneider

In this work, we investigate the similarity of AMR annotations in parallel data and how much the language matters in terms of the graph structure.

Tasks: Abstract Meaning Representation, AMR Parsing, +1

Xposition: An Online Multilingual Database of Adpositional Semantics

no code implementations · LREC 2022 · Luke Gessler, Nathan Schneider, Joseph C. Ledford, Austin Blodgett

We present Xposition, an online platform for documenting adpositional semantics across languages in terms of supersenses (Schneider et al., 2018).

Natural Language Processing RELIES on Linguistics

no code implementations · 9 May 2024 · Juri Opitz, Shira Wein, Nathan Schneider

Large Language Models (LLMs) have become capable of generating highly fluent text in certain languages, without modules specially designed to capture grammar or semantic coherence.

Syntactic Inductive Bias in Transformer Language Models: Especially Helpful for Low-Resource Languages?

1 code implementation · 1 Nov 2023 · Luke Gessler, Nathan Schneider

A line of work on Transformer-based language models such as BERT has attempted to use syntactic inductive bias to enhance the pretraining process, on the theory that building syntactic structure into the training process should reduce the amount of data needed for training.

Tasks: Inductive Bias

AMR4NLI: Interpretable and robust NLI measures from semantic graphs

1 code implementation · 1 Jun 2023 · Juri Opitz, Shira Wein, Julius Steen, Anette Frank, Nathan Schneider

The task of natural language inference (NLI) asks whether a given premise (expressed in NL) entails a given NL hypothesis.

Tasks: Natural Language Inference, Sentence

CGELBank Annotation Manual v1.1

1 code implementation · 27 May 2023 · Brett Reynolds, Nathan Schneider, Aryaman Arora

CGELBank is a treebank and associated tools based on a syntactic formalism for English derived from the Cambridge Grammar of the English Language.

CuRIAM: Corpus re Interpretation and Metalanguage in U.S. Supreme Court Opinions

no code implementations · 24 May 2023 · Michael Kranzlein, Nathan Schneider, Kevin Tobia

Most judicial decisions involve the interpretation of legal texts; as such, judicial opinion requires the use of language as a medium to comment on or draw attention to other language.

Lost in Translationese? Reducing Translation Effect Using Abstract Meaning Representation

1 code implementation · 23 Apr 2023 · Shira Wein, Nathan Schneider

Though individual translated texts are often fluent and preserve meaning, at a large scale, translated texts have statistical tendencies which distinguish them from text originally written in the language ("translationese") and can affect model performance.

Tasks: Abstract Meaning Representation, Machine Translation, +2

Are UD Treebanks Getting More Consistent? A Report Card for English UD

no code implementations · 1 Feb 2023 · Amir Zeldes, Nathan Schneider

Recent efforts to consolidate guidelines and treebanks in the Universal Dependencies project raise the expectation that joint training and dataset comparison is increasingly possible for high-resource languages such as English, which have multiple corpora.

CGELBank: CGEL as a Framework for English Syntax Annotation

1 code implementation · 1 Oct 2022 · Brett Reynolds, Aryaman Arora, Nathan Schneider

We introduce the syntactic formalism of the Cambridge Grammar of the English Language (CGEL) to the world of treebanking through the CGELBank project.

MASALA: Modelling and Analysing the Semantics of Adpositions in Linguistic Annotation of Hindi

no code implementations · LREC 2022 · Aryaman Arora, Nitin Venkateswaran, Nathan Schneider

We present a completed, publicly available corpus of annotated semantic relations of adpositions and case markers in Hindi.

Spanish Abstract Meaning Representation: Annotation of a General Corpus

no code implementations · 15 Apr 2022 · Shira Wein, Lucia Donatelli, Ethan Ricker, Calvin Engstrom, Alex Nelson, Nathan Schneider

The Abstract Meaning Representation (AMR) formalism, designed originally for English, has been adapted to a number of languages.

Tasks: Abstract Meaning Representation, AMR Parsing

DocAMR: Multi-Sentence AMR Representation and Evaluation

1 code implementation · NAACL 2022 · Tahira Naseem, Austin Blodgett, Sadhana Kumaravel, Tim O'Gorman, Young-suk Lee, Jeffrey Flanigan, Ramón Fernandez Astudillo, Radu Florian, Salim Roukos, Nathan Schneider

Despite extensive research on parsing of English sentences into Abstract Meaning Representation (AMR) graphs, which are compared to gold graphs via the Smatch metric, full-document parsing into a unified graph representation lacks a well-defined representation and evaluation.

Tasks: Coreference Resolution, Sentence
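The Smatch metric mentioned above scores AMR parses by F1 over graph triples, after searching for the best mapping between the two graphs' variables. Below is a minimal sketch of just the triple-overlap F1, under the simplifying assumption that variables are already aligned; the example graph and triples are illustrative, not from the paper.

```python
# Simplified sketch of the triple-overlap F1 at the heart of Smatch.
# Real Smatch searches over variable alignments; here we assume the two
# graphs already share variable names, which is NOT true in general.

def triple_f1(gold, pred):
    gold, pred = set(gold), set(pred)
    matched = len(gold & pred)
    if matched == 0:
        return 0.0
    precision = matched / len(pred)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)

# "The boy wants to go": (w / want-01 :ARG0 (b / boy) :ARG1 (g / go-01 :ARG0 b))
gold = [("w", "instance", "want-01"), ("b", "instance", "boy"),
        ("g", "instance", "go-01"), ("w", "ARG0", "b"),
        ("w", "ARG1", "g"), ("g", "ARG0", "b")]
pred = gold[:-1]  # parser missed the reentrant :ARG0 edge
print(round(triple_f1(gold, pred), 3))  # 0.909
```

What makes the real metric (and its cross-lingual variants discussed elsewhere on this page) nontrivial is precisely the variable-alignment search that this sketch omits.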

Linguistic Frameworks Go Toe-to-Toe at Neuro-Symbolic Language Modeling

1 code implementation · NAACL 2022 · Jakob Prange, Nathan Schneider, Lingpeng Kong

We examine the extent to which, in principle, linguistic graph representations can complement and improve neural language modeling.

Tasks: Language Modelling

PASTRIE: A Corpus of Prepositions Annotated with Supersense Tags in Reddit International English

1 code implementation · COLING (LAW) 2020 · Michael Kranzlein, Emma Manning, Siyao Peng, Shira Wein, Aryaman Arora, Bradford Salen, Nathan Schneider

We present the Prepositions Annotated with Supersense Tags in Reddit International English ("PASTRIE") corpus, a new dataset containing manually annotated preposition supersenses of English data from presumed speakers of four L1s: English, French, German, and Spanish.

Cross-linguistically Consistent Semantic and Syntactic Annotation of Child-directed Speech

2 code implementations · 22 Sep 2021 · Ida Szubert, Omri Abend, Nathan Schneider, Samuel Gibbon, Louis Mahon, Sharon Goldwater, Mark Steedman

We then demonstrate the utility of the compiled corpora through (1) a longitudinal corpus study of the prevalence of different syntactic and semantic phenomena in the CDS, and (2) applying an existing computational model of language acquisition to the two corpora and briefly comparing the results across languages.

Tasks: Language Acquisition, Semantic Parsing

BERT Has Uncommon Sense: Similarity Ranking for Word Sense BERTology

1 code implementation · EMNLP (BlackboxNLP) 2021 · Luke Gessler, Nathan Schneider

An important question concerning contextualized word embedding (CWE) models like BERT is how well they can represent different word senses, especially those in the long tail of uncommon senses.

Tasks: Retrieval

Making Heads and Tails of Models with Marginal Calibration for Sparse Tagsets

1 code implementation · Findings (EMNLP) 2021 · Michael Kranzlein, Nelson F. Liu, Nathan Schneider

For interpreting the behavior of a probabilistic model, it is useful to measure a model's calibration: the extent to which it produces reliable confidence scores.

Tasks: TAG
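Calibration in this sense is commonly summarized with Expected Calibration Error (ECE): bin predictions by confidence and compare each bin's average confidence to its accuracy. A self-contained sketch follows; the equal-width binning and toy numbers are illustrative assumptions, not the paper's marginal-calibration method.

```python
# Illustrative sketch of Expected Calibration Error (ECE), one common way
# to quantify the calibration the paper discusses.

def expected_calibration_error(confidences, correct, n_bins=5):
    """Average |accuracy - confidence| over equal-width confidence bins,
    weighted by the number of predictions in each bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / total) * abs(accuracy - avg_conf)
    return ece

# A well-calibrated model: 80%-confident predictions are right 80% of the time.
confs = [0.8] * 10
hits = [True] * 8 + [False] * 2
print(round(expected_calibration_error(confs, hits), 3))  # 0.0
```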

Mischievous Nominal Constructions in Universal Dependencies

no code implementations · UDW (SyntaxFest) 2021 · Nathan Schneider, Amir Zeldes

While the highly multilingual Universal Dependencies (UD) project provides extensive guidelines for clausal structure as well as structure within canonical nominal phrases, a standard treatment is lacking for many "mischievous" nominal phenomena that break the mold.

Hindi-Urdu Adposition and Case Supersenses v1.0

no code implementations · 2 Mar 2021 · Aryaman Arora, Nitin Venkateswaran, Nathan Schneider

These are the guidelines for the application of SNACS (Semantic Network of Adposition and Case Supersenses; Schneider et al. 2018) to Modern Standard Hindi of Delhi.

UCCA's Foundational Layer: Annotation Guidelines v2.1

1 code implementation · 31 Dec 2020 · Omri Abend, Nathan Schneider, Dotan Dvir, Jakob Prange, Ari Rappoport

This is the annotation manual for Universal Conceptual Cognitive Annotation (UCCA; Abend and Rappoport, 2013), specifically the Foundational Layer.

Supertagging the Long Tail with Tree-Structured Decoding of Complex Categories

1 code implementation · 2 Dec 2020 · Jakob Prange, Nathan Schneider, Vivek Srikumar

Our best tagger is capable of recovering a sizeable fraction of the long-tail supertags and even generates CCG categories that have never been seen in training, while approximating the prior state of the art in overall tag accuracy with fewer parameters.

Tasks: Structured Prediction, TAG

Cross-lingual Semantic Representation for NLP with UCCA

no code implementations · COLING 2020 · Omri Abend, Dotan Dvir, Daniel Hershcovich, Jakob Prange, Nathan Schneider

This is an introductory tutorial to UCCA (Universal Conceptual Cognitive Annotation), a cross-linguistically applicable framework for semantic representation, with corpora annotated in English, German and French, and ongoing annotation in Russian and Hebrew.

Tasks: Philosophy, UCCA Parsing

Comparison by Conversion: Reverse-Engineering UCCA from Syntax and Lexical Semantics

2 code implementations · COLING 2020 · Daniel Hershcovich, Nathan Schneider, Dotan Dvir, Jakob Prange, Miryam de Lhoneux, Omri Abend

Building robust natural language understanding systems will require a clear characterization of whether and how various linguistic meaning representations complement each other.

Tasks: Natural Language Understanding, Sentence

Lexical Semantic Recognition

2 code implementations · ACL (MWE) 2021 · Nelson F. Liu, Daniel Hershcovich, Michael Kranzlein, Nathan Schneider

In lexical semantics, full-sentence segmentation and segment labeling of various phenomena are generally treated separately, despite their interdependence.

Tasks: Natural Language Understanding, Sentence, +1

Supervised Grapheme-to-Phoneme Conversion of Orthographic Schwas in Hindi and Punjabi

1 code implementation · ACL 2020 · Aryaman Arora, Luke Gessler, Nathan Schneider

Hindi grapheme-to-phoneme (G2P) conversion is mostly trivial, with one exception: whether a schwa represented in the orthography is pronounced or unpronounced (deleted).
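For intuition, the classic rule-based baseline for this problem deletes a medial schwa in the VC_CV context; the paper replaces such heuristics with a supervised model. A toy sketch over romanized Hindi follows; the transliteration scheme, the rule's coverage, and the example word are simplifying assumptions, not the paper's system.

```python
# Toy rule-based baseline for Hindi schwa deletion over a romanized string,
# using the classic VC_CV context rule. This heuristic is only illustrative:
# it ignores word-final schwas and over/under-applies on real data.

VOWELS = set("aeiouāīūAEIOU")

def delete_schwas(word):
    """Delete 'a' (the inherent schwa) when flanked as VC_CV, scanning right to left."""
    chars = list(word)
    for i in range(len(chars) - 3, 1, -1):
        if (chars[i] == "a"
                and chars[i - 1] not in VOWELS   # consonant before
                and chars[i - 2] in VOWELS       # vowel before that
                and chars[i + 1] not in VOWELS   # consonant after
                and chars[i + 2] in VOWELS):     # vowel after that
            del chars[i]
    return "".join(chars)

print(delete_schwas("namakīna"))  # namkīna (medial schwa dropped)
print(delete_schwas("kamal"))    # kamal (no VC_CV context, schwa kept)
```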

A Human Evaluation of AMR-to-English Generation Systems

no code implementations · COLING 2020 · Emma Manning, Shira Wein, Nathan Schneider

Most current state-of-the art systems for generating English text from Abstract Meaning Representation (AMR) have been evaluated only using automated metrics, such as BLEU, which are known to be problematic for natural language generation.

Tasks: Abstract Meaning Representation, Text Generation

A Corpus of Adpositional Supersenses for Mandarin Chinese

no code implementations · LREC 2020 · Siyao Peng, Yang Liu, Yilun Zhu, Austin Blodgett, Yushi Zhao, Nathan Schneider

Adpositions are frequent markers of semantic relations, but they are highly ambiguous and vary significantly from language to language.

Tasks: Translation

Made for Each Other: Broad-coverage Semantic Structures Meet Preposition Supersenses

1 code implementation · CoNLL 2019 · Jakob Prange, Nathan Schneider, Omri Abend

Universal Conceptual Cognitive Annotation (UCCA; Abend and Rappoport, 2013) is a typologically-informed, broad-coverage semantic annotation scheme that describes coarse-grained predicate-argument structure but currently lacks semantic roles.

Preparing SNACS for Subjects and Objects

1 code implementation · WS 2019 · Adi Shalev, Jena D. Hwang, Nathan Schneider, Vivek Srikumar, Omri Abend, Ari Rappoport

Research on adpositions and possessives in multiple languages has led to a small inventory of general-purpose meaning classes that disambiguate tokens.

Semantically Constrained Multilayer Annotation: The Case of Coreference

no code implementations · WS 2019 · Jakob Prange, Nathan Schneider, Omri Abend

We propose a coreference annotation scheme as a layer on top of the Universal Conceptual Cognitive Annotation foundational layer, treating units in predicate-argument structure as a basis for entity and event mentions.

Adpositional Supersenses for Mandarin Chinese

no code implementations · 6 Dec 2018 · Yilun Zhu, Yang Liu, Siyao Peng, Austin Blodgett, Yushi Zhao, Nathan Schneider

This study adapts Semantic Network of Adposition and Case Supersenses (SNACS) annotation to Mandarin Chinese and demonstrates that the same supersense categories are appropriate for Chinese adposition semantics.

Tasks: Machine Translation, Translation

Annotation of Tense and Aspect Semantics for Sentential AMR

no code implementations · COLING 2018 · Lucia Donatelli, Michael Regan, William Croft, Nathan Schneider

Although English grammar encodes a number of semantic contrasts with tense and aspect marking, these semantics are currently ignored by Abstract Meaning Representation (AMR) annotations.

Tasks: Abstract Meaning Representation, Entity Typing, +1

Constructing an Annotated Corpus of Verbal MWEs for English

no code implementations · COLING 2018 · Abigail Walsh, Claire Bonial, Kristina Geeraert, John P. McCrae, Nathan Schneider, Clarissa Somers

This paper describes the construction and annotation of a corpus of verbal MWEs for English, as part of the PARSEME Shared Task 1.1 on automatic identification of verbal MWEs.

Tasks: Word Alignment

Leaving no token behind: comprehensive (and delicious) annotation of MWEs and supersenses

no code implementations · COLING 2018 · Nathan Schneider

I will describe an unorthodox approach to lexical semantic annotation that prioritizes corpus coverage, democratizing analysis of a wide range of expression types.

A Structured Syntax-Semantics Interface for English-AMR Alignment

1 code implementation · NAACL 2018 · Ida Szubert, Adam Lopez, Nathan Schneider

Abstract Meaning Representation (AMR) annotations are often assumed to closely mirror dependency syntax, but AMR explicitly does not require this, and the assumption has never been tested.

Tasks: Abstract Meaning Representation, AMR Parsing

Comprehensive Supersense Disambiguation of English Prepositions and Possessives

1 code implementation · ACL 2018 · Nathan Schneider, Jena D. Hwang, Vivek Srikumar, Jakob Prange, Austin Blodgett, Sarah R. Moeller, Aviram Stern, Adi Bitan, Omri Abend

Semantic relations are often signaled with prepositional or possessive marking, but extreme polysemy bedevils their analysis and automatic interpretation.

Ranked #4 on Natural Language Understanding on STREUSLE (Role F1 (Preps) metric)

Tasks: Natural Language Understanding

Double Trouble: The Problem of Construal in Semantic Annotation of Adpositions

no code implementations · SEMEVAL 2017 · Jena D. Hwang, Archna Bhatia, Na-Rae Han, Tim O'Gorman, Vivek Srikumar, Nathan Schneider

We consider the semantics of prepositions, revisiting a broad-coverage annotation scheme used for annotating all 4,250 preposition tokens in a 55,000-word corpus of English.

Adposition and Case Supersenses v2.6: Guidelines for English

4 code implementations · 7 Apr 2017 · Nathan Schneider, Jena D. Hwang, Vivek Srikumar, Archna Bhatia, Na-Rae Han, Tim O'Gorman, Sarah R. Moeller, Omri Abend, Adi Shalev, Austin Blodgett, Jakob Prange

This document offers a detailed linguistic description of SNACS (Semantic Network of Adposition and Case Supersenses; Schneider et al., 2018), an inventory of 52 semantic labels ("supersenses") that characterize the use of adpositions and case markers at a somewhat coarse level of granularity, as demonstrated in the STREUSLE corpus (https://github.com/nert-nlp/streusle/; version 4.5 tracks guidelines version 2.6).

The NLTK FrameNet API: Designing for Discoverability with a Rich Linguistic Resource

no code implementations · EMNLP 2017 · Nathan Schneider, Chuck Wooters

A new Python API, integrated within the NLTK suite, offers access to the FrameNet 1.7 lexical database.

Coping with Construals in Broad-Coverage Semantic Annotation of Adpositions

no code implementations · 10 Mar 2017 · Jena D. Hwang, Archna Bhatia, Na-Rae Han, Tim O'Gorman, Vivek Srikumar, Nathan Schneider

We consider the semantics of prepositions, revisiting a broad-coverage annotation scheme used for annotating all 4,250 preposition tokens in a 55,000-word corpus of English.

A corpus of preposition supersenses in English web reviews

no code implementations · 8 May 2016 · Nathan Schneider, Jena D. Hwang, Vivek Srikumar, Meredith Green, Kathryn Conger, Tim O'Gorman, Martha Palmer

We present the first corpus annotated with preposition supersenses, unlexicalized categories for semantic functions that can be marked by English prepositions (Schneider et al., 2015).

Inconsistency Detection in Semantic Annotation

1 code implementation · LREC 2016 · Nora Hollenstein, Nathan Schneider, Bonnie Webber

Automatically finding these inconsistencies and correcting them (even manually) can increase the quality of the data.

Big Data Small Data, In Domain Out-of Domain, Known Word Unknown Word: The Impact of Word Representation on Sequence Labelling Tasks

no code implementations · 21 Apr 2015 · Lizhen Qu, Gabriela Ferraro, Liyuan Zhou, Weiwei Hou, Nathan Schneider, Timothy Baldwin

Word embeddings -- distributed word representations that can be learned from unlabelled data -- have been shown to have high utility in many natural language processing applications.

Tasks: Chunking, NER, +4
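The utility of such representations for sequence labelling comes largely from generalization in vector space: an unknown word can be treated like its nearest neighbours. A tiny self-contained sketch with made-up three-dimensional vectors (the words and values are illustrative assumptions, not learned embeddings):

```python
# Minimal sketch of why distributed word representations help sequence
# labelling: an out-of-vocabulary word can borrow the label statistics of
# its nearest neighbours in embedding space.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

embeddings = {
    "london": [0.9, 0.1, 0.0],
    "paris":  [0.8, 0.2, 0.1],
    "eat":    [0.0, 0.1, 0.9],
}

def nearest(word_vec):
    return max(embeddings, key=lambda w: cosine(word_vec, embeddings[w]))

# A new city name whose vector lands near the other cities, not the verb:
print(nearest([0.85, 0.15, 0.05]))  # london
```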

Augmenting English Adjective Senses with Supersenses

1 code implementation · LREC 2014 · Yulia Tsvetkov, Nathan Schneider, Dirk Hovy, Archna Bhatia, Manaal Faruqui, Chris Dyer

We develop a supersense taxonomy for adjectives, based on that of GermaNet, and apply it to English adjectives in WordNet using human annotation and supervised classification.

Tasks: Classification, General Classification

Comprehensive Annotation of Multiword Expressions in a Social Web Corpus

no code implementations · LREC 2014 · Nathan Schneider, Spencer Onuffer, Nora Kazour, Emily Danchik, Michael T. Mordowanec, Henrietta Conrad, Noah A. Smith

Multiword expressions (MWEs) are quite frequent in languages such as English, but their diversity, the scarcity of individual MWE types, and contextual ambiguity have presented obstacles to corpus-based studies and NLP systems addressing them as a class.

Tasks: Diversity, Language Acquisition, +2

Discriminative Lexical Semantic Segmentation with Gaps: Running the MWE Gamut

no code implementations · TACL 2014 · Nathan Schneider, Emily Danchik, Chris Dyer, Noah A. Smith

We present a novel representation, evaluation measure, and supervised models for the task of identifying the multiword expressions (MWEs) in a sentence, resulting in a lexical semantic segmentation.

Tasks: Chunking, Segmentation, +2
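The "gaps" in this segmentation arise when an MWE is interrupted by other tokens (e.g., "made ... up" in "made the story up"). A small encoding sketch follows; it is a simplification of the paper's actual O/o/B/b/I/i tagging scheme, and the function and tag names are illustrative:

```python
# Illustrative per-token tag encoding for gappy multiword expressions.
# Simplified scheme: "O" outside any MWE, "B" begins an MWE, "I" continues
# it, and lowercase "o" marks a gap token sitting inside an MWE's span.

def tag_mwes(tokens, groups):
    """groups: list of index-lists, e.g. [[1, 4]] for a 'made ... up' MWE."""
    tags = ["O"] * len(tokens)
    for group in groups:
        first, last = group[0], group[-1]
        tags[first] = "B"
        for i in group[1:]:
            tags[i] = "I"
        for i in range(first + 1, last):  # tokens in the gap
            if i not in group:
                tags[i] = "o"
    return tags

tokens = ["She", "made", "the", "story", "up"]
print(tag_mwes(tokens, [[1, 4]]))  # ['O', 'B', 'o', 'o', 'I']
```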
