Search Results for author: Lane Schwartz

Found 22 papers, 4 papers with code

Primum Non Nocere: Before working with Indigenous data, the ACL must confront ongoing colonialism

no code implementations ACL 2022 Lane Schwartz

In this paper, we challenge the ACL community to reckon with historical and ongoing colonialism by adopting a set of ethical obligations and best practices drawn from the Indigenous studies literature.

Expanding Universal Dependencies for Polysynthetic Languages: A Case of St. Lawrence Island Yupik

no code implementations NAACL (AmericasNLP) 2021 Hyunji Park, Lane Schwartz, Francis Tyers

This paper describes the development of the first Universal Dependencies (UD) treebank for St. Lawrence Island Yupik, an endangered language spoken in the Bering Strait region.

Dependency Parsing

Depth-Bounded Statistical PCFG Induction as a Model of Human Grammar Acquisition

no code implementations CL (ACL) 2021 Lifeng Jin, Lane Schwartz, Finale Doshi-Velez, Timothy Miller, William Schuler

This article describes a simple PCFG induction model with a fixed category domain that predicts a large majority of attested constituent boundaries, and predicts labels consistent with nearly half of attested constituent labels on a standard evaluation data set of child-directed speech.

How to encode arbitrarily complex morphology in word embeddings, no corpus needed

no code implementations FieldMatters (COLING) 2022 Lane Schwartz, Coleman Haley, Francis Tyers

In this paper, we present a straightforward technique for constructing interpretable word embeddings from morphologically analyzed examples (such as interlinear glosses) for all of the world’s languages.

Word Embeddings

Illinois Japanese ↔ English News Translation for WMT 2021

no code implementations WMT (EMNLP) 2021 Giang Le, Shinka Mori, Lane Schwartz

This system paper describes an end-to-end NMT pipeline for the Japanese ↔ English news translation task as submitted to WMT 2021, where we explore the efficacy of techniques such as tokenizing with language-independent and language-dependent tokenizers, normalizing by orthographic conversion, creating a politeness-and-formality-aware model by implementing a tagger, back-translation, model ensembling, and n-best reranking.

NMT, Translation

A Digital Corpus of St. Lawrence Island Yupik

no code implementations 26 Jan 2021 Lane Schwartz, Emily Chen, Hyunji Hayley Park, Edward Jahn, Sylvia L. R. Schreiner

St. Lawrence Island Yupik (ISO 639-3: ess) is an endangered polysynthetic language in the Inuit-Yupik language family indigenous to Alaska and Chukotka.

Morphology Matters: A Multilingual Language Modeling Analysis

1 code implementation 11 Dec 2020 Hyunji Hayley Park, Katherine J. Zhang, Coleman Haley, Kenneth Steimel, Han Liu, Lane Schwartz

We fill in missing typological data for several languages and consider corpus-based measures of morphological complexity in addition to expert-produced typological features.

Language Modelling, Segmentation

Improved Finite-State Morphological Analysis for St. Lawrence Island Yupik Using Paradigm Function Morphology

no code implementations LREC 2020 Emily Chen, Hyunji Hayley Park, Lane Schwartz

In this work, we present a re-implementation of the Chen & Schwartz (2018) finite-state morphological analyzer for St. Lawrence Island Yupik that incorporates new linguistic insights; in particular, in this implementation we make use of the Paradigm Function Morphology (PFM) theory of morphology.

Morphological Analysis

Unsupervised Learning of PCFGs with Normalizing Flow

no code implementations ACL 2019 Lifeng Jin, Finale Doshi-Velez, Timothy Miller, Lane Schwartz, William Schuler

This paper describes a neural PCFG inducer which employs context embeddings (Peters et al., 2018) in a normalizing flow model (Dinh et al., 2015) to extend PCFG induction to use semantic and morphological information.

Language Acquisition

Community lexical access for an endangered polysynthetic language: An electronic dictionary for St. Lawrence Island Yupik

no code implementations NAACL 2019 Benjamin Hunt, Emily Chen, Sylvia L.R. Schreiner, Lane Schwartz

If a user searches for an inflected Yupik word form, we perform a morphological analysis and return entries for the root word and for any derivational suffixes present in the word.

Morphological Analysis

Depth-bounding is effective: Improvements and evaluation of unsupervised PCFG induction

1 code implementation EMNLP 2018 Lifeng Jin, Finale Doshi-Velez, Timothy Miller, William Schuler, Lane Schwartz

There have been several recent attempts to improve the accuracy of grammar induction systems by bounding the recursive complexity of the induction model (Ponvert et al., 2011; Noji and Johnson, 2016; Shain et al., 2016; Jin et al., 2018).

Unsupervised Grammar Induction with Depth-bounded PCFG

1 code implementation TACL 2018 Lifeng Jin, Finale Doshi-Velez, Timothy Miller, William Schuler, Lane Schwartz

There has been recent interest in applying cognitively or empirically motivated bounds on recursion depth to limit the search space of grammar induction models (Ponvert et al., 2011; Noji and Johnson, 2016; Shain et al., 2016).

Memory-Bounded Left-Corner Unsupervised Grammar Induction on Child-Directed Input

no code implementations COLING 2016 Cory Shain, William Bryce, Lifeng Jin, Victoria Krakovna, Finale Doshi-Velez, Timothy Miller, William Schuler, Lane Schwartz

This paper presents a new memory-bounded left-corner parsing model for unsupervised raw-text syntax induction, using unsupervised hierarchical hidden Markov models (UHHMM).

Language Acquisition, Sentence

Fast, Scalable Phrase-Based SMT Decoding

no code implementations AMTA 2016 Hieu Hoang, Nikolay Bogoychev, Lane Schwartz, Marcin Junczys-Dowmunt

The utilization of statistical machine translation (SMT) has grown enormously over the last decade, many using open-source software developed by the NLP community.

Machine Translation, Translation
