no code implementations • CL (ACL) 2021 • Lifeng Jin, Lane Schwartz, Finale Doshi-Velez, Timothy Miller, William Schuler
This article describes a simple PCFG induction model with a fixed category domain that predicts a large majority of attested constituent boundaries, and predicts labels consistent with nearly half of attested constituent labels on a standard evaluation data set of child-directed speech.
no code implementations • ACL 2022 • Lane Schwartz
In this paper, we challenge the ACL community to reckon with historical and ongoing colonialism by adopting a set of ethical obligations and best practices drawn from the Indigenous studies literature.
no code implementations • FieldMatters (COLING) 2022 • Lane Schwartz, Coleman Haley, Francis Tyers
In this paper, we present a straightforward technique for constructing interpretable word embeddings from morphologically analyzed examples (such as interlinear glosses) for all of the world’s languages.
no code implementations • WMT (EMNLP) 2021 • Giang Le, Shinka Mori, Lane Schwartz
This system paper describes an end-to-end NMT pipeline for the Japanese↔English news translation task as submitted to WMT 2021, where we explore the efficacy of techniques such as tokenizing with language-independent and language-dependent tokenizers, normalizing by orthographic conversion, creating a politeness-and-formality-aware model by implementing a tagger, back-translation, model ensembling, and n-best reranking.
no code implementations • NAACL (AmericasNLP) 2021 • Hyunji Park, Lane Schwartz, Francis Tyers
This paper describes the development of the first Universal Dependencies (UD) treebank for St. Lawrence Island Yupik, an endangered language spoken in the Bering Strait region.
no code implementations • 26 Jan 2021 • Lane Schwartz, Emily Chen, Hyunji Hayley Park, Edward Jahn, Sylvia L. R. Schreiner
St. Lawrence Island Yupik (ISO 639-3: ess) is an endangered polysynthetic language in the Inuit-Yupik language family indigenous to Alaska and Chukotka.
1 code implementation • 11 Dec 2020 • Hyunji Hayley Park, Katherine J. Zhang, Coleman Haley, Kenneth Steimel, Han Liu, Lane Schwartz
We fill in missing typological data for several languages and consider corpus-based measures of morphological complexity in addition to expert-produced typological features.
no code implementations • 11 May 2020 • Lane Schwartz, Francis Tyers, Lori Levin, Christo Kirov, Patrick Littell, Chi-kiu Lo, Emily Prud'hommeaux, Hyunji Hayley Park, Kenneth Steimel, Rebecca Knowles, Jeffrey Micher, Lonny Strunk, Han Liu, Coleman Haley, Katherine J. Zhang, Robbie Jimmerson, Vasilisa Andriyanets, Aldrian Obaja Muis, Naoki Otani, Jong Hyuk Park, Zhisong Zhang
In the literature, languages like Finnish or Turkish are held up as extreme examples of complexity that challenge common modelling assumptions.
no code implementations • LREC 2020 • Emily Chen, Hyunji Hayley Park, Lane Schwartz
In this work, we present a re-implementation of the Chen & Schwartz (2018) finite-state morphological analyzer for St. Lawrence Island Yupik that incorporates new linguistic insights; in particular, in this implementation we make use of the Paradigm Function Morphology (PFM) theory of morphology.
no code implementations • ACL 2019 • Lifeng Jin, Finale Doshi-Velez, Timothy Miller, Lane Schwartz, William Schuler
This paper describes a neural PCFG inducer which employs context embeddings (Peters et al., 2018) in a normalizing flow model (Dinh et al., 2015) to extend PCFG induction to use semantic and morphological information.
no code implementations • NAACL 2019 • Benjamin Hunt, Emily Chen, Sylvia L.R. Schreiner, Lane Schwartz
If a user searches for an inflected Yupik word form, we perform a morphological analysis and return entries for the root word and for any derivational suffixes present in the word.
1 code implementation • EMNLP 2018 • Lifeng Jin, Finale Doshi-Velez, Timothy Miller, William Schuler, Lane Schwartz
There have been several recent attempts to improve the accuracy of grammar induction systems by bounding the recursive complexity of the induction model (Ponvert et al., 2011; Noji and Johnson, 2016; Shain et al., 2016; Jin et al., 2018).
1 code implementation • TACL 2018 • Lifeng Jin, Finale Doshi-Velez, Timothy Miller, William Schuler, Lane Schwartz
There has been recent interest in applying cognitively or empirically motivated bounds on recursion depth to limit the search space of grammar induction models (Ponvert et al., 2011; Noji and Johnson, 2016; Shain et al., 2016).
1 code implementation • ICLR 2018 • Richard Wei, Lane Schwartz, Vikram Adve
Deep learning software demands reliability and performance.
no code implementations • COLING 2016 • Cory Shain, William Bryce, Lifeng Jin, Victoria Krakovna, Finale Doshi-Velez, Timothy Miller, William Schuler, Lane Schwartz
This paper presents a new memory-bounded left-corner parsing model for unsupervised raw-text syntax induction, using unsupervised hierarchical hidden Markov models (UHHMM).
no code implementations • AMTA 2016 • Hieu Hoang, Nikolay Bogoychev, Lane Schwartz, Marcin Junczys-Dowmunt
The utilization of statistical machine translation (SMT) has grown enormously over the last decade, with many users relying on open-source software developed by the NLP community.