no code implementations • 11 Jul 2023 • Seth Kulick, Neville Ryant, David J. Irwin, Naomi Nevler, Sunghye Cho
This paper addresses the problem of improving POS tagging of transcripts of speech from clinical populations.
2 code implementations • 3 Apr 2022 • Seth Kulick, Neville Ryant, Beatrice Santorini, Joel Wallenberg, Assaf Urieli
We describe the construction and evaluation of a part-of-speech tagger for Yiddish.
no code implementations • Findings (NAACL) 2022 • Seth Kulick, Neville Ryant, Beatrice Santorini
We present the first parsing results on the Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME), a 1.9 million word treebank that is an important resource for research in syntactic change.
no code implementations • SCiL 2022 • Seth Kulick, Neville Ryant
We investigate the question of whether advances in NLP over the last few years make it possible to vastly increase the size of data usable for research in historical syntax.
no code implementations • LREC 2016 • Seth Kulick, Ann Bies
The Low Resource Language research conducted under DARPA's Broad Operational Language Translation (BOLT) program required the rapid creation of text corpora of typologically diverse languages (Turkish, Hausa, and Uzbek) which were annotated with morphological information, along with other types of annotation.
no code implementations • LREC 2014 • Mohamed Maamouri, Ann Bies, Seth Kulick, Michael Ciul, Nizar Habash, Ramy Eskander
This paper describes the parallel development of an Egyptian Arabic Treebank and a morphological analyzer for Egyptian Arabic (CALIMA).
no code implementations • LREC 2014 • Ann Bies, Justin Mott, Seth Kulick, Jennifer Garland, Colin Warner
New annotation guidelines and new processing methods were developed to accommodate English treebank annotation of a parallel English/Chinese corpus of web data that includes alternate English translations (one fluent, one literal) of expressions that are idiomatic in the Chinese source.
no code implementations • LREC 2012 • Mohamed Maamouri, Ann Bies, Seth Kulick
Because news broadcasts are predominantly scripted, most of the transcribed speech is in Modern Standard Arabic (MSA).
no code implementations • LREC 2012 • Seth Kulick, Ann Bies, Justin Mott
Annotations of word sequences are compared both for their internal structural consistency and for their external relation to the rest of the tree.