no code implementations • LREC 2022 • Alexander Gutkin, Cibu Johny, Raiomond Doctor, Lawrence Wolf-Sonkin, Brian Roark
The Brahmic family of scripts is used to record some of the most spoken languages in the world and is arguably the most diverse family of writing systems.
no code implementations • EACL 2021 • Cibu Johny, Lawrence Wolf-Sonkin, Alexander Gutkin, Brian Roark
This paper presents an open-source library for efficient low-level processing of ten major South Asian Brahmic scripts.
1 code implementation • LREC 2020 • Brian Roark, Lawrence Wolf-Sonkin, Christo Kirov, Sabrina J. Mielke, Cibu Johny, Isin Demirsahin, Keith Hall
This paper describes the Dakshina dataset, a new resource consisting of text in both the Latin and native scripts for 12 South Asian languages.
no code implementations • 3 May 2020 • Adina Williams, Ryan Cotterell, Lawrence Wolf-Sonkin, Damián Blasi, Hanna Wallach
We also find that there are statistically significant relationships between the grammatical genders of inanimate nouns and the verbs that take those nouns as direct objects, as indirect objects, and as subjects.
no code implementations • IJCNLP 2019 • Adina Williams, Ryan Cotterell, Lawrence Wolf-Sonkin, Damián Blasi, Hanna Wallach
To that end, we use canonical correlation analysis to correlate the grammatical gender of inanimate nouns with an externally grounded definition of their lexical semantics.
no code implementations • WS 2019 • Arya D. McCarthy, Ekaterina Vylomova, Shijie Wu, Chaitanya Malaviya, Lawrence Wolf-Sonkin, Garrett Nicolai, Christo Kirov, Miikka Silfverberg, Sabrina J. Mielke, Jeffrey Heinz, Ryan Cotterell, Mans Hulden
The SIGMORPHON 2019 shared task on cross-lingual transfer and contextual analysis in morphology examined transfer learning of inflection between 100 language pairs, as well as contextual lemmatization and morphosyntactic description in 66 languages.
no code implementations • WS 2019 • Lawrence Wolf-Sonkin, Vlad Schogol, Brian Roark, Michael Riley
The use of the Latin script for text entry of South Asian languages is common, even though there is no standard orthography for these languages in the script.
no code implementations • ACL 2019 • Damian Blasi, Ryan Cotterell, Lawrence Wolf-Sonkin, Sabine Stoll, Balthasar Bickel, Marco Baroni
Embedding a clause inside another ({``}the girl [who likes cars [that run fast]] has arrived{''}) is a fundamental resource that has been argued to be a key driver of linguistic expressiveness.
1 code implementation • NAACL 2019 • Alexander Hoyle, Lawrence Wolf-Sonkin, Hanna Wallach, Ryan Cotterell, Isabelle Augenstein
When assigning quantitative labels to a dataset, different methodologies may rely on different scales.
2 code implementations • ACL 2018 • Lawrence Wolf-Sonkin, Jason Naradowsky, Sabrina J. Mielke, Ryan Cotterell
Statistical morphological inflectors are typically trained on fully supervised, type-level data.