Search Results for author: Alexander Gutkin

Found 16 papers, 7 papers with code

The Taxonomy of Writing Systems: How to Measure How Logographic a System Is

no code implementations CL (ACL) 2021 Richard Sproat, Alexander Gutkin

Our work provides the first quantifiable measure of the notion of logography that accords with linguistic intuition and, we argue, provides better insight into what this notion means.

Mockingbird at the SIGTYP 2022 Shared Task: Two Types of Models for the Prediction of Cognate Reflexes

no code implementations NAACL (SIGTYP) 2022 Christo Kirov, Richard Sproat, Alexander Gutkin

For reflex generation, the missing reflexes are treated as “masked pixels” in an “image” which is a representation of an entire cognate set across a language family.

Image Restoration

Criteria for Useful Automatic Romanization in South Asian Languages

no code implementations LREC 2022 Isin Demirsahin, Cibu Johny, Alexander Gutkin, Brian Roark

This paper presents a number of possible criteria for systems that transliterate South Asian languages from their native scripts into the Latin script, a process known as romanization.

Extensions to Brahmic script processing within the Nisaba library: new scripts, languages and utilities

no code implementations LREC 2022 Alexander Gutkin, Cibu Johny, Raiomond Doctor, Lawrence Wolf-Sonkin, Brian Roark

The Brahmic family of scripts is used to record some of the most spoken languages in the world and is arguably the most diverse family of writing systems.


Design principles of an open-source language modeling microservice package for AAC text-entry applications

no code implementations SLPAT (ACL) 2022 Brian Roark, Alexander Gutkin

We present MozoLM, an open-source language model microservice package intended for use in AAC text-entry applications, with a particular focus on the design principles of the library.

Language Modelling

Beyond Arabic: Software for Perso-Arabic Script Manipulation

1 code implementation 26 Jan 2023 Alexander Gutkin, Cibu Johny, Raiomond Doctor, Brian Roark, Richard Sproat

This paper presents an open-source software library that provides a set of finite-state transducer (FST) components and corresponding utilities for manipulating the writing systems of languages that use the Perso-Arabic script.


Graphemic Normalization of the Perso-Arabic Script

1 code implementation 21 Oct 2022 Raiomond Doctor, Alexander Gutkin, Cibu Johny, Brian Roark, Richard Sproat

Since its original appearance in 1991, the Perso-Arabic script representation in Unicode has grown from 169 to over 440 atomic isolated characters spread over several code pages representing standard letters, various diacritics and punctuation for the original Arabic and numerous other regional orthographic traditions.

Language Modelling, Machine Translation

Helpful Neighbors: Leveraging Neighbors in Geographic Feature Pronunciation

1 code implementation 18 Oct 2022 Llion Jones, Richard Sproat, Haruko Ishikawa, Alexander Gutkin

If one sees the place name Houston Mercer Dog Run in New York, how does one know how to pronounce it?

NEMO: Frequentist Inference Approach to Constrained Linguistic Typology Feature Prediction in SIGTYP 2020 Shared Task

1 code implementation EMNLP (SIGTYP) 2020 Alexander Gutkin, Richard Sproat

This paper describes the NEMO submission to SIGTYP 2020 shared task which deals with prediction of linguistic typological features for multiple languages using the data derived from World Atlas of Language Structures (WALS).


Towards Induction of Structured Phoneme Inventories

no code implementations 12 Oct 2020 Alexander Gutkin, Martin Jansche, Lucy Skidmore

This extended abstract surveying the work on phonological typology was prepared for "SIGTYP 2020: The Second Workshop on Computational Research in Linguistic Typology" to be held at EMNLP 2020.

Linguistic Typology Features from Text: Inferring the Sparse Features of World Atlas of Language Structures

no code implementations 30 Apr 2020 Alexander Gutkin, Tatiana Merkulova, Martin Jansche

In this paper we investigate whether the various linguistic features from World Atlas of Language Structures (WALS) can be reliably inferred from multi-lingual text.

Multi-Label Classification

Sampling from Stochastic Finite Automata with Applications to CTC Decoding

2 code implementations 21 May 2019 Martin Jansche, Alexander Gutkin

We consider the problem of efficient sampling: drawing random string variates from the probability distribution represented by stochastic automata and transformations of those.
