1 code implementation • 14 Oct 2020 • Alena Butryna, Shan-Hui Cathy Chu, Isin Demirsahin, Alexander Gutkin, Linne Ha, Fei He, Martin Jansche, Cibu Johny, Anna Katanova, Oddur Kjartansson, Chenfang Li, Tatiana Merkulova, Yin May Oo, Knot Pipatsrisawat, Clara Rivera, Supheakmungkol Sarin, Pasindu De Silva, Keshan Sodimana, Richard Sproat, Theeraphol Wattanavekin, Jaka Aris Eko Wibawa
This paper presents an overview of a program designed to address the growing need for developing freely available speech resources for under-represented languages.
no code implementations • 12 Oct 2020 • Alexander Gutkin, Martin Jansche, Lucy Skidmore
This extended abstract surveying the work on phonological typology was prepared for "SIGTYP 2020: The Second Workshop on Computational Research in Linguistic Typology" to be held at EMNLP 2020.
no code implementations • LREC 2020 • Yin May Oo, Theeraphol Wattanavekin, Chenfang Li, Pasindu De Silva, Supheakmungkol Sarin, Knot Pipatsrisawat, Martin Jansche, Oddur Kjartansson, Alexander Gutkin
This paper introduces an open-source crowd-sourced multi-speaker speech corpus along with a comprehensive set of finite-state transducer (FST) grammars for performing text normalization for the Burmese (Myanmar) language.
no code implementations • LREC 2020 • Fei He, Shan-Hui Cathy Chu, Oddur Kjartansson, Clara Rivera, Anna Katanova, Alexander Gutkin, Isin Demirsahin, Cibu Johny, Martin Jansche, Supheakmungkol Sarin, Knot Pipatsrisawat
We present free, high-quality multi-speaker speech corpora for Gujarati, Kannada, Malayalam, Marathi, Tamil and Telugu, six of the twenty-two official languages of India, which together are spoken by 374 million native speakers.
no code implementations • 30 Apr 2020 • Alexander Gutkin, Tatiana Merkulova, Martin Jansche
In this paper we investigate whether the various linguistic features from the World Atlas of Language Structures (WALS) can be reliably inferred from multilingual text.
2 code implementations • 21 May 2019 • Martin Jansche, Alexander Gutkin
We consider the problem of efficient sampling: drawing random string variates from the probability distribution represented by stochastic automata and transformations of those.
no code implementations • LREC 2016 • Alexander Gutkin, Linne Ha, Martin Jansche, Knot Pipatsrisawat, Richard Sproat
We present a text-to-speech (TTS) system designed for the dialect of Bengali spoken in Bangladesh.
no code implementations • LREC 2014 • Martin Jansche
We propose a model-driven method for ensuring the quality of pronunciation dictionaries.