no code implementations • NAACL (AmericasNLP) 2021 • Rebecca Knowles, Darlene Stewart, Samuel Larkin, Patrick Littell
We describe the NRC-CNRC systems submitted to the AmericasNLP shared task on machine translation.
no code implementations • WMT (EMNLP) 2020 • Rebecca Knowles, Samuel Larkin, Darlene Stewart, Patrick Littell
We describe the National Research Council of Canada (NRC) neural machine translation systems for the German-Upper Sorbian supervised track of the 2020 shared task on Unsupervised MT and Very Low Resource Supervised MT.
no code implementations • LREC 2022 • Rebecca Knowles, Patrick Littell
Low-resource machine translation research often requires building baselines to benchmark estimates of progress in translation quality.
2 code implementations • SIGUL (LREC) 2022 • Patrick Littell, Eric Joanis, Aidan Pine, Marc Tessier, David Huggins Daines, Delasie Torkornoo
While the alignment of audio recordings and text (often termed “forced alignment”) is often treated as a solved problem, in practice the process of adapting an alignment system to a new, under-resourced language comes with significant challenges, requiring experience and expertise that many outside of the speech community lack.
1 code implementation • ACL 2022 • Aidan Pine, Dan Wells, Nathan Brinklow, Patrick Littell, Korin Richmond
This paper describes the motivation and development of speech synthesis systems for the purposes of language revitalization.
no code implementations • WMT (EMNLP) 2020 • Rebecca Knowles, Darlene Stewart, Samuel Larkin, Patrick Littell
We describe the National Research Council of Canada (NRC) submissions for the 2020 Inuktitut-English shared task on news translation at the Fifth Conference on Machine Translation (WMT20).
no code implementations • COLING 2020 • Roland Kuhn, Fineen Davis, Alain D{\'e}silets, Eric Joanis, Anna Kazantseva, Rebecca Knowles, Patrick Littell, Delaney Lothian, Aidan Pine, Caroline Running Wolf, Eddie Santos, Darlene Stewart, Gilles Boulianne, Vishwa Gupta, Brian Maracle Owennat{\'e}kha, Akwirat{\'e}kha{'} Martin, Christopher Cox, Marie-Odile Junker, Olivia Sammons, Delasie Torkornoo, Nathan Thanyeht{\'e}nhas Brinklow, Sara Child, Beno{\^\i}t Farley, David Huggins-Daines, Daisy Rosenblum, Heather Souter
This paper surveys the first, three-year phase of a project at the National Research Council of Canada that is developing software to assist Indigenous communities in Canada in preserving their languages and extending their use.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+3
no code implementations • 11 May 2020 • Lane Schwartz, Francis Tyers, Lori Levin, Christo Kirov, Patrick Littell, Chi-kiu Lo, Emily Prud'hommeaux, Hyunji Hayley Park, Kenneth Steimel, Rebecca Knowles, Jeffrey Micher, Lonny Strunk, Han Liu, Coleman Haley, Katherine J. Zhang, Robbie Jimmerson, Vasilisa Andriyanets, Aldrian Obaja Muis, Naoki Otani, Jong Hyuk Park, Zhisong Zhang
In the literature, languages like Finnish or Turkish are held up as extreme examples of complexity that challenge common modelling assumptions.
no code implementations • LREC 2020 • Eric Joanis, Rebecca Knowles, Rol Kuhn, , Samuel Larkin, Patrick Littell, Chi-kiu Lo, Darlene Stewart, Jeffrey Micher
This paper describes a newly released sentence-aligned Inuktitut{--}English corpus based on the proceedings of the Legislative Assembly of Nunavut, covering sessions from April 1999 to June 2017.
no code implementations • LREC 2020 • Graham Neubig, Shruti Rijhwani, Alexis Palmer, Jordan MacKenzie, Hilaria Cruz, Xinjian Li, Matthew Lee, Aditi Chaudhary, Luke Gessler, Steven Abney, Shirley Anugrah Hayati, Antonios Anastasopoulos, Olga Zamaraeva, Emily Prud'hommeaux, Jennette Child, Sara Child, Rebecca Knowles, Sarah Moeller, Jeffrey Micher, Yiyuan Li, Sydney Zink, Mengzhou Xia, Roshan S Sharma, Patrick Littell
Despite recent advances in natural language processing and other language technology, the application of such technology to language documentation and conservation has been limited.
no code implementations • LREC 2020 • David R. Mortensen, Xinjian Li, Patrick Littell, Alexis Michaud, Shruti Rijhwani, Antonios Anastasopoulos, Alan W. black, Florian Metze, Graham Neubig
While phonemic representations are language specific, phonetic representations (stated in terms of (allo)phones) are much closer to a universal (language-independent) transcription.
1 code implementation • 26 Feb 2020 • Xinjian Li, Siddharth Dalmia, Juncheng Li, Matthew Lee, Patrick Littell, Jiali Yao, Antonios Anastasopoulos, David R. Mortensen, Graham Neubig, Alan W. black, Florian Metze
Multilingual models can improve language processing, particularly for low resource situations, by sharing parameters across languages.
no code implementations • WS 2019 • Patrick Littell, Chi-kiu Lo, Samuel Larkin, Darlene Stewart
We describe the neural machine translation (NMT) system developed at the National Research Council of Canada (NRC) for the Kazakh-English news translation task of the Fourth Conference on Machine Translation (WMT19).
1 code implementation • ACL 2019 • Yu-Hsiang Lin, Chian-Yu Chen, Jean Lee, Zirui Li, Yuyan Zhang, Mengzhou Xia, Shruti Rijhwani, Junxian He, Zhisong Zhang, Xuezhe Ma, Antonios Anastasopoulos, Patrick Littell, Graham Neubig
Cross-lingual transfer, where a high-resource transfer language is used to improve the accuracy of a low-resource task language, is now an invaluable tool for improving performance of natural language processing (NLP) on low-resource languages.
no code implementations • 13 Dec 2018 • Graham Neubig, Patrick Littell, Chian-Yu Chen, Jean Lee, Zirui Li, Yu-Hsiang Lin, Yuyan Zhang
In this extended abstract, we describe the beginnings of a new project that will attempt to ease this language documentation process through the use of natural language processing (NLP) technology.
no code implementations • WS 2018 • Patrick Littell, Samuel Larkin, Darlene Stewart, Michel Simard, Cyril Goutte, Chi-kiu Lo
The WMT18 shared task on parallel corpus filtering (Koehn et al., 2018b) challenged teams to score sentence pairs from a large high-recall, low-precision web-scraped parallel corpus (Koehn et al., 2018a).
no code implementations • WS 2018 • Chi-kiu Lo, Michel Simard, Darlene Stewart, Samuel Larkin, Cyril Goutte, Patrick Littell
We present our semantic textual similarity approach in filtering a noisy web crawled parallel corpus using YiSi{---}a novel semantic machine translation evaluation metric.
no code implementations • COLING 2018 • Patrick Littell
This paper presents the phonological layer of a Kwak{'}wala finite-state morphological transducer, using the phonological hypotheses of Lincoln and Rath (1986) and the lenient composition operation of Karttunen (1998) to mediate the complicated relationship between underlying and surface forms.
no code implementations • COLING 2018 • Patrick Littell, Anna Kazantseva, Rol Kuhn, , Aidan Pine, Antti Arppe, Christopher Cox, Marie-Odile Junker
In this article, we discuss which text, speech, and image technologies have been developed, and would be feasible to develop, for the approximately 60 Indigenous languages spoken in Canada.
Optical Character Recognition
Optical Character Recognition (OCR)
+7
2 code implementations • EMNLP 2017 • Chaitanya Malaviya, Graham Neubig, Patrick Littell
One central mystery of neural NLP is what neural models "know" about their subject matter.
no code implementations • EACL 2017 • Patrick Littell, David R. Mortensen, Ke Lin, Katherine Kairis, Carlisle Turner, Lori Levin
We introduce the URIEL knowledge base for massively multilingual NLP and the lang2vec utility, which provides information-rich vector identifications of languages drawn from typological, geographical, and phylogenetic databases and normalized to have straightforward and consistent formats, naming, and semantics.
no code implementations • WS 2017 • Gina-Anne Levow, Emily M. Bender, Patrick Littell, Kristen Howell, Shobhana Chelliah, Joshua Crowgey, Dan Garrette, Jeff Good, Sharon Hargus, David Inman, Michael Maxwell, Michael Tjalve, Fei Xia
1 code implementation • COLING 2016 • David R. Mortensen, Patrick Littell, Akash Bharadwaj, Kartik Goyal, Chris Dyer, Lori Levin
This paper contributes to a growing body of evidence that{---}when coupled with appropriate machine-learning techniques{--}linguistically motivated, information-rich representations can outperform one-hot encodings of linguistic data.
no code implementations • COLING 2016 • Qinlan Shen, Daniel Clothiaux, Emily Tagtow, Patrick Littell, Chris Dyer
While morphological analyzers can reduce this sparsity by providing morpheme-level analyses for words, they will often introduce ambiguity by returning multiple analyses for the same surface form.
no code implementations • COLING 2016 • Patrick Littell, Kartik Goyal, David R. Mortensen, Alexa Little, Chris Dyer, Lori Levin
This paper describes our construction of named-entity recognition (NER) systems in two Western Iranian languages, Sorani Kurdish and Tajik, as a part of a pilot study of {``}Linguistic Rapid Response{''} to potential emergency humanitarian relief situations.
no code implementations • NAACL 2016 • Yulia Tsvetkov, Sunayana Sitaram, Manaal Faruqui, Guillaume Lample, Patrick Littell, David Mortensen, Alan W. black, Lori Levin, Chris Dyer
We introduce polyglot language models, recurrent neural network models trained to predict symbol sequences in many different languages using shared representations of symbols and conditioning on typological information about the language to be predicted.
no code implementations • LREC 2016 • Patrick Littell, David R. Mortensen, Kartik Goyal, Chris Dyer, Lori Levin
In Sorani Kurdish, one of the most useful orthographic features in named-entity recognition {--} capitalization {--} is absent, as the language{'}s Perso-Arabic script does not make a distinction between uppercase and lowercase letters.
no code implementations • LREC 2014 • Patrick Littell, Kaitlyn Price, Lori Levin
We describe a morphological analyzer for the Swahili language, written in an extension of XFST/LEXC intended for the easy declaration of morphophonological patterns and importation of lexical resources.