Search Results for author: Patrick Littell

Found 32 papers, 6 papers with code

Translation Memories as Baselines for Low-Resource Machine Translation

no code implementations LREC 2022 Rebecca Knowles, Patrick Littell

Low-resource machine translation research often requires building baselines to benchmark estimates of progress in translation quality.

Machine Translation Translation

NRC Systems for the 2020 Inuktitut-English News Translation Task

no code implementations WMT (EMNLP) 2020 Rebecca Knowles, Darlene Stewart, Samuel Larkin, Patrick Littell

We describe the National Research Council of Canada (NRC) submissions for the 2020 Inuktitut-English shared task on news translation at the Fifth Conference on Machine Translation (WMT20).

Machine Translation Translation

ReadAlong Studio: Practical Zero-Shot Text-Speech Alignment for Indigenous Language Audiobooks

2 code implementations SIGUL (LREC) 2022 Patrick Littell, Eric Joanis, Aidan Pine, Marc Tessier, David Huggins Daines, Delasie Torkornoo

While the alignment of audio recordings and text (often termed “forced alignment”) is often treated as a solved problem, in practice the process of adapting an alignment system to a new, under-resourced language comes with significant challenges, requiring experience and expertise that many outside of the speech community lack.

NRC Systems for Low Resource German-Upper Sorbian Machine Translation 2020: Transfer Learning with Lexical Modifications

no code implementations WMT (EMNLP) 2020 Rebecca Knowles, Samuel Larkin, Darlene Stewart, Patrick Littell

We describe the National Research Council of Canada (NRC) neural machine translation systems for the German-Upper Sorbian supervised track of the 2020 shared task on Unsupervised MT and Very Low Resource Supervised MT.

Machine Translation Transfer Learning +1

The Nunavut Hansard Inuktitut–English Parallel Corpus 3.0 with Preliminary Machine Translation Results

no code implementations LREC 2020 Eric Joanis, Rebecca Knowles, Roland Kuhn, Samuel Larkin, Patrick Littell, Chi-kiu Lo, Darlene Stewart, Jeffrey Micher

This paper describes a newly released sentence-aligned Inuktitut–English corpus based on the proceedings of the Legislative Assembly of Nunavut, covering sessions from April 1999 to June 2017.

Machine Translation NMT +2

AlloVera: A Multilingual Allophone Database

no code implementations LREC 2020 David R. Mortensen, Xinjian Li, Patrick Littell, Alexis Michaud, Shruti Rijhwani, Antonios Anastasopoulos, Alan W. Black, Florian Metze, Graham Neubig

While phonemic representations are language specific, phonetic representations (stated in terms of (allo)phones) are much closer to a universal (language-independent) transcription.

speech-recognition Speech Recognition

Multi-Source Transformer for Kazakh-Russian-English Neural Machine Translation

no code implementations WS 2019 Patrick Littell, Chi-kiu Lo, Samuel Larkin, Darlene Stewart

We describe the neural machine translation (NMT) system developed at the National Research Council of Canada (NRC) for the Kazakh-English news translation task of the Fourth Conference on Machine Translation (WMT19).

Machine Translation NMT +2

Choosing Transfer Languages for Cross-Lingual Learning

1 code implementation ACL 2019 Yu-Hsiang Lin, Chian-Yu Chen, Jean Lee, Zirui Li, Yuyan Zhang, Mengzhou Xia, Shruti Rijhwani, Junxian He, Zhisong Zhang, Xuezhe Ma, Antonios Anastasopoulos, Patrick Littell, Graham Neubig

Cross-lingual transfer, where a high-resource transfer language is used to improve the accuracy of a low-resource task language, is now an invaluable tool for improving performance of natural language processing (NLP) on low-resource languages.

Cross-Lingual Transfer

Towards a General-Purpose Linguistic Annotation Backend

no code implementations 13 Dec 2018 Graham Neubig, Patrick Littell, Chian-Yu Chen, Jean Lee, Zirui Li, Yu-Hsiang Lin, Yuyan Zhang

In this extended abstract, we describe the beginnings of a new project that will attempt to ease this language documentation process through the use of natural language processing (NLP) technology.

Management

Measuring sentence parallelism using Mahalanobis distances: The NRC unsupervised submissions to the WMT18 Parallel Corpus Filtering shared task

no code implementations WS 2018 Patrick Littell, Samuel Larkin, Darlene Stewart, Michel Simard, Cyril Goutte, Chi-kiu Lo

The WMT18 shared task on parallel corpus filtering (Koehn et al., 2018b) challenged teams to score sentence pairs from a large high-recall, low-precision web-scraped parallel corpus (Koehn et al., 2018a).

Anomaly Detection Machine Translation +1

Finite-state morphology for Kwak'wala: A phonological approach

no code implementations COLING 2018 Patrick Littell

This paper presents the phonological layer of a Kwak'wala finite-state morphological transducer, using the phonological hypotheses of Lincoln and Rath (1986) and the lenient composition operation of Karttunen (1998) to mediate the complicated relationship between underlying and surface forms.

Indigenous language technologies in Canada: Assessment, challenges, and successes

no code implementations COLING 2018 Patrick Littell, Anna Kazantseva, Roland Kuhn, Aidan Pine, Antti Arppe, Christopher Cox, Marie-Odile Junker

In this article, we discuss which text, speech, and image technologies have been developed, and would be feasible to develop, for the approximately 60 Indigenous languages spoken in Canada.

Optical Character Recognition Optical Character Recognition (OCR) +7

URIEL and lang2vec: Representing languages as typological, geographical, and phylogenetic vectors

no code implementations EACL 2017 Patrick Littell, David R. Mortensen, Ke Lin, Katherine Kairis, Carlisle Turner, Lori Levin

We introduce the URIEL knowledge base for massively multilingual NLP and the lang2vec utility, which provides information-rich vector identifications of languages drawn from typological, geographical, and phylogenetic databases and normalized to have straightforward and consistent formats, naming, and semantics.

Language Identification Language Modelling +1

PanPhon: A Resource for Mapping IPA Segments to Articulatory Feature Vectors

1 code implementation COLING 2016 David R. Mortensen, Patrick Littell, Akash Bharadwaj, Kartik Goyal, Chris Dyer, Lori Levin

This paper contributes to a growing body of evidence that, when coupled with appropriate machine-learning techniques, linguistically motivated, information-rich representations can outperform one-hot encodings of linguistic data.

NER

The Role of Context in Neural Morphological Disambiguation

no code implementations COLING 2016 Qinlan Shen, Daniel Clothiaux, Emily Tagtow, Patrick Littell, Chris Dyer

While morphological analyzers can reduce this sparsity by providing morpheme-level analyses for words, they will often introduce ambiguity by returning multiple analyses for the same surface form.

Morphological Disambiguation

Named Entity Recognition for Linguistic Rapid Response in Low-Resource Languages: Sorani Kurdish and Tajik

no code implementations COLING 2016 Patrick Littell, Kartik Goyal, David R. Mortensen, Alexa Little, Chris Dyer, Lori Levin

This paper describes our construction of named-entity recognition (NER) systems in two Western Iranian languages, Sorani Kurdish and Tajik, as a part of a pilot study of "Linguistic Rapid Response" to potential emergency humanitarian relief situations.

Humanitarian named-entity-recognition +2

Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning

no code implementations NAACL 2016 Yulia Tsvetkov, Sunayana Sitaram, Manaal Faruqui, Guillaume Lample, Patrick Littell, David Mortensen, Alan W. Black, Lori Levin, Chris Dyer

We introduce polyglot language models, recurrent neural network models trained to predict symbol sequences in many different languages using shared representations of symbols and conditioning on typological information about the language to be predicted.

Representation Learning

Bridge-Language Capitalization Inference in Western Iranian: Sorani, Kurmanji, Zazaki, and Tajik

no code implementations LREC 2016 Patrick Littell, David R. Mortensen, Kartik Goyal, Chris Dyer, Lori Levin

In Sorani Kurdish, one of the most useful orthographic features in named-entity recognition (capitalization) is absent, as the language's Perso-Arabic script does not make a distinction between uppercase and lowercase letters.

named-entity-recognition Named Entity Recognition +1

Morphological parsing of Swahili using crowdsourced lexical resources

no code implementations LREC 2014 Patrick Littell, Kaitlyn Price, Lori Levin

We describe a morphological analyzer for the Swahili language, written in an extension of XFST/LEXC intended for the easy declaration of morphophonological patterns and importation of lexical resources.

Machine Translation
