Search Results for author: Patrick Littell

Found 32 papers, 6 papers with code

Translation Memories as Baselines for Low-Resource Machine Translation

no code implementations • LREC 2022 • Rebecca Knowles, Patrick Littell

Low-resource machine translation research often requires building baselines to benchmark estimates of progress in translation quality.

Machine Translation Translation

NRC Systems for the 2020 Inuktitut-English News Translation Task

no code implementations • WMT (EMNLP) 2020 • Rebecca Knowles, Darlene Stewart, Samuel Larkin, Patrick Littell

We describe the National Research Council of Canada (NRC) submissions for the 2020 Inuktitut-English shared task on news translation at the Fifth Conference on Machine Translation (WMT20).

Machine Translation Translation

ReadAlong Studio: Practical Zero-Shot Text-Speech Alignment for Indigenous Language Audiobooks

2 code implementations • SIGUL (LREC) 2022 • Patrick Littell, Eric Joanis, Aidan Pine, Marc Tessier, David Huggins-Daines, Delasie Torkornoo

While the alignment of audio recordings and text (often termed “forced alignment”) is often treated as a solved problem, in practice the process of adapting an alignment system to a new, under-resourced language comes with significant challenges, requiring experience and expertise that many outside of the speech community lack.

NRC-CNRC Machine Translation Systems for the 2021 AmericasNLP Shared Task

no code implementations • NAACL (AmericasNLP) 2021 • Rebecca Knowles, Darlene Stewart, Samuel Larkin, Patrick Littell

We describe the NRC-CNRC systems submitted to the AmericasNLP shared task on machine translation.

Machine Translation Translation

NRC Systems for Low Resource German-Upper Sorbian Machine Translation 2020: Transfer Learning with Lexical Modifications

no code implementations • WMT (EMNLP) 2020 • Rebecca Knowles, Samuel Larkin, Darlene Stewart, Patrick Littell

We describe the National Research Council of Canada (NRC) neural machine translation systems for the German-Upper Sorbian supervised track of the 2020 shared task on Unsupervised MT and Very Low Resource Supervised MT.

Machine Translation Transfer Learning +1

Requirements and Motivations of Low-Resource Speech Synthesis for Language Revitalization

1 code implementation • ACL 2022 • Aidan Pine, Dan Wells, Nathan Brinklow, Patrick Littell, Korin Richmond

This paper describes the motivation and development of speech synthesis systems for the purposes of language revitalization.

Speech Synthesis

The Indigenous Languages Technology project at NRC Canada: An empowerment-oriented approach to developing language software

no code implementations • COLING 2020 • Roland Kuhn, Fineen Davis, Alain Désilets, Eric Joanis, Anna Kazantseva, Rebecca Knowles, Patrick Littell, Delaney Lothian, Aidan Pine, Caroline Running Wolf, Eddie Santos, Darlene Stewart, Gilles Boulianne, Vishwa Gupta, Brian Maracle Owennatékha, Akwiratékha' Martin, Christopher Cox, Marie-Odile Junker, Olivia Sammons, Delasie Torkornoo, Nathan Thanyehténhas Brinklow, Sara Child, Benoît Farley, David Huggins-Daines, Daisy Rosenblum, Heather Souter

This paper surveys the first, three-year phase of a project at the National Research Council of Canada that is developing software to assist Indigenous communities in Canada in preserving their languages and extending their use.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Neural Polysynthetic Language Modelling

no code implementations • 11 May 2020 • Lane Schwartz, Francis Tyers, Lori Levin, Christo Kirov, Patrick Littell, Chi-kiu Lo, Emily Prud'hommeaux, Hyunji Hayley Park, Kenneth Steimel, Rebecca Knowles, Jeffrey Micher, Lonny Strunk, Han Liu, Coleman Haley, Katherine J. Zhang, Robbie Jimmerson, Vasilisa Andriyanets, Aldrian Obaja Muis, Naoki Otani, Jong Hyuk Park, Zhisong Zhang

In the literature, languages like Finnish or Turkish are held up as extreme examples of complexity that challenge common modelling assumptions.

Language Modelling Lemmatization +1

The Nunavut Hansard Inuktitut–English Parallel Corpus 3.0 with Preliminary Machine Translation Results

no code implementations • LREC 2020 • Eric Joanis, Rebecca Knowles, Roland Kuhn, Samuel Larkin, Patrick Littell, Chi-kiu Lo, Darlene Stewart, Jeffrey Micher

This paper describes a newly released sentence-aligned Inuktitut–English corpus based on the proceedings of the Legislative Assembly of Nunavut, covering sessions from April 1999 to June 2017.

Machine Translation NMT +2

A Summary of the First Workshop on Language Technology for Language Documentation and Revitalization

no code implementations • LREC 2020 • Graham Neubig, Shruti Rijhwani, Alexis Palmer, Jordan MacKenzie, Hilaria Cruz, Xinjian Li, Matthew Lee, Aditi Chaudhary, Luke Gessler, Steven Abney, Shirley Anugrah Hayati, Antonios Anastasopoulos, Olga Zamaraeva, Emily Prud'hommeaux, Jennette Child, Sara Child, Rebecca Knowles, Sarah Moeller, Jeffrey Micher, Yiyuan Li, Sydney Zink, Mengzhou Xia, Roshan S Sharma, Patrick Littell

Despite recent advances in natural language processing and other language technology, the application of such technology to language documentation and conservation has been limited.

AlloVera: A Multilingual Allophone Database

no code implementations • LREC 2020 • David R. Mortensen, Xinjian Li, Patrick Littell, Alexis Michaud, Shruti Rijhwani, Antonios Anastasopoulos, Alan W. Black, Florian Metze, Graham Neubig

While phonemic representations are language specific, phonetic representations (stated in terms of (allo)phones) are much closer to a universal (language-independent) transcription.

speech-recognition Speech Recognition

Universal Phone Recognition with a Multilingual Allophone System

1 code implementation • 26 Feb 2020 • Xinjian Li, Siddharth Dalmia, Juncheng Li, Matthew Lee, Patrick Littell, Jiali Yao, Antonios Anastasopoulos, David R. Mortensen, Graham Neubig, Alan W. Black, Florian Metze

Multilingual models can improve language processing, particularly for low resource situations, by sharing parameters across languages.

speech-recognition Speech Recognition

Multi-Source Transformer for Kazakh-Russian-English Neural Machine Translation

no code implementations • WS 2019 • Patrick Littell, Chi-kiu Lo, Samuel Larkin, Darlene Stewart

We describe the neural machine translation (NMT) system developed at the National Research Council of Canada (NRC) for the Kazakh-English news translation task of the Fourth Conference on Machine Translation (WMT19).

Machine Translation NMT +2

Choosing Transfer Languages for Cross-Lingual Learning

1 code implementation • ACL 2019 • Yu-Hsiang Lin, Chian-Yu Chen, Jean Lee, Zirui Li, Yuyan Zhang, Mengzhou Xia, Shruti Rijhwani, Junxian He, Zhisong Zhang, Xuezhe Ma, Antonios Anastasopoulos, Patrick Littell, Graham Neubig

Cross-lingual transfer, where a high-resource transfer language is used to improve the accuracy of a low-resource task language, is now an invaluable tool for improving performance of natural language processing (NLP) on low-resource languages.

Cross-Lingual Transfer

Towards a General-Purpose Linguistic Annotation Backend

no code implementations • 13 Dec 2018 • Graham Neubig, Patrick Littell, Chian-Yu Chen, Jean Lee, Zirui Li, Yu-Hsiang Lin, Yuyan Zhang

In this extended abstract, we describe the beginnings of a new project that will attempt to ease this language documentation process through the use of natural language processing (NLP) technology.

Management

Measuring sentence parallelism using Mahalanobis distances: The NRC unsupervised submissions to the WMT18 Parallel Corpus Filtering shared task

no code implementations • WS 2018 • Patrick Littell, Samuel Larkin, Darlene Stewart, Michel Simard, Cyril Goutte, Chi-kiu Lo

The WMT18 shared task on parallel corpus filtering (Koehn et al., 2018b) challenged teams to score sentence pairs from a large high-recall, low-precision web-scraped parallel corpus (Koehn et al., 2018a).

Anomaly Detection Machine Translation +1

Accurate semantic textual similarity for cleaning noisy parallel corpora using semantic machine translation evaluation metric: The NRC supervised submissions to the Parallel Corpus Filtering task

no code implementations • WS 2018 • Chi-kiu Lo, Michel Simard, Darlene Stewart, Samuel Larkin, Cyril Goutte, Patrick Littell

We present our semantic textual similarity approach in filtering a noisy web crawled parallel corpus using YiSi, a novel semantic machine translation evaluation metric.

Machine Translation Semantic Textual Similarity +1

Finite-state morphology for Kwak'wala: A phonological approach

no code implementations • COLING 2018 • Patrick Littell

This paper presents the phonological layer of a Kwak'wala finite-state morphological transducer, using the phonological hypotheses of Lincoln and Rath (1986) and the lenient composition operation of Karttunen (1998) to mediate the complicated relationship between underlying and surface forms.

Indigenous language technologies in Canada: Assessment, challenges, and successes

no code implementations • COLING 2018 • Patrick Littell, Anna Kazantseva, Roland Kuhn, Aidan Pine, Antti Arppe, Christopher Cox, Marie-Odile Junker

In this article, we discuss which text, speech, and image technologies have been developed, and would be feasible to develop, for the approximately 60 Indigenous languages spoken in Canada.

Optical Character Recognition Optical Character Recognition (OCR) +7

Parser combinators for Tigrinya and Oromo morphology

no code implementations • LREC 2018 • Patrick Littell, Tom McCoy, Na-Rae Han, Shruti Rijhwani, Zaid Sheikh, David Mortensen, Teruko Mitamura, Lori Levin

Lemmatization Machine Translation

Epitran: Precision G2P for Many Languages

no code implementations • LREC 2018 • David R. Mortensen, Siddharth Dalmia, Patrick Littell

Entity Linking

Learning Language Representations for Typology Prediction

2 code implementations • EMNLP 2017 • Chaitanya Malaviya, Graham Neubig, Patrick Littell

One central mystery of neural NLP is what neural models "know" about their subject matter.

Machine Translation NMT +1

URIEL and lang2vec: Representing languages as typological, geographical, and phylogenetic vectors

no code implementations • EACL 2017 • Patrick Littell, David R. Mortensen, Ke Lin, Katherine Kairis, Carlisle Turner, Lori Levin

We introduce the URIEL knowledge base for massively multilingual NLP and the lang2vec utility, which provides information-rich vector identifications of languages drawn from typological, geographical, and phylogenetic databases and normalized to have straightforward and consistent formats, naming, and semantics.

Language Identification Language Modelling +1

Waldayu and Waldayu Mobile: Modern digital dictionary interfaces for endangered languages

no code implementations • WS 2017 • Patrick Littell, Aidan Pine, Henry Davis

STREAMLInED Challenges: Aligning Research Interests with Shared Tasks

no code implementations • WS 2017 • Gina-Anne Levow, Emily M. Bender, Patrick Littell, Kristen Howell, Shobhana Chelliah, Joshua Crowgey, Dan Garrette, Jeff Good, Sharon Hargus, David Inman, Michael Maxwell, Michael Tjalve, Fei Xia

PanPhon: A Resource for Mapping IPA Segments to Articulatory Feature Vectors

1 code implementation • COLING 2016 • David R. Mortensen, Patrick Littell, Akash Bharadwaj, Kartik Goyal, Chris Dyer, Lori Levin

This paper contributes to a growing body of evidence that, when coupled with appropriate machine-learning techniques, linguistically motivated, information-rich representations can outperform one-hot encodings of linguistic data.

NER

The Role of Context in Neural Morphological Disambiguation

no code implementations • COLING 2016 • Qinlan Shen, Daniel Clothiaux, Emily Tagtow, Patrick Littell, Chris Dyer

While morphological analyzers can reduce this sparsity by providing morpheme-level analyses for words, they will often introduce ambiguity by returning multiple analyses for the same surface form.

Morphological Disambiguation

Named Entity Recognition for Linguistic Rapid Response in Low-Resource Languages: Sorani Kurdish and Tajik

no code implementations • COLING 2016 • Patrick Littell, Kartik Goyal, David R. Mortensen, Alexa Little, Chris Dyer, Lori Levin

This paper describes our construction of named-entity recognition (NER) systems in two Western Iranian languages, Sorani Kurdish and Tajik, as a part of a pilot study of “Linguistic Rapid Response” to potential emergency humanitarian relief situations.

Humanitarian named-entity-recognition +2

Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning

no code implementations • NAACL 2016 • Yulia Tsvetkov, Sunayana Sitaram, Manaal Faruqui, Guillaume Lample, Patrick Littell, David Mortensen, Alan W. Black, Lori Levin, Chris Dyer

We introduce polyglot language models, recurrent neural network models trained to predict symbol sequences in many different languages using shared representations of symbols and conditioning on typological information about the language to be predicted.

Representation Learning

Bridge-Language Capitalization Inference in Western Iranian: Sorani, Kurmanji, Zazaki, and Tajik

no code implementations • LREC 2016 • Patrick Littell, David R. Mortensen, Kartik Goyal, Chris Dyer, Lori Levin

In Sorani Kurdish, one of the most useful orthographic features in named-entity recognition – capitalization – is absent, as the language's Perso-Arabic script does not make a distinction between uppercase and lowercase letters.

named-entity-recognition Named Entity Recognition +1

Morphological parsing of Swahili using crowdsourced lexical resources

no code implementations • LREC 2014 • Patrick Littell, Kaitlyn Price, Lori Levin

We describe a morphological analyzer for the Swahili language, written in an extension of XFST/LEXC intended for the easy declaration of morphophonological patterns and importation of lexical resources.

Machine Translation

Introducing Computational Concepts in a Linguistics Olympiad

no code implementations • WS 2013 • Patrick Littell, Lori Levin, Jason Eisner, Dragomir Radev
