no code implementations • EMNLP 2021 • William Lane, Steven Bird
Human expertise and the participation of speech communities are essential factors in the success of technologies for low-resource languages.
no code implementations • FieldMatters (COLING) 2022 • William Lane, Steven Bird
We describe a novel approach to transcribing morphologically complex, local, oral languages.
no code implementations • LREC (MWE) 2022 • Steven Bird
Research on multiword expressions and on under-resourced languages often begins with problematisation.
speech-recognition
no code implementations • CL (ACL) 2020 • Steven Bird
The transcription bottleneck is often cited as a major obstacle for efforts to document the world’s endangered languages and supply them with language technologies.
Automatic Speech Recognition
no code implementations • COLING 2022 • Éric Le Ferrand, Steven Bird, Laurent Besacier
An increasing number of papers have been addressing issues related to low-resource languages and the transcription bottleneck paradigm.
no code implementations • ACL 2022 • Steven Bird
How can language technology address the diverse situations of the world’s languages?
no code implementations • ACL 2022 • Éric Le Ferrand, Steven Bird, Laurent Besacier
Most low-resource language technology development is premised on the need to collect data for training statistical models.
no code implementations • ComputEL (ACL) 2022 • Mat Bettinson, Steven Bird
Transcribing speech for primarily oral, local languages is often a joint effort involving speakers and outsiders.
no code implementations • ALTA 2021 • Éric Le Ferrand, Steven Bird, Laurent Besacier
We investigate the efficiency of two very different spoken term detection approaches for transcription when the available data is insufficient to train a robust speech recognition system.
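The abstract does not say which two approaches are compared. As an assumption, the sketch below shows subsequence dynamic time warping (DTW), one common query-by-example technique for spoken term detection: a spoken query is aligned against any stretch of a longer utterance, and the cheapest alignment marks a candidate occurrence. Frames are plain lists of floats standing in for acoustic feature vectors; the function names are hypothetical and this is not presented as the authors' method.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query: List[Frame], utterance: List[Frame]) -> Tuple[float, int]:
    """Return (cost per query frame, end frame index) of the best match of
    `query` anywhere inside `utterance` (hypothetical helper)."""
    n, m = len(query), len(utterance)
    inf = float("inf")
    # cost[i][j]: cheapest alignment of query[0..i] that ends at utterance frame j
    cost = [[inf] * m for _ in range(n)]
    for j in range(m):
        # Free start: the match may begin at any utterance frame
        cost[0][j] = frame_distance(query[0], utterance[j])
    for i in range(1, n):
        for j in range(m):
            d = frame_distance(query[i], utterance[j])
            best_prev = cost[i - 1][j]  # vertical step
            if j > 0:
                # diagonal and horizontal steps
                best_prev = min(best_prev, cost[i - 1][j - 1], cost[i][j - 1])
            cost[i][j] = d + best_prev
    # Free end: pick the utterance frame where the full query aligns most cheaply
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end] / n, end
```

A transcription workflow could run such a search over untranscribed recordings and pass low-cost hits to a human for confirmation; a detection threshold would have to be tuned on real data.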
no code implementations • NAACL (DaSH) 2021 • William Lane, Mat Bettinson, Steven Bird
Transcribing low-resource languages can be challenging in the absence of a good lexicon and trained transcribers.
no code implementations • 11 Jun 2021 • Éric Le Ferrand, Steven Bird, Laurent Besacier
We investigate the efficiency of two very different spoken term detection approaches for transcription when the available data is insufficient to train a robust ASR system.
no code implementations • COLING 2020 • William Lane, Steven Bird
We show that the space of proximal morph completions is many orders of magnitude smaller than the space of full word completions for Kunwinjku.
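Why might predicting only the next (proximal) morph shrink the search space so sharply? A minimal sketch, assuming a word is built from a fixed template of independent morph slots: the next-morph choice ranges over one slot, while a full-word completion ranges over the product of all remaining slots. The slot sizes below are invented for illustration and are not Kunwinjku data; real morphotactics have dependencies between slots that this model ignores.

```python
from math import prod
from typing import List

# Hypothetical slot template: number of candidate morphs in each position
SLOT_SIZES: List[int] = [4, 10, 30, 20, 200, 6, 5]


def next_morph_options(slot_sizes: List[int], filled: int) -> int:
    """Candidates for the next (proximal) morph once `filled` slots are chosen."""
    return slot_sizes[filled]


def full_word_options(slot_sizes: List[int], filled: int) -> int:
    """Ways to complete every remaining slot at once (assumes independent slots)."""
    return prod(slot_sizes[filled:])


if __name__ == "__main__":
    for filled in range(len(SLOT_SIZES)):
        print(filled, next_morph_options(SLOT_SIZES, filled),
              full_word_options(SLOT_SIZES, filled))
```

With one slot filled, this toy template gives 10 next-morph candidates against 36,000,000 full-word completions, a gap of more than six orders of magnitude; the gap narrows as more slots are filled.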
no code implementations • COLING 2020 • Steven Bird
After generations of exploitation, Indigenous people often respond negatively to the idea that their languages are data ready for the taking.
no code implementations • COLING 2020 • Éric Le Ferrand, Steven Bird, Laurent Besacier
We propose a novel transcription workflow that combines spoken term detection with a human-in-the-loop process, and report a pilot experiment.
no code implementations • ACL 2020 • William Lane, Steven Bird
To address this challenge, we offer linguistically informed approaches for bootstrapping a neural morphological analyzer, and demonstrate its application to Kunwinjku, a polysynthetic Australian language.
no code implementations • ALTA 2019 • William Lane, Steven Bird
Kunwinjku is an Indigenous Australian language, spoken in northern Australia, that exhibits agglutinative and polysynthetic properties.
no code implementations • EACL 2017 • Long Duong, Hiroshi Kanayama, Tengfei Ma, Steven Bird, Trevor Cohn
Crosslingual word embeddings represent lexical items from different languages using the same vector space, enabling crosslingual transfer.
no code implementations • EACL 2017 • Oliver Adams, Adam Makarucha, Graham Neubig, Steven Bird, Trevor Cohn
We investigate the use of such lexicons to improve language models when textual training data is limited to as few as a thousand sentences.
1 code implementation • EMNLP 2016 • Long Duong, Hiroshi Kanayama, Tengfei Ma, Steven Bird, Trevor Cohn
Crosslingual word embeddings represent lexical items from different languages in the same vector space, enabling transfer of NLP tools.
Bilingual Lexicon Induction
Cross-Lingual Document Classification
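One simple consequence of placing two languages in the same vector space is bilingual lexicon induction, one of the tasks listed above: a word can be "translated" by retrieving the nearest target-language vector under cosine similarity. The sketch below assumes the shared space already exists; it does not show how such embeddings are learned, which is the subject of the papers. The three-dimensional vectors and the `translate` helper are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def translate(word: str, source: Dict[str, Vector], target: Dict[str, Vector]) -> str:
    """Return the target word whose vector is closest to the source word's vector."""
    query = source[word]
    return max(target, key=lambda t: cosine(query, target[t]))


# Hypothetical toy vectors, assumed to already share one crosslingual space
ENGLISH: Dict[str, Vector] = {"dog": [0.9, 0.1, 0.0], "water": [0.0, 0.2, 0.95]}
SPANISH: Dict[str, Vector] = {
    "perro": [0.85, 0.15, 0.05],
    "agua": [0.05, 0.1, 0.9],
    "casa": [0.3, 0.9, 0.1],
}
```

Nearest-neighbour retrieval is the simplest scoring rule; practical systems often adjust it to reduce "hub" words that are close to many queries.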
1 code implementation • 17 May 2002 • Edward Loper, Steven Bird
NLTK, the Natural Language Toolkit, is a suite of open source program modules, tutorials and problem sets, providing ready-to-use computational linguistics courseware.
1 code implementation • 5 Jul 1999 • Steven Bird, Mark Liberman
In recent work we have presented a formal framework for linguistic annotation based on labeled acyclic digraphs.
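To make "labeled acyclic digraph" concrete, here is a minimal sketch of such a structure, assuming that nodes may optionally carry a time offset into the recording and that each arc carries a label type (for example "word") and a label value. The class and method names are hypothetical, and acyclicity is checked with Kahn's topological-sort algorithm.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Arc:
    source: str
    target: str
    label_type: str  # e.g. "word" or "utterance"
    label: str


@dataclass
class AnnotationGraph:
    # node id -> optional time offset in seconds (None means unanchored)
    nodes: Dict[str, Optional[float]] = field(default_factory=dict)
    arcs: List[Arc] = field(default_factory=list)

    def add_node(self, node_id: str, offset: Optional[float] = None) -> None:
        self.nodes[node_id] = offset

    def add_arc(self, source: str, target: str, label_type: str, label: str) -> None:
        # Create unanchored endpoints if they were not added explicitly
        self.nodes.setdefault(source, None)
        self.nodes.setdefault(target, None)
        self.arcs.append(Arc(source, target, label_type, label))

    def is_acyclic(self) -> bool:
        """Kahn's algorithm: the graph is acyclic iff every node can be removed."""
        indegree = {n: 0 for n in self.nodes}
        successors = defaultdict(list)
        for arc in self.arcs:
            successors[arc.source].append(arc.target)
            indegree[arc.target] += 1
        queue = deque(n for n, d in indegree.items() if d == 0)
        removed = 0
        while queue:
            node = queue.popleft()
            removed += 1
            for nxt in successors[node]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    queue.append(nxt)
        return removed == len(self.nodes)

    def labels_of_type(self, label_type: str) -> List[str]:
        """Labels of all arcs of one type, in insertion order."""
        return [a.label for a in self.arcs if a.label_type == label_type]
```

Because arcs of different types can span the same nodes, one graph can hold several annotation layers at once, such as word arcs nested under a single utterance arc.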