Search Results for author: Nay San

Found 7 papers, 2 papers with code

Leveraging pre-trained representations to improve access to untranscribed speech from endangered languages

1 code implementation • 26 Mar 2021 • Nay San, Martijn Bartelds, Mitchell Browne, Lily Clifford, Fiona Gibson, John Mansfield, David Nash, Jane Simpson, Myfany Turpin, Maria Vollmer, Sasha Wilmoth, Dan Jurafsky

Surprisingly, the English model outperformed the multilingual model on 4 Australian language datasets, raising questions around how to optimally leverage self-supervised speech representations for QbE-STD.

Automatic Speech Recognition (ASR) +1

Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation

1 code implementation • 18 May 2023 • Martijn Bartelds, Nay San, Bradley McDonnell, Dan Jurafsky, Martijn Wieling

For Gronings, for which there was a pre-existing text-to-speech (TTS) system available, we also examined the use of TTS to generate ASR training data from text-only sources.

Automatic Speech Recognition (ASR) +2

Future Directions in Technological Support for Language Documentation

no code implementations • WS 2019 • Daan van Esch, Ben Foley, Nay San

Automated speech tools for helping communities process restricted-access corpora for language revival efforts

no code implementations • ComputEL (ACL) 2022 • Nay San, Martijn Bartelds, Tolúlopé Ògúnrèmí, Alison Mount, Ruben Thompson, Michael Higgins, Roy Barker, Jane Simpson, Dan Jurafsky

An even narrower bottleneck occurs for recordings with access constraints, such as language that must be vetted or filtered by authorised community members before annotation can begin.

Action Detection, Activity Detection +6

Leveraging supplementary text data to kick-start automatic speech recognition system development with limited transcriptions

no code implementations • 9 Feb 2023 • Nay San, Martijn Bartelds, Blaine Billings, Ella de Falco, Hendi Feriza, Johan Safri, Wawan Sahrozi, Ben Foley, Bradley McDonnell, Dan Jurafsky

We perform experiments using 10 minutes of transcribed speech from English (for replicating prior work) and two additional pairs of languages differing in the availability of supplemental text data: Gronings and Frisian (~7.5M token corpora available), and Besemah and Nasal (only small lexica available).

Automatic Speech Recognition (ASR) +1

Developing Speech Processing Pipelines for Police Accountability

no code implementations • 9 Jun 2023 • Anjalie Field, Prateek Verma, Nay San, Jennifer L. Eberhardt, Dan Jurafsky

We investigate the potential of large pre-trained speech models for facilitating reviews, focusing on ASR and officer speech detection in footage from traffic stops.

Predicting positive transfer for improved low-resource speech recognition using acoustic pseudo-tokens

no code implementations • 3 Feb 2024 • Nay San, Georgios Paraskevopoulos, Aryaman Arora, Xiluo He, Prabhjot Kaur, Oliver Adams, Dan Jurafsky

Continued pre-training on 70-200 hours of untranscribed speech in these languages can help -- but what about languages without that much recorded data?

Automatic Speech Recognition (ASR) +1
