no code implementations • LREC 2022 • Justine Reverdy, Sam O’Connor Russell, Louise Duquenne, Diego Garaialde, Benjamin R. Cowan, Naomi Harte
The corpus was developed within the wider RoomReader Project to explore multimodal cues of conversational engagement and behavioural aspects of collaborative interaction in online environments.
no code implementations • 31 Jan 2025 • Edward Storey, Naomi Harte, Peter Bell
Self-supervised learning (SSL) is used in deep learning to train on large datasets without the need for expensive labelling of the data.
Automatic Speech Recognition (ASR)
no code implementations • 22 Dec 2024 • Zhaofeng Lin, Naomi Harte
We first quantify the visual contribution using effective SNR gains at 0 dB and then investigate the use of visual information in terms of its temporal distribution and word-level informativeness.
no code implementations • 5 Nov 2024 • Iván López-Espejo, Eros Roselló, Amin Edraki, Naomi Harte, Jesper Jensen
Advancing the design of robust hearing aid (HA) voice control is crucial to increase the rate of HA use among hard-of-hearing people and to improve HA users' experience.
no code implementations • 20 Feb 2023 • Mark Anderson, Tomi Kinnunen, Naomi Harte
We show that although performance is overall improved, the filterbanks exhibit strong sensitivity to their initialisation strategy.
no code implementations • 3 Oct 2022 • Mark Anderson, Naomi Harte
Combining this data with species agnostic bird activity detection systems enables the monitoring of activity levels of bird populations.
no code implementations • 16 Dec 2021 • Mark Anderson, Naomi Harte
This report presents deep learning and data augmentation techniques used by a system entered into the Few-Shot Bioacoustic Event Detection for the DCASE2021 Challenge.
no code implementations • 16 Dec 2021 • Mark Anderson, John Kennedy, Naomi Harte
This paper explores low resource classifiers and features for the detection of bird activity, suitable for embedded Automatic Recording Units which are typically deployed for long term remote monitoring of bird populations.
1 code implementation • 14 Dec 2020 • George Sterpu, Naomi Harte
In recent years, Automatic Speech Recognition (ASR) technology has approached human-level performance on conversational speech under relatively clean listening conditions.
no code implementations • 24 Sep 2020 • Ali Karaali, Naomi Harte, Claudio Rosito Jung
This paper presents an edge-based defocus blur estimation method from a single defocused image.
1 code implementation • 8 Jun 2020 • George Sterpu, Christian Saam, Naomi Harte
Sequence-to-sequence models, in particular the Transformer, achieve state-of-the-art results in Automatic Speech Recognition.
Automatic Speech Recognition (ASR)
1 code implementation • 19 May 2020 • George Sterpu, Christian Saam, Naomi Harte
The audio-visual speech fusion strategy AV Align has shown significant performance improvements in audio-visual speech recognition (AVSR) on the challenging LRS2 dataset.
1 code implementation • ACL 2020 • Matthew Roddy, Naomi Harte
The timings of spoken response offsets in human dialogue have been shown to vary based on contextual elements of the dialogue.
1 code implementation • 17 Apr 2020 • George Sterpu, Christian Saam, Naomi Harte
A recently proposed multimodal fusion strategy, AV Align, based on state-of-the-art sequence-to-sequence neural networks, attempts to model this relationship by explicitly aligning the acoustic and visual representations of speech.
3 code implementations • 5 Sep 2018 • George Sterpu, Christian Saam, Naomi Harte
Automatic speech recognition can potentially benefit from lip motion patterns, which complement acoustic speech to improve overall recognition performance, particularly in noise.
Automatic Speech Recognition (ASR)
1 code implementation • 31 Aug 2018 • Matthew Roddy, Gabriel Skantze, Naomi Harte
To design spoken dialog systems that can conduct fluid interactions, it is desirable to incorporate cues from separate modalities into turn-taking models.
1 code implementation • 29 Jun 2018 • Matthew Roddy, Gabriel Skantze, Naomi Harte
The continuous predictions represent generalized turn-taking behaviors observed in the training data and can be applied to make decisions that are not just limited to end-of-turn detection.
no code implementations • 29 May 2018 • George Sterpu, Christian Saam, Naomi Harte
Finding visual features and suitable models for lipreading tasks that are more complex than a well-constrained vocabulary has proven challenging.