no code implementations • 29 May 2018 • George Sterpu, Christian Saam, Naomi Harte
Finding visual features and suitable models for lipreading tasks that are more complex than a well-constrained vocabulary has proven challenging.
1 code implementation • 29 Jun 2018 • Matthew Roddy, Gabriel Skantze, Naomi Harte
The continuous predictions represent generalized turn-taking behaviors observed in the training data and can be applied to make decisions that are not just limited to end-of-turn detection.
1 code implementation • 31 Aug 2018 • Matthew Roddy, Gabriel Skantze, Naomi Harte
To design spoken dialog systems that can conduct fluid interactions, it is desirable to incorporate cues from separate modalities into turn-taking models.
3 code implementations • 5 Sep 2018 • George Sterpu, Christian Saam, Naomi Harte
Automatic speech recognition can potentially benefit from lip motion patterns, which complement acoustic speech and improve overall recognition performance, particularly in noise.
1 code implementation • 17 Apr 2020 • George Sterpu, Christian Saam, Naomi Harte
A recently proposed multimodal fusion strategy, AV Align, based on state-of-the-art sequence-to-sequence neural networks, attempts to model this relationship by explicitly aligning the acoustic and visual representations of speech.
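The idea of explicitly aligning acoustic and visual representations can be sketched as cross-modal attention, where each acoustic frame attends over the visual frames. This is a minimal numpy stand-in (hypothetical shapes and function names, not the authors' actual AV Align implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(audio_feats, video_feats):
    """Align each acoustic frame with the visual frames.

    audio_feats: (Ta, d) acoustic encoder states
    video_feats: (Tv, d) visual encoder states
    Returns a per-audio-frame video summary and the (Ta, Tv)
    alignment matrix (each row is a distribution over video frames).
    """
    d = audio_feats.shape[1]
    scores = audio_feats @ video_feats.T / np.sqrt(d)  # (Ta, Tv)
    alignment = softmax(scores, axis=1)                # rows sum to 1
    attended_video = alignment @ video_feats           # (Ta, d)
    return attended_video, alignment
```

The alignment matrix is what makes the fusion interpretable: its rows can be inspected to see which video frames each acoustic frame attends to.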
1 code implementation • ACL 2020 • Matthew Roddy, Naomi Harte
The timings of spoken response offsets in human dialogue have been shown to vary based on contextual elements of the dialogue.
1 code implementation • 19 May 2020 • George Sterpu, Christian Saam, Naomi Harte
The audio-visual speech fusion strategy AV Align has shown significant performance improvements in audio-visual speech recognition (AVSR) on the challenging LRS2 dataset.
1 code implementation • 8 Jun 2020 • George Sterpu, Christian Saam, Naomi Harte
Sequence-to-sequence models, in particular the Transformer, achieve state-of-the-art results in Automatic Speech Recognition.
no code implementations • 24 Sep 2020 • Ali Karaali, Naomi Harte, Claudio Rosito Jung
This paper presents an edge-based defocus blur estimation method from a single defocused image.
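The paper's exact algorithm is not reproduced here, but a classic family of edge-based defocus estimators works by re-blurring the image with a known Gaussian and using the ratio of gradient magnitudes at edges to recover the unknown blur scale. A simplified 1-D sketch (illustrative only; function names and the margin parameter are assumptions):

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    # Normalised 1-D Gaussian kernel.
    if radius is None:
        radius = int(4 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def estimate_edge_blur(signal, sigma0=1.0, margin=20):
    """Estimate the Gaussian blur sigma of a step edge in `signal`.

    Re-blur with a known sigma0. For an ideal step edge the ratio of
    maximum gradient magnitudes R = g / g_b satisfies
    R = sqrt(sigma^2 + sigma0^2) / sigma, hence
    sigma = sigma0 / sqrt(R^2 - 1).
    A margin is cropped at both ends to avoid boundary artefacts
    from the zero-padded convolution.
    """
    reblurred = np.convolve(signal, gaussian_kernel(sigma0), mode="same")
    interior = slice(margin, -margin)
    g = np.abs(np.gradient(signal))[interior].max()
    gb = np.abs(np.gradient(reblurred))[interior].max()
    ratio = g / gb
    return sigma0 / np.sqrt(ratio**2 - 1.0)
```

On a synthetic step edge blurred with sigma = 2, this recovers an estimate close to 2; real methods extend the idea to 2-D edges and propagate the sparse edge estimates into a dense defocus map.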
1 code implementation • 14 Dec 2020 • George Sterpu, Naomi Harte
In recent years, Automatic Speech Recognition (ASR) technology has approached human-level performance on conversational speech under relatively clean listening conditions.
no code implementations • 16 Dec 2021 • Mark Anderson, John Kennedy, Naomi Harte
This paper explores low-resource classifiers and features for the detection of bird activity, suitable for embedded Automatic Recording Units typically deployed for long-term remote monitoring of bird populations.
no code implementations • 16 Dec 2021 • Mark Anderson, Naomi Harte
This report presents deep learning and data augmentation techniques used by a system entered into the Few-Shot Bioacoustic Event Detection task of the DCASE2021 Challenge.
no code implementations • 3 Oct 2022 • Mark Anderson, Naomi Harte
Combining this data with species agnostic bird activity detection systems enables the monitoring of activity levels of bird populations.
no code implementations • 20 Feb 2023 • Mark Anderson, Tomi Kinnunen, Naomi Harte
We show that although performance improves overall, the filterbanks exhibit strong sensitivity to their initialisation strategy.
no code implementations • LREC 2022 • Justine Reverdy, Sam O’Connor Russell, Louise Duquenne, Diego Garaialde, Benjamin R. Cowan, Naomi Harte
The corpus was developed within the wider RoomReader Project to explore multimodal cues of conversational engagement and behavioural aspects of collaborative interaction in online environments.