Search Results for author: Naomi Harte

Found 15 papers, 8 papers with code

Can DNNs Learn to Lipread Full Sentences?

no code implementations • 29 May 2018 • George Sterpu, Christian Saam, Naomi Harte

Finding visual features and suitable models for lipreading tasks that are more complex than a well-constrained vocabulary has proven challenging.

General Classification Language Modelling +1

Investigating Speech Features for Continuous Turn-Taking Prediction Using LSTMs

1 code implementation • 29 Jun 2018 • Matthew Roddy, Gabriel Skantze, Naomi Harte

The continuous predictions represent generalized turn-taking behaviors observed in the training data and can be applied to make decisions that are not just limited to end-of-turn detection.

Multimodal Continuous Turn-Taking Prediction Using Multiscale RNNs

1 code implementation • 31 Aug 2018 • Matthew Roddy, Gabriel Skantze, Naomi Harte

To design spoken dialog systems that can conduct fluid interactions it is desirable to incorporate cues from separate modalities into turn-taking models.

Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition

3 code implementations • 5 Sep 2018 • George Sterpu, Christian Saam, Naomi Harte

Automatic speech recognition can potentially benefit from the lip motion patterns, complementing acoustic speech to improve the overall recognition performance, particularly in noise.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition

1 code implementation • 17 Apr 2020 • George Sterpu, Christian Saam, Naomi Harte

A recently proposed multimodal fusion strategy, AV Align, based on state-of-the-art sequence to sequence neural networks, attempts to model this relationship by explicitly aligning the acoustic and visual representations of speech.

Audio-Visual Speech Recognition speech-recognition +1

Neural Generation of Dialogue Response Timings

1 code implementation • ACL 2020 • Matthew Roddy, Naomi Harte

The timings of spoken response offsets in human dialogue have been shown to vary based on contextual elements of the dialogue.

Should we hard-code the recurrence concept or learn it instead ? Exploring the Transformer architecture for Audio-Visual Speech Recognition

1 code implementation • 19 May 2020 • George Sterpu, Christian Saam, Naomi Harte

The audio-visual speech fusion strategy AV Align has shown significant performance improvements in audio-visual speech recognition (AVSR) on the challenging LRS2 dataset.

Audio-Visual Speech Recognition speech-recognition +1

Learning to Count Words in Fluent Speech enables Online Speech Recognition

1 code implementation • 8 Jun 2020 • George Sterpu, Christian Saam, Naomi Harte

Sequence to Sequence models, in particular the Transformer, achieve state of the art results in Automatic Speech Recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Deep Multi-Scale Feature Learning for Defocus Blur Estimation

no code implementations • 24 Sep 2020 • Ali Karaali, Naomi Harte, Claudio Rosito Jung

This paper presents an edge-based defocus blur estimation method from a single defocused image.

Edge Classification

AV Taris: Online Audio-Visual Speech Recognition

1 code implementation • 14 Dec 2020 • George Sterpu, Naomi Harte

In recent years, Automatic Speech Recognition (ASR) technology has approached human-level performance on conversational speech under relatively clean listening conditions.

Action Detection Activity Detection +5

Low Resource Species Agnostic Bird Activity Detection

no code implementations • 16 Dec 2021 • Mark Anderson, John Kennedy, Naomi Harte

This paper explores low resource classifiers and features for the detection of bird activity, suitable for embedded Automatic Recording Units which are typically deployed for long term remote monitoring of bird populations.

Action Detection Activity Detection

Bioacoustic Event Detection with prototypical networks and data augmentation

no code implementations • 16 Dec 2021 • Mark Anderson, Naomi Harte

This report presents deep learning and data augmentation techniques used by a system entered into the Few-Shot Bioacoustic Event Detection for the DCASE2021 Challenge.

Data Augmentation Event Detection +1

Learnable Acoustic Frontends in Bird Activity Detection

no code implementations • 3 Oct 2022 • Mark Anderson, Naomi Harte

Combining this data with species agnostic bird activity detection systems enables the monitoring of activity levels of bird populations.

Action Detection Activity Detection +1

Learnable Frontends that do not Learn: Quantifying Sensitivity to Filterbank Initialisation

no code implementations • 20 Feb 2023 • Mark Anderson, Tomi Kinnunen, Naomi Harte

We show that although performance is overall improved, the filterbanks exhibit strong sensitivity to their initialisation strategy.

Action Detection Activity Detection

RoomReader: A Multimodal Corpus of Online Multiparty Conversational Interactions

no code implementations • LREC 2022 • Justine Reverdy, Sam O’Connor Russell, Louise Duquenne, Diego Garaialde, Benjamin R. Cowan, Naomi Harte

The corpus was developed within the wider RoomReader Project to explore multimodal cues of conversational engagement and behavioural aspects of collaborative interaction in online environments.
