Search Results for author: Naomi Harte

Found 15 papers, 8 papers with code

Can DNNs Learn to Lipread Full Sentences?

no code implementations • 29 May 2018 • George Sterpu, Christian Saam, Naomi Harte

Finding visual features and suitable models for lipreading tasks that are more complex than a well-constrained vocabulary has proven challenging.

General Classification, Language Modelling, +1

Investigating Speech Features for Continuous Turn-Taking Prediction Using LSTMs

1 code implementation • 29 Jun 2018 • Matthew Roddy, Gabriel Skantze, Naomi Harte

The continuous predictions represent generalized turn-taking behaviors observed in the training data and can be applied to make decisions that are not just limited to end-of-turn detection.

Multimodal Continuous Turn-Taking Prediction Using Multiscale RNNs

1 code implementation • 31 Aug 2018 • Matthew Roddy, Gabriel Skantze, Naomi Harte

To design spoken dialog systems that can conduct fluid interactions it is desirable to incorporate cues from separate modalities into turn-taking models.

Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition

3 code implementations • 5 Sep 2018 • George Sterpu, Christian Saam, Naomi Harte

Automatic speech recognition can potentially benefit from the lip motion patterns, complementing acoustic speech to improve the overall recognition performance, particularly in noise.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition

1 code implementation • 17 Apr 2020 • George Sterpu, Christian Saam, Naomi Harte

A recently proposed multimodal fusion strategy, AV Align, based on state-of-the-art sequence to sequence neural networks, attempts to model this relationship by explicitly aligning the acoustic and visual representations of speech.

Audio-Visual Speech Recognition, speech-recognition, +1

Neural Generation of Dialogue Response Timings

1 code implementation • ACL 2020 • Matthew Roddy, Naomi Harte

The timings of spoken response offsets in human dialogue have been shown to vary based on contextual elements of the dialogue.

Should we hard-code the recurrence concept or learn it instead ? Exploring the Transformer architecture for Audio-Visual Speech Recognition

1 code implementation • 19 May 2020 • George Sterpu, Christian Saam, Naomi Harte

The audio-visual speech fusion strategy AV Align has shown significant performance improvements in audio-visual speech recognition (AVSR) on the challenging LRS2 dataset.

Audio-Visual Speech Recognition, speech-recognition, +1

Learning to Count Words in Fluent Speech enables Online Speech Recognition

1 code implementation • 8 Jun 2020 • George Sterpu, Christian Saam, Naomi Harte

Sequence to Sequence models, in particular the Transformer, achieve state of the art results in Automatic Speech Recognition.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

Deep Multi-Scale Feature Learning for Defocus Blur Estimation

no code implementations • 24 Sep 2020 • Ali Karaali, Naomi Harte, Claudio Rosito Jung

This paper presents an edge-based defocus blur estimation method from a single defocused image.

Edge Classification

AV Taris: Online Audio-Visual Speech Recognition

1 code implementation • 14 Dec 2020 • George Sterpu, Naomi Harte

In recent years, Automatic Speech Recognition (ASR) technology has approached human-level performance on conversational speech under relatively clean listening conditions.

Action Detection, Activity Detection, +5

Low Resource Species Agnostic Bird Activity Detection

no code implementations • 16 Dec 2021 • Mark Anderson, John Kennedy, Naomi Harte

This paper explores low resource classifiers and features for the detection of bird activity, suitable for embedded Automatic Recording Units which are typically deployed for long term remote monitoring of bird populations.

Action Detection, Activity Detection

Bioacoustic Event Detection with prototypical networks and data augmentation

no code implementations • 16 Dec 2021 • Mark Anderson, Naomi Harte

This report presents deep learning and data augmentation techniques used by a system entered into the Few-Shot Bioacoustic Event Detection for the DCASE2021 Challenge.

Data Augmentation, Event Detection, +1

Learnable Acoustic Frontends in Bird Activity Detection

no code implementations • 3 Oct 2022 • Mark Anderson, Naomi Harte

Combining this data with species agnostic bird activity detection systems enables the monitoring of activity levels of bird populations.

Action Detection, Activity Detection, +1

Learnable Frontends that do not Learn: Quantifying Sensitivity to Filterbank Initialisation

no code implementations • 20 Feb 2023 • Mark Anderson, Tomi Kinnunen, Naomi Harte

We show that although performance is overall improved, the filterbanks exhibit strong sensitivity to their initialisation strategy.

Action Detection, Activity Detection

RoomReader: A Multimodal Corpus of Online Multiparty Conversational Interactions

no code implementations • LREC 2022 • Justine Reverdy, Sam O’Connor Russell, Louise Duquenne, Diego Garaialde, Benjamin R. Cowan, Naomi Harte

The corpus was developed within the wider RoomReader Project to explore multimodal cues of conversational engagement and behavioural aspects of collaborative interaction in online environments.
