2 code implementations • 30 Jan 2024 • Jee-weon Jung, Wangyou Zhang, Jiatong Shi, Zakaria Aldeneh, Takuya Higuchi, Barry-John Theobald, Ahmed Hussen Abdelaziz, Shinji Watanabe
First, we provide an open-source platform for researchers in the speaker recognition community to effortlessly build models.
Ranked #1 on Speaker Verification on VoxCeleb (using extra training data)
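The scoring step in speaker verification is typically a similarity comparison between two speaker embeddings. The sketch below shows the common cosine-similarity formulation with placeholder embeddings; it is illustrative and does not reflect the toolkit's actual API, and the embedding size and threshold are assumptions.

```python
# Minimal sketch of speaker-verification scoring via cosine similarity.
# The embeddings are random placeholders; in practice they come from a
# trained speaker encoder applied to enrollment and test utterances.
import numpy as np

def cosine_score(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings."""
    a = emb_a / np.linalg.norm(emb_a)
    b = emb_b / np.linalg.norm(emb_b)
    return float(np.dot(a, b))

enroll = np.random.randn(192)   # assumed 192-dim embeddings
test = np.random.randn(192)
accept = cosine_score(enroll, test) > 0.35   # threshold tuned on a dev set
```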
1 code implementation • 18 Aug 2023 • Miguel Sarabia, Elena Menyaylenko, Alessandro Toso, Skyler Seto, Zakaria Aldeneh, Shadi Pirhosseinloo, Luca Zappella, Barry-John Theobald, Nicholas Apostoloff, Jonathan Sheaffer
We present Spatial LibriSpeech, a spatial audio dataset with over 650 hours of 19-channel audio, first-order ambisonics, and optional distractor noise.
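As a rough illustration of working with such multi-channel recordings, the sketch below reads an audio file and separates the four first-order ambisonics (FOA) channels. The file name and the assumption that FOA channels come first in ACN order (W, Y, Z, X) are illustrative, not the dataset's specification.

```python
# Sketch: inspect a multi-channel recording and pull out four first-order
# ambisonics (FOA) channels. File path and channel ordering are assumptions.
import soundfile as sf

audio, sr = sf.read("sample_spatial.wav")   # shape: (num_samples, num_channels)
print(audio.shape, sr)

# Assumed ACN channel order: W, Y, Z, X
w, y, z, x = audio[:, 0], audio[:, 1], audio[:, 2], audio[:, 3]
```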
1 code implementation • 28 Feb 2024 • Katherine Metcalf, Miguel Sarabia, Natalie Mackraz, Barry-John Theobald
Preference-based reinforcement learning (PbRL) aligns robot behavior with human preferences via a reward function learned from binary feedback over agent behaviors.
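The standard way to learn such a reward function is the Bradley-Terry formulation: the segment with the higher summed predicted reward should be the one the human preferred. The sketch below shows that objective; network sizes and dimensions are illustrative assumptions, not the paper's exact model.

```python
# Sketch of the standard preference-based reward-learning objective:
# train a reward model so the segment with higher summed reward is
# predicted as the human-preferred one (Bradley-Terry model).
import torch
import torch.nn as nn

reward_model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))

def preference_loss(seg_a, seg_b, label):
    """seg_a, seg_b: (T, obs_dim) segments; label: 1.0 if seg_a is preferred."""
    r_a = reward_model(seg_a).sum()          # summed reward over segment A
    r_b = reward_model(seg_b).sum()          # summed reward over segment B
    logits = torch.stack([r_a, r_b]).unsqueeze(0)
    target = torch.tensor([0 if label == 1.0 else 1])
    return nn.functional.cross_entropy(logits, target)

loss = preference_loss(torch.randn(50, 8), torch.randn(50, 8), label=1.0)
loss.backward()
```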
no code implementations • 3 Oct 2017 • Helen L. Bear, Richard W. Harvey, Barry-John Theobald, Yuxuan Lan
A critical assumption of all current visual speech recognition systems is that there are visual speech units, called visemes, that can be mapped to the units of acoustic speech, the phonemes.
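The assumed mapping is many-to-one: several phonemes that look alike on the lips collapse to a single viseme. The grouping below is only one illustrative example of such a map, not the mapping examined in the paper.

```python
# Illustrative many-to-one phoneme-to-viseme map. The grouping is an example
# of the kind of mapping the paper questions; published maps differ.
PHONEME_TO_VISEME = {
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    "f": "V_labiodental", "v": "V_labiodental",
    "t": "V_alveolar", "d": "V_alveolar", "s": "V_alveolar", "z": "V_alveolar",
    "k": "V_velar", "g": "V_velar",
}

def phonemes_to_visemes(phonemes):
    return [PHONEME_TO_VISEME.get(p, "V_other") for p in phonemes]

print(phonemes_to_visemes(["b", "t", "m"]))  # ['V_bilabial', 'V_alveolar', 'V_bilabial']
```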
no code implementations • 3 Oct 2017 • Helen L. Bear, Gari Owen, Richard Harvey, Barry-John Theobald
In the quest for greater computer lip-reading performance, there are a number of tacit assumptions that are present either in the datasets (high resolution, for example) or in the methods (recognition of spoken visual units called visemes, for example).
no code implementations • 3 Oct 2017 • Helen L. Bear, Richard Harvey, Barry-John Theobald, Yuxuan Lan
Visual-only speech recognition is dependent upon a number of factors that can be difficult to control, such as lighting, identity, motion, emotion, and expression.
no code implementations • 10 Dec 2018 • Katherine Metcalf, Barry-John Theobald, Nicholas Apostoloff
We model the individual behavior of each agent in an interaction and then use a multi-agent fusion model to generate a summary over the expected actions of the group, rendering the model independent of the number of agents.
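One simple way to make a group summary independent of the number of agents is to encode each agent separately and pool the encodings with a permutation-invariant operation. The sketch below uses mean pooling; layer sizes and the pooling choice are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch of a per-agent encoder whose outputs are mean-pooled, so the group
# summary does not depend on how many agents are present.
import torch
import torch.nn as nn

class GroupSummary(nn.Module):
    def __init__(self, obs_dim=16, hidden=64, num_actions=5):
        super().__init__()
        self.agent_encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, num_actions)

    def forward(self, agent_obs):           # agent_obs: (num_agents, obs_dim)
        per_agent = self.agent_encoder(agent_obs)
        pooled = per_agent.mean(dim=0)      # permutation- and count-invariant
        return self.head(pooled)            # expected group action logits

model = GroupSummary()
print(model(torch.randn(3, 16)).shape)   # works for 3 agents ...
print(model(torch.randn(7, 16)).shape)   # ... and for 7
```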
no code implementations • 2 Apr 2019 • Katherine Metcalf, Barry-John Theobald, Garrett Weinberg, Robert Lee, Ing-Marie Jonsson, Russ Webb, Nicholas Apostoloff
We describe experiments towards building a conversational digital assistant that considers the preferred conversational style of the user.
no code implementations • 15 May 2019 • Ahmed Hussen Abdelaziz, Barry-John Theobald, Justin Binder, Gabriele Fanelli, Paul Dixon, Nicholas Apostoloff, Thibaut Weise, Sachin Kajarekar
We conclude that visual speech synthesis can significantly benefit from the powerful representation of speech in the ASR acoustic models.
Automatic Speech Recognition (ASR) +3
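Concretely, reusing the ASR acoustic model's representation means driving a facial-animation regressor from frame-level ASR features. The sketch below shows that idea with a small recurrent regressor; the feature and parameter dimensions and the network itself are illustrative assumptions, not the paper's model.

```python
# Sketch: regress facial-animation parameters (e.g. blendshape weights) from
# frame-level ASR acoustic-model features. Dimensions are illustrative.
import torch
import torch.nn as nn

asr_feature_dim = 256     # assumed size of an ASR bottleneck/posterior feature
num_face_params = 51      # assumed number of animation parameters per frame

regressor = nn.GRU(asr_feature_dim, 128, batch_first=True)
head = nn.Linear(128, num_face_params)

asr_features = torch.randn(1, 100, asr_feature_dim)   # (batch, frames, dim)
hidden_states, _ = regressor(asr_features)
face_params = head(hidden_states)                      # (1, 100, num_face_params)
```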
no code implementations • 25 Apr 2020 • Zakaria Aldeneh, Anushree Prasanna Kumar, Barry-John Theobald, Erik Marchi, Sachin Kajarekar, Devang Naik, Ahmed Hussen Abdelaziz
One byproduct of this finding is that the learned visual embeddings can be used as features for other visual speech applications.
no code implementations • 27 May 2020 • Ahmed Hussen Abdelaziz, Barry-John Theobald, Paul Dixon, Reinhard Knothe, Nicholas Apostoloff, Sachin Kajarekar
We use subjective testing to demonstrate: 1) the improvement of audiovisual-driven animation over the equivalent video-only approach, and 2) the improvement in the animation of speech-related facial movements after introducing modality dropout.
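The core idea of modality dropout is to randomly suppress one input stream during training so the fused model does not over-rely on either modality. The sketch below illustrates that idea; the dropout probabilities, feature sizes, and fusion by concatenation are assumptions, not the paper's exact recipe.

```python
# Sketch of modality dropout for audiovisual fusion: during training, randomly
# zero out one input stream so the model stays usable when a modality degrades.
import torch

def modality_dropout(audio_feat, visual_feat, p_drop_audio=0.25, p_drop_visual=0.25):
    if torch.rand(()) < p_drop_audio:
        audio_feat = torch.zeros_like(audio_feat)
    elif torch.rand(()) < p_drop_visual:      # never drop both streams at once
        visual_feat = torch.zeros_like(visual_feat)
    return torch.cat([audio_feat, visual_feat], dim=-1)

fused = modality_dropout(torch.randn(1, 100, 80), torch.randn(1, 100, 128))
print(fused.shape)   # (1, 100, 208)
```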
no code implementations • 9 Dec 2020 • Nataniel Ruiz, Barry-John Theobald, Anurag Ranjan, Ahmed Hussein Abdelaziz, Nicholas Apostoloff
Images generated using MorphGAN preserve the identity of the person in the original image, and the provided control over head pose and facial expression allows test sets to be created to identify robustness issues of a deep face recognition network with respect to pose and expression.
no code implementations • 12 Feb 2021 • Andrew Silva, Barry-John Theobald, Nicholas Apostoloff
Automatic speech recognition (ASR) is widely used in consumer electronics.
Automatic Speech Recognition (ASR) +1
no code implementations • 18 Feb 2022 • Andrew Silva, Katherine Metcalf, Nicholas Apostoloff, Barry-John Theobald
Federated learning enables the deployment of machine learning to problems for which centralized data collection is impractical.
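For context, the canonical federated step keeps raw data on the clients and only averages locally trained model weights on the server. The sketch below is generic federated averaging (FedAvg), shown purely as background; it is not the specific method of the paper above.

```python
# Minimal sketch of a federated-averaging (FedAvg) round: each client trains
# locally, and only the resulting weights are averaged on the server.
import copy
import torch
import torch.nn as nn

def local_update(global_model, data, targets, lr=0.01, steps=5):
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.mse_loss(model(data), targets).backward()
        opt.step()
    return model.state_dict()

def federated_average(client_states):
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = torch.stack([s[key] for s in client_states]).mean(dim=0)
    return avg

global_model = nn.Linear(10, 1)
states = [local_update(global_model, torch.randn(32, 10), torch.randn(32, 1))
          for _ in range(4)]                      # four simulated clients
global_model.load_state_dict(federated_average(states))
```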
no code implementations • 18 Mar 2022 • Zakaria Aldeneh, Masha Fedzechkina, Skyler Seto, Katherine Metcalf, Miguel Sarabia, Nicholas Apostoloff, Barry-John Theobald
Previous research has shown that traditional metrics used to optimize and assess models for generating lip motion from speech are not good indicators of subjective opinions of animation quality.
no code implementations • 26 Oct 2022 • Trisha Mittal, Zakaria Aldeneh, Masha Fedzechkina, Anurag Ranjan, Barry-John Theobald
Synthesizing natural head motion to accompany speech for an embodied conversational agent is necessary for providing a rich interactive experience.
no code implementations • 10 Nov 2022 • Nico Lingg, Miguel Sarabia, Luca Zappella, Barry-John Theobald
Human skeleton point clouds are commonly used to automatically classify and predict the behaviour of others.
no code implementations • 12 Nov 2022 • Katherine Metcalf, Miguel Sarabia, Barry-John Theobald
In this work, we demonstrate that encoding environment dynamics in the reward function (REED) dramatically reduces the number of preference labels required in state-of-the-art preference-based RL frameworks.
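The gist of encoding environment dynamics in the reward function is to give the reward model's encoder a self-supervised signal (e.g. predicting the next observation) in addition to the preference labels. The sketch below shows that shape of auxiliary objective; the network sizes and the exact auxiliary loss are illustrative assumptions, not the paper's formulation.

```python
# Sketch: the reward network's encoder also receives a self-supervised
# next-state prediction loss, so preference labels are not the only signal.
import torch
import torch.nn as nn

obs_dim, act_dim, hidden = 8, 2, 64
encoder = nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU())
reward_head = nn.Linear(hidden, 1)
dynamics_head = nn.Linear(hidden, obs_dim)   # predicts the next observation

obs = torch.randn(32, obs_dim)
act = torch.randn(32, act_dim)
next_obs = torch.randn(32, obs_dim)

z = encoder(torch.cat([obs, act], dim=-1))
dynamics_loss = nn.functional.mse_loss(dynamics_head(z), next_obs)
reward = reward_head(z)   # fed into the usual preference (Bradley-Terry) loss
```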
no code implementations • 3 Dec 2022 • Akshay Mehra, Skyler Seto, Navdeep Jaitly, Barry-John Theobald
Furthermore, the lack of calibration increases the inconsistency in the predictions of the model across exits, leading to both inefficient inference and more misclassifications compared with evaluation on in-distribution data.
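One way to quantify the inconsistency across exits is simply to measure how often each early exit's predicted class disagrees with the final exit's. The sketch below computes that rate on placeholder logits; it is a diagnostic illustration, not the paper's evaluation protocol.

```python
# Sketch: measure cross-exit inconsistency as the rate at which early exits
# disagree with the final exit. Logits are random placeholders.
import torch

def exit_disagreement(exit_logits):
    """exit_logits: list of (batch, num_classes) tensors, one per exit."""
    preds = [logits.argmax(dim=-1) for logits in exit_logits]
    final = preds[-1]
    return [float((p != final).float().mean()) for p in preds[:-1]]

logits_per_exit = [torch.randn(128, 10) for _ in range(3)]   # placeholder exits
print(exit_disagreement(logits_per_exit))   # disagreement rate of each early exit
```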
no code implementations • 7 Sep 2023 • Skyler Seto, Barry-John Theobald, Federico Danieli, Navdeep Jaitly, Dan Busbridge
In online fully test-time adaptation (F-TTA), a pre-trained model is adapted using a stream of test samples by minimizing a self-supervised objective, such as entropy minimization.
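The entropy-minimization variant takes one gradient step per incoming batch to make the model's own predictions more confident. The sketch below shows that loop; which parameters to adapt (here, all of them) and the learning rate are illustrative choices, not the paper's recipe.

```python
# Sketch of online test-time adaptation by entropy minimization: for each
# incoming batch, take one gradient step that reduces prediction entropy.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

def entropy_minimization_step(batch):
    probs = model(batch).softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    return float(entropy)

for test_batch in [torch.randn(8, 16) for _ in range(3)]:   # simulated test stream
    print(entropy_minimization_step(test_batch))
```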