2 code implementations • 30 Jan 2024 • Jee-weon Jung, Wangyou Zhang, Jiatong Shi, Zakaria Aldeneh, Takuya Higuchi, Barry-John Theobald, Ahmed Hussen Abdelaziz, Shinji Watanabe
First, we provide an open-source platform for researchers in the speaker recognition community to effortlessly build models.
Ranked #1 on Speaker Verification on VoxCeleb (using extra training data)
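The scoring step in speaker verification is typically a similarity comparison between two speaker embeddings. The sketch below shows the common cosine-similarity formulation with placeholder embeddings; it is illustrative and does not reflect the toolkit's actual API, and the embedding size and threshold are assumptions.

```python
# Minimal sketch of speaker-verification scoring via cosine similarity.
# The embeddings are random placeholders; in practice they come from a
# trained speaker encoder applied to enrollment and test utterances.
import numpy as np

def cosine_score(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings."""
    a = emb_a / np.linalg.norm(emb_a)
    b = emb_b / np.linalg.norm(emb_b)
    return float(np.dot(a, b))

enroll = np.random.randn(192)   # assumed 192-dim embeddings
test = np.random.randn(192)
accept = cosine_score(enroll, test) > 0.35   # threshold tuned on a dev set
```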
1 code implementation • 18 Aug 2023 • Miguel Sarabia, Elena Menyaylenko, Alessandro Toso, Skyler Seto, Zakaria Aldeneh, Shadi Pirhosseinloo, Luca Zappella, Barry-John Theobald, Nicholas Apostoloff, Jonathan Sheaffer
We present Spatial LibriSpeech, a spatial audio dataset with over 650 hours of 19-channel audio, first-order ambisonics, and optional distractor noise.
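As a rough illustration of working with such multi-channel recordings, the sketch below reads an audio file and separates the four first-order ambisonics (FOA) channels. The file name and the assumption that FOA channels come first in ACN order (W, Y, Z, X) are illustrative, not the dataset's specification.

```python
# Sketch: inspect a multi-channel recording and pull out four first-order
# ambisonics (FOA) channels. File path and channel ordering are assumptions.
import soundfile as sf

audio, sr = sf.read("sample_spatial.wav")   # shape: (num_samples, num_channels)
print(audio.shape, sr)

# Assumed ACN channel order: W, Y, Z, X
w, y, z, x = audio[:, 0], audio[:, 1], audio[:, 2], audio[:, 3]
```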
1 code implementation • 28 Feb 2024 • Katherine Metcalf, Miguel Sarabia, Natalie Mackraz, Barry-John Theobald
Preference-based reinforcement learning (PbRL) aligns robot behavior with human preferences via a reward function learned from binary feedback over agent behaviors.
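The standard way to learn such a reward function is the Bradley-Terry formulation: the segment with the higher summed predicted reward should be the one the human preferred. The sketch below shows that objective; network sizes and dimensions are illustrative assumptions, not the paper's exact model.

```python
# Sketch of the standard preference-based reward-learning objective:
# train a reward model so the segment with higher summed reward is
# predicted as the human-preferred one (Bradley-Terry model).
import torch
import torch.nn as nn

reward_model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))

def preference_loss(seg_a, seg_b, label):
    """seg_a, seg_b: (T, obs_dim) segments; label: 1.0 if seg_a is preferred."""
    r_a = reward_model(seg_a).sum()          # summed reward over segment A
    r_b = reward_model(seg_b).sum()          # summed reward over segment B
    logits = torch.stack([r_a, r_b]).unsqueeze(0)
    target = torch.tensor([0 if label == 1.0 else 1])
    return nn.functional.cross_entropy(logits, target)

loss = preference_loss(torch.randn(50, 8), torch.randn(50, 8), label=1.0)
loss.backward()
```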
no code implementations • 3 Oct 2017 • Helen L. Bear, Richard W. Harvey, Barry-John Theobald, Yuxuan Lan
A critical assumption of all current visual speech recognition systems is that there are visual speech units, called visemes, that can be mapped to the units of acoustic speech, the phonemes.
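The assumed mapping is many-to-one: several phonemes that look alike on the lips collapse to a single viseme. The grouping below is only one illustrative example of such a map, not the mapping examined in the paper.

```python
# Illustrative many-to-one phoneme-to-viseme map. The grouping is an example
# of the kind of mapping the paper questions; published maps differ.
PHONEME_TO_VISEME = {
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    "f": "V_labiodental", "v": "V_labiodental",
    "t": "V_alveolar", "d": "V_alveolar", "s": "V_alveolar", "z": "V_alveolar",
    "k": "V_velar", "g": "V_velar",
}

def phonemes_to_visemes(phonemes):
    return [PHONEME_TO_VISEME.get(p, "V_other") for p in phonemes]

print(phonemes_to_visemes(["b", "t", "m"]))  # ['V_bilabial', 'V_alveolar', 'V_bilabial']
```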
no code implementations • 3 Oct 2017 • Helen L. Bear, Gari Owen, Richard Harvey, Barry-John Theobald
In the quest for greater computer lip-reading performance, there are a number of tacit assumptions that are present either in the datasets (high resolution, for example) or in the methods (recognition of spoken visual units called visemes, for example).
no code implementations • 3 Oct 2017 • Helen L. Bear, Richard Harvey, Barry-John Theobald, Yuxuan Lan
Visual-only speech recognition is dependent upon a number of factors that can be difficult to control, such as lighting, identity, motion, emotion, and expression.
no code implementations • 10 Dec 2018 • Katherine Metcalf, Barry-John Theobald, Nicholas Apostoloff
We model the individual behavior of each agent in an interaction and then use a multi-agent fusion model to generate a summary over the expected actions of the group, rendering the model independent of the number of agents.
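One simple way to make a group summary independent of the number of agents is to encode each agent separately and pool the encodings with a permutation-invariant operation. The sketch below uses mean pooling; layer sizes and the pooling choice are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch of a per-agent encoder whose outputs are mean-pooled, so the group
# summary does not depend on how many agents are present.
import torch
import torch.nn as nn

class GroupSummary(nn.Module):
    def __init__(self, obs_dim=16, hidden=64, num_actions=5):
        super().__init__()
        self.agent_encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, num_actions)

    def forward(self, agent_obs):           # agent_obs: (num_agents, obs_dim)
        per_agent = self.agent_encoder(agent_obs)
        pooled = per_agent.mean(dim=0)      # permutation- and count-invariant
        return self.head(pooled)            # expected group action logits

model = GroupSummary()
print(model(torch.randn(3, 16)).shape)   # works for 3 agents ...
print(model(torch.randn(7, 16)).shape)   # ... and for 7
```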
no code implementations • 2 Apr 2019 • Katherine Metcalf, Barry-John Theobald, Garrett Weinberg, Robert Lee, Ing-Marie Jonsson, Russ Webb, Nicholas Apostoloff
We describe experiments towards building a conversational digital assistant that considers the preferred conversational style of the user.
no code implementations • 15 May 2019 • Ahmed Hussen Abdelaziz, Barry-John Theobald, Justin Binder, Gabriele Fanelli, Paul Dixon, Nicholas Apostoloff, Thibaut Weise, Sachin Kajarekar
We conclude that visual speech synthesis can significantly benefit from the powerful representation of speech in the ASR acoustic models.
Automatic Speech Recognition (ASR) +3
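Concretely, reusing the ASR acoustic model's representation means driving a facial-animation regressor from frame-level ASR features. The sketch below shows that idea with a small recurrent regressor; the feature and parameter dimensions and the network itself are illustrative assumptions, not the paper's model.

```python
# Sketch: regress facial-animation parameters (e.g. blendshape weights) from
# frame-level ASR acoustic-model features. Dimensions are illustrative.
import torch
import torch.nn as nn

asr_feature_dim = 256     # assumed size of an ASR bottleneck/posterior feature
num_face_params = 51      # assumed number of animation parameters per frame

regressor = nn.GRU(asr_feature_dim, 128, batch_first=True)
head = nn.Linear(128, num_face_params)

asr_features = torch.randn(1, 100, asr_feature_dim)   # (batch, frames, dim)
hidden_states, _ = regressor(asr_features)
face_params = head(hidden_states)                      # (1, 100, num_face_params)
```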
no code implementations • 25 Apr 2020 • Zakaria Aldeneh, Anushree Prasanna Kumar, Barry-John Theobald, Erik Marchi, Sachin Kajarekar, Devang Naik, Ahmed Hussen Abdelaziz
One byproduct of this finding is that the learned visual embeddings can be used as features for other visual speech applications.
no code implementations • 27 May 2020 • Ahmed Hussen Abdelaziz, Barry-John Theobald, Paul Dixon, Reinhard Knothe, Nicholas Apostoloff, Sachin Kajarekar
We use subjective testing to demonstrate: 1) the improvement of audiovisual-driven animation over the equivalent video-only approach, and 2) the improvement in the animation of speech-related facial movements after introducing modality dropout.
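The core idea of modality dropout is to randomly suppress one input stream during training so the fused model does not over-rely on either modality. The sketch below illustrates that idea; the dropout probabilities, feature sizes, and fusion by concatenation are assumptions, not the paper's exact recipe.

```python
# Sketch of modality dropout for audiovisual fusion: during training, randomly
# zero out one input stream so the model stays usable when a modality degrades.
import torch

def modality_dropout(audio_feat, visual_feat, p_drop_audio=0.25, p_drop_visual=0.25):
    if torch.rand(()) < p_drop_audio:
        audio_feat = torch.zeros_like(audio_feat)
    elif torch.rand(()) < p_drop_visual:      # never drop both streams at once
        visual_feat = torch.zeros_like(visual_feat)
    return torch.cat([audio_feat, visual_feat], dim=-1)

fused = modality_dropout(torch.randn(1, 100, 80), torch.randn(1, 100, 128))
print(fused.shape)   # (1, 100, 208)
```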
no code implementations • 9 Dec 2020 • Nataniel Ruiz, Barry-John Theobald, Anurag Ranjan, Ahmed Hussein Abdelaziz, Nicholas Apostoloff
Images generated using MorphGAN preserve the identity of the person in the original image, and the provided control over head pose and facial expression allows test sets to be created to identify robustness issues of a deep face recognition network with respect to pose and expression.
no code implementations • 12 Feb 2021 • Andrew Silva, Barry-John Theobald, Nicholas Apostoloff
Automatic speech recognition (ASR) is widely used in consumer electronics.
Automatic Speech Recognition (ASR) +1
no code implementations • 18 Feb 2022 • Andrew Silva, Katherine Metcalf, Nicholas Apostoloff, Barry-John Theobald
Federated learning enables the deployment of machine learning to problems for which centralized data collection is impractical.
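For context, the canonical federated step keeps raw data on the clients and only averages locally trained model weights on the server. The sketch below is generic federated averaging (FedAvg), shown purely as background; it is not the specific method of the paper above.

```python
# Minimal sketch of a federated-averaging (FedAvg) round: each client trains
# locally, and only the resulting weights are averaged on the server.
import copy
import torch
import torch.nn as nn

def local_update(global_model, data, targets, lr=0.01, steps=5):
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.mse_loss(model(data), targets).backward()
        opt.step()
    return model.state_dict()

def federated_average(client_states):
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = torch.stack([s[key] for s in client_states]).mean(dim=0)
    return avg

global_model = nn.Linear(10, 1)
states = [local_update(global_model, torch.randn(32, 10), torch.randn(32, 1))
          for _ in range(4)]                      # four simulated clients
global_model.load_state_dict(federated_average(states))
```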
no code implementations • 18 Mar 2022 • Zakaria Aldeneh, Masha Fedzechkina, Skyler Seto, Katherine Metcalf, Miguel Sarabia, Nicholas Apostoloff, Barry-John Theobald
Previous research has shown that traditional metrics used to optimize and assess models for generating lip motion from speech are not good indicators of subjective opinions of animation quality.
no code implementations • 26 Oct 2022 • Trisha Mittal, Zakaria Aldeneh, Masha Fedzechkina, Anurag Ranjan, Barry-John Theobald
Synthesizing natural head motion to accompany speech for an embodied conversational agent is necessary for providing a rich interactive experience.
no code implementations • 10 Nov 2022 • Nico Lingg, Miguel Sarabia, Luca Zappella, Barry-John Theobald
Human skeleton point clouds are commonly used to automatically classify and predict the behaviour of others.
no code implementations • 12 Nov 2022 • Katherine Metcalf, Miguel Sarabia, Barry-John Theobald
In this work, we demonstrate that encoding environment dynamics in the reward function (REED) dramatically reduces the number of preference labels required in state-of-the-art preference-based RL frameworks.
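The gist of encoding environment dynamics in the reward function is to give the reward model's encoder a self-supervised signal (e.g. predicting the next observation) in addition to the preference labels. The sketch below shows that shape of auxiliary objective; the network sizes and the exact auxiliary loss are illustrative assumptions, not the paper's formulation.

```python
# Sketch: the reward network's encoder also receives a self-supervised
# next-state prediction loss, so preference labels are not the only signal.
import torch
import torch.nn as nn

obs_dim, act_dim, hidden = 8, 2, 64
encoder = nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU())
reward_head = nn.Linear(hidden, 1)
dynamics_head = nn.Linear(hidden, obs_dim)   # predicts the next observation

obs = torch.randn(32, obs_dim)
act = torch.randn(32, act_dim)
next_obs = torch.randn(32, obs_dim)

z = encoder(torch.cat([obs, act], dim=-1))
dynamics_loss = nn.functional.mse_loss(dynamics_head(z), next_obs)
reward = reward_head(z)   # fed into the usual preference (Bradley-Terry) loss
```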
no code implementations • 3 Dec 2022 • Akshay Mehra, Skyler Seto, Navdeep Jaitly, Barry-John Theobald
Furthermore, the lack of calibration increases the inconsistency in the predictions of the model across exits, leading to both inefficient inference and more misclassifications compared with evaluation on in-distribution data.
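One way to quantify the inconsistency across exits is simply to measure how often each early exit's predicted class disagrees with the final exit's. The sketch below computes that rate on placeholder logits; it is a diagnostic illustration, not the paper's evaluation protocol.

```python
# Sketch: measure cross-exit inconsistency as the rate at which early exits
# disagree with the final exit. Logits are random placeholders.
import torch

def exit_disagreement(exit_logits):
    """exit_logits: list of (batch, num_classes) tensors, one per exit."""
    preds = [logits.argmax(dim=-1) for logits in exit_logits]
    final = preds[-1]
    return [float((p != final).float().mean()) for p in preds[:-1]]

logits_per_exit = [torch.randn(128, 10) for _ in range(3)]   # placeholder exits
print(exit_disagreement(logits_per_exit))   # disagreement rate of each early exit
```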
no code implementations • 7 Sep 2023 • Skyler Seto, Barry-John Theobald, Federico Danieli, Navdeep Jaitly, Dan Busbridge
In online fully test-time adaptation (F-TTA), a pre-trained model is adapted using a stream of test samples by minimizing a self-supervised objective, such as entropy minimization.
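The entropy-minimization variant takes one gradient step per incoming batch to make the model's own predictions more confident. The sketch below shows that loop; which parameters to adapt (here, all of them) and the learning rate are illustrative choices, not the paper's recipe.

```python
# Sketch of online test-time adaptation by entropy minimization: for each
# incoming batch, take one gradient step that reduces prediction entropy.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

def entropy_minimization_step(batch):
    probs = model(batch).softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    return float(entropy)

for test_batch in [torch.randn(8, 16) for _ in range(3)]:   # simulated test stream
    print(entropy_minimization_step(test_batch))
```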