1 code implementation • ICML 2020 • Joao Monteiro, Isabela Albuquerque, Jahangir Alam, R. Devon Hjelm, Tiago Falk
In this contribution, we augment the metric learning setting by introducing a parametric pseudo-distance, trained jointly with the encoder.
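A parametric pseudo-distance of this kind can be pictured as a small network scoring pairs of embeddings, trained jointly with the encoder. The sketch below is a minimal numpy illustration, not the paper's architecture: the dimensions, the random linear encoder, and the two-layer scorer are all hypothetical stand-ins for learned components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, for illustration only.
EMB_DIM, HID = 16, 8

# Encoder: a fixed random linear map standing in for a trained network.
W_enc = rng.standard_normal((32, EMB_DIM)) * 0.1

# Parametric pseudo-distance: a small MLP scoring a pair of embeddings.
# Its weights would be trained jointly with W_enc in the actual setting.
W1 = rng.standard_normal((2 * EMB_DIM, HID)) * 0.1
w2 = rng.standard_normal(HID) * 0.1

def encode(x):
    return x @ W_enc

def pseudo_distance(a, b):
    """Learned pairwise score; unlike a true metric, it need not be
    symmetric or satisfy the triangle inequality."""
    h = np.maximum(np.concatenate([a, b]) @ W1, 0.0)  # ReLU hidden layer
    return float(h @ w2)

x1, x2 = rng.standard_normal(32), rng.standard_normal(32)
d = pseudo_distance(encode(x1), encode(x2))
```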
1 code implementation • 20 Mar 2024 • R. Gnana Praveen, Jahangir Alam
In particular, we compute the attention weights based on the cross-correlation between the joint audio-visual-text feature representations and the feature representations of the individual modalities, simultaneously capturing intra- and inter-modal relationships across the modalities.
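The mechanism described above can be sketched in a few lines of numpy. This is an assumption-laden illustration, not the paper's model: the joint representation is taken as a simple sum of the three modality features, and the sequence length and feature dimension are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
T, D = 5, 8  # hypothetical sequence length and feature dimension

audio, video, text = (rng.standard_normal((T, D)) for _ in range(3))

# Joint audio-visual-text representation; a plain sum stands in for the
# learned joint feature of the paper (an assumption for illustration).
joint = audio + video + text

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(modality, joint):
    """Attention weights from the cross-correlation between the joint
    representation and one modality, then re-weight that modality."""
    corr = modality @ joint.T / np.sqrt(D)   # (T, T) cross-correlation
    weights = softmax(corr, axis=-1)
    return weights @ modality                # attended modality features

att_audio = cross_attend(audio, joint)
```

Because the correlation is taken against the joint representation rather than a single other modality, each modality's attention weights reflect both intra- and inter-modal relationships at once.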
no code implementations • 7 Nov 2018 • Gautam Bhattacharya, Joao Monteiro, Jahangir Alam, Patrick Kenny
Furthermore, we are able to significantly boost verification performance by averaging our different GAN models at the score level, achieving a relative improvement of 7.2% over the baseline.
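Score-level averaging of this kind is simple arithmetic: each system scores the same verification trial, and the fused score is the mean. The numbers below are made up purely to show the shape of the computation.

```python
# Hypothetical verification scores from three different GAN-based systems
# for the same trial (illustrative values only).
scores = [0.71, 0.64, 0.69]

# Score-level fusion by simple averaging.
fused = sum(scores) / len(scores)
```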
no code implementations • 7 Nov 2018 • Gautam Bhattacharya, Jahangir Alam, Patrick Kenny
In this article we propose a novel approach for adapting speaker embeddings to new domains based on adversarial training of neural networks.
no code implementations • 13 Dec 2019 • Hossein Zeinali, Kong Aik Lee, Jahangir Alam, Lukas Burget
This document describes the Short-duration Speaker Verification (SdSV) Challenge 2021.
no code implementations • 1 Jan 2021 • Joao Monteiro, Isabela Albuquerque, Jahangir Alam, Tiago Falk
Recent metric learning approaches parametrize semantic similarity measures through the use of an encoder trained along with a similarity model, which operates over pairs of representations.
no code implementations • 7 Dec 2021 • Woo Hyun Kang, Jahangir Alam, Abderrahim Fathan
Over the recent years, various deep learning-based methods were proposed for extracting a fixed-dimensional embedding vector from speech signals.
no code implementations • 3 May 2022 • Woo Hyun Kang, Jahangir Alam, Abderrahim Fathan
The main objective of the spoofing countermeasure system is to detect the artifacts within the input speech caused by the speech synthesis or voice conversion process.
no code implementations • 28 Sep 2023 • R. Gnana Praveen, Jahangir Alam
We have shown that efficiently leveraging the intra- and inter-modal relationships significantly improves the performance of audio-visual fusion for speaker verification.
no code implementations • 7 Mar 2024 • R. Gnana Praveen, Jahangir Alam
In this paper, we propose a Dynamic Cross-Attention (DCA) model that can dynamically select the cross-attended or unattended features on the fly based on the strong or weak complementary relationships, respectively, across audio and visual modalities.
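The dynamic selection idea can be sketched as a gate that softly mixes a modality's cross-attended and unattended features per frame. This is a simplified numpy stand-in, not the DCA model itself: the gating weights, sizes, and random inputs are hypothetical, and in the paper the gate would be learned to reflect the strength of the complementary relationship between modalities.

```python
import numpy as np

rng = np.random.default_rng(2)
T, D = 4, 6  # hypothetical sequence length and feature dimension

a_feat = rng.standard_normal((T, D))      # unattended audio features
a_attended = rng.standard_normal((T, D))  # audio features cross-attended by video

# Hypothetical gating parameters; learned in the actual model so the gate
# tracks how strongly the other modality complements this one.
W_g = rng.standard_normal((2 * D, 1)) * 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dynamic_select(unattended, attended):
    """Soft per-frame choice between cross-attended and unattended features:
    gate near 1 keeps the attended features, near 0 falls back to the raw ones."""
    g = sigmoid(np.concatenate([unattended, attended], axis=-1) @ W_g)  # (T, 1)
    return g * attended + (1.0 - g) * unattended

out = dynamic_select(a_feat, a_attended)
```

With a soft gate the output is, per element, a convex combination of the two feature streams; a hard selection would replace the sigmoid with a thresholded choice.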
no code implementations • 7 Mar 2024 • R. Gnana Praveen, Jahangir Alam
In this paper, we have investigated the prospect of effectively capturing both the intra- and inter-modal relationships across audio and visual modalities, which can play a crucial role in significantly improving the fusion performance over unimodal systems.
no code implementations • 28 Mar 2024 • R. Gnana Praveen, Jahangir Alam
We also compare the proposed approach with other variants of cross-attention and show that the proposed model consistently improves the performance on both datasets.