no code implementations • 23 Jan 2025 • Petr Grinberg, Ankur Kumar, Surya Koppisetti, Gaurav Bharaj
Adding explanations to audio deepfake detection (ADD) models will boost their real-world application by providing insight into the decision-making process.
no code implementations • 9 Oct 2024 • Yi Zhu, Chirag Goel, Surya Koppisetti, Trang Tran, Ankur Kumar, Gaurav Bharaj
Our system SLIM learns the style-linguistics dependency embeddings from various types of bonafide speech using self-supervised contrastive learning.
no code implementations • 26 Jul 2024 • Yi Zhu, Surya Koppisetti, Trang Tran, Gaurav Bharaj
The learned features are then used in complement with standard pretrained acoustic features (e.g., Wav2vec) to learn a classifier on the real and fake classes.
no code implementations • 3 Jul 2024 • Chirag Goel, Surya Koppisetti, Ben Colman, Ali Shahriyari, Gaurav Bharaj
Experiments show that our framework successfully disentangles the bonafide and spoof classes and helps learn better classifiers for the task.
no code implementations • CVPR 2024 • Trevine Oorloff, Surya Koppisetti, Nicolò Bonettini, Divyaraj Solanki, Ben Colman, Yaser Yacoob, Ali Shahriyari, Gaurav Bharaj
We present Audio-Visual Feature Fusion (AVFF), a two-stage cross-modal learning method that explicitly captures the correspondence between the audio and visual modalities for improved deepfake detection.
no code implementations • CVPR 2024 • David Ferman, Pablo Garrido, Gaurav Bharaj
In the supervised learning case, such methods usually rely on 3D landmark datasets derived from 3DMM-based registration that often lack spatial definition alignment, as compared with that chosen by hand-labeled human consensus, e.g., how are eyebrow landmarks defined?
1 code implementation • 31 Jan 2024 • Yue Zhang, Ben Colman, Xiao Guo, Ali Shahriyari, Gaurav Bharaj
To address these challenges, we frame deepfake detection as a Deepfake Detection VQA (DD-VQA) task and model human intuition by providing textual explanations that describe common-sense reasons for labeling an image as real or fake.
no code implementations • 24 Jan 2024 • Miao Zhang, Zee Fryer, Ben Colman, Ali Shahriyari, Gaurav Bharaj
To this end, we propose a novel framework to extract comprehensive biases in image datasets based on textual descriptions, a common sense-rich modality.
no code implementations • CVPR 2023 • Chuhan Chen, Matthew O'Toole, Gaurav Bharaj, Pablo Garrido
We build on part-based implicit shape models that decompose a global deformation field into local ones.
no code implementations • CVPR 2023 • Xingzhe He, Gaurav Bharaj, David Ferman, Helge Rhodin, Pablo Garrido
Supervised keypoint localization methods rely on large manually labeled image datasets, where objects can deform, articulate, or occlude.
no code implementations • ICCV 2023 • Berkay Kicanaoglu, Pablo Garrido, Gaurav Bharaj
Such representations along with 3D tracking can be used as self-supervision to train a generator with control over coarse expressions and finer facial attributes.
no code implementations • 19 Mar 2022 • Nisarg A. Shah, Gaurav Bharaj
We present a novel algorithm to reduce tensor compute required by a conditional image generation autoencoder without sacrificing quality of photo-realistic image generation.
no code implementations • 19 Mar 2022 • David Ferman, Gaurav Bharaj
Training on a small dataset alongside a larger dataset helps with robust learning for the former, and provides a universal mechanism for facial landmark localization for new and/or smaller standard datasets.
no code implementations • 8 Apr 2021 • David Ferman, Gaurav Bharaj
We propose a general-purpose approach to detect landmarks with improved temporal consistency and personalization.
no code implementations • 8 Apr 2021 • Eric Engelhart, Mahsa Elyasi, Gaurav Bharaj
Transformer-based models require significant training data and do not generalize well, especially for dialects with limited data.
no code implementations • 8 Apr 2021 • Mahsa Elyasi, Gaurav Bharaj
In this work, we propose a novel, carefully designed strategy for conditioning Tacotron-2 on two fundamental prosodic features in English, stress syllable and pitch accent, that help achieve more natural prosody.
1 code implementation • 13 Jan 2021 • Abdallah Dib, Gaurav Bharaj, Junghyun Ahn, Cédric Thébault, Philippe-Henri Gosselin, Marco Romeo, Louis Chevallier
The proposed method models scene illumination via a novel, parameterized virtual light stage, which, in conjunction with differentiable ray tracing, introduces a coarse-to-fine optimization formulation for face reconstruction.
no code implementations • CVPR 2020 • Ayush Tewari, Mohamed Elgharib, Gaurav Bharaj, Florian Bernard, Hans-Peter Seidel, Patrick Pérez, Michael Zollhöfer, Christian Theobalt
StyleGAN generates photorealistic portrait images of faces with eyes, teeth, hair and context (neck, shoulders, background), but lacks rig-like control over semantic face parameters that are interpretable in 3D, such as face pose, expressions, and scene illumination.
no code implementations • 3 Oct 2019 • Abdallah Dib, Gaurav Bharaj, Junghyun Ahn, Cedric Thebault, Philippe-Henri Gosselin, Louis Chevallier
We present a novel strategy to automatically reconstruct 3D faces from monocular images with explicitly disentangled facial geometry (pose, identity and expression), reflectance (diffuse and specular albedo), and self-shadows.
no code implementations • CVPR 2019 • Ayush Tewari, Florian Bernard, Pablo Garrido, Gaurav Bharaj, Mohamed Elgharib, Hans-Peter Seidel, Patrick Pérez, Michael Zollhöfer, Christian Theobalt
In contrast, we propose multi-frame video-based self-supervised training of a deep network that (i) learns a face identity model both in shape and appearance while (ii) jointly learning to reconstruct 3D faces.