Search Results for author: Zhixi Cai

Found 7 papers, 4 papers with code

HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning

no code implementations 19 Mar 2024 Fucai Ke, Zhixi Cai, Simindokht Jahangard, Weiqing Wang, Pari Delir Haghighi, Hamid Rezatofighi

Recent advances in visual reasoning (VR), particularly with the aid of Large Vision-Language Models (VLMs), show promise but require access to large-scale datasets and face challenges such as high computational costs and limited generalization capabilities.

Reinforcement Learning (RL) Visual Reasoning

AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset

1 code implementation 26 Nov 2023 Zhixi Cai, Shreya Ghosh, Aman Pankaj Adatia, Munawar Hayat, Abhinav Dhall, Kalin Stefanov

The comprehensive benchmark of the proposed dataset utilizing state-of-the-art deepfake detection and localization methods indicates a significant drop in performance compared to previous datasets.

2k DeepFake Detection +2

Pavlok-Nudge: A Feedback Mechanism for Atomic Behaviour Modification with Snoring Usecase

no code implementations 10 May 2023 Shreya Ghosh, Rakibul Hasan, Pradyumna Agrawal, Zhixi Cai, Susannah Soon, Abhinav Dhall, Tom Gedeon

To this end, we design a user interface to generate an automatic feedback mechanism that integrates Pavlok and a deep learning-based model to detect certain behaviours via an integrated user interface, i.e., a mobile or desktop application.

Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization

1 code implementation 3 May 2023 Zhixi Cai, Shreya Ghosh, Abhinav Dhall, Tom Gedeon, Kalin Stefanov, Munawar Hayat

The proposed baseline method, Boundary Aware Temporal Forgery Detection (BA-TFD), is a 3D Convolutional Neural Network-based architecture which effectively captures multimodal manipulations.

Binary Classification DeepFake Detection +2

MARLIN: Masked Autoencoder for facial video Representation LearnINg

1 code implementation CVPR 2023 Zhixi Cai, Shreya Ghosh, Kalin Stefanov, Abhinav Dhall, Jianfei Cai, Hamid Rezatofighi, Reza Haffari, Munawar Hayat

This paper proposes a self-supervised approach to learn universal facial representations from videos that can transfer across a variety of facial analysis tasks such as Facial Attribute Recognition (FAR), Facial Expression Recognition (FER), DeepFake Detection (DFD), and Lip Synchronization (LS).

Action Classification Attribute +9

Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization

1 code implementation 13 Apr 2022 Zhixi Cai, Kalin Stefanov, Abhinav Dhall, Munawar Hayat

Our baseline method for benchmarking the proposed dataset is a 3DCNN model, termed Boundary Aware Temporal Forgery Detection (BA-TFD), which is guided by contrastive, boundary matching, and frame classification loss functions.

Benchmarking DeepFake Detection +1
