Search Results for author: Zhixi Cai

Found 11 papers, 7 papers with code

Hi-SLAM: Scaling-up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting

no code implementations19 Sep 2024 Boying Li, Zhixi Cai, Yuan-Fang Li, Ian Reid, Hamid Rezatofighi

To address this problem, we introduce a novel hierarchical representation that encodes semantic information in a compact form into 3D Gaussian Splatting, leveraging the capabilities of large language models (LLMs).

Scene Understanding Semantic Segmentation +1

MRAC Track 1: 2nd Workshop on Multimodal, Generative and Responsible Affective Computing

no code implementations11 Sep 2024 Shreya Ghosh, Zhixi Cai, Abhinav Dhall, Dimitrios Kollias, Roland Goecke, Tom Gedeon

With the rapid advancements in multimodal generative technology, Affective Computing research has provoked discussion about the potential consequences of AI systems equipped with emotional intelligence.

Emotional Intelligence

1M-Deepfakes Detection Challenge

1 code implementation11 Sep 2024 Zhixi Cai, Abhinav Dhall, Shreya Ghosh, Munawar Hayat, Dimitrios Kollias, Kalin Stefanov, Usman Tariq

The detection and localization of deepfake content, particularly when small fake segments are seamlessly mixed with real videos, remains a significant challenge in the field of digital media security.

DeepFake Detection Face Swapping

HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning

1 code implementation19 Mar 2024 Fucai Ke, Zhixi Cai, Simindokht Jahangard, Weiqing Wang, Pari Delir Haghighi, Hamid Rezatofighi

Recent advances in visual reasoning (VR), particularly with the aid of Large Vision-Language Models (VLMs), show promise but require access to large-scale datasets and face challenges such as high computational costs and limited generalization capabilities.

 Ranked #1 on Visual Grounding on RefCOCO+ testA (IoU metric)

Reinforcement Learning (RL) Visual Grounding +2

AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset

1 code implementation26 Nov 2023 Zhixi Cai, Shreya Ghosh, Aman Pankaj Adatia, Munawar Hayat, Abhinav Dhall, Tom Gedeon, Kalin Stefanov

The comprehensive benchmark of the proposed dataset utilizing state-of-the-art deepfake detection and localization methods indicates a significant drop in performance compared to previous datasets.

2k DeepFake Detection +2

Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization

1 code implementation3 May 2023 Zhixi Cai, Shreya Ghosh, Abhinav Dhall, Tom Gedeon, Kalin Stefanov, Munawar Hayat

The proposed baseline method, Boundary Aware Temporal Forgery Detection (BA-TFD), is a 3D Convolutional Neural Network-based architecture which effectively captures multimodal manipulations.

Binary Classification DeepFake Detection +2

MARLIN: Masked Autoencoder for facial video Representation LearnINg

1 code implementation CVPR 2023 Zhixi Cai, Shreya Ghosh, Kalin Stefanov, Abhinav Dhall, Jianfei Cai, Hamid Rezatofighi, Reza Haffari, Munawar Hayat

This paper proposes a self-supervised approach to learn universal facial representations from videos, that can transfer across a variety of facial analysis tasks such as Facial Attribute Recognition (FAR), Facial Expression Recognition (FER), DeepFake Detection (DFD), and Lip Synchronization (LS).

Action Classification Attribute +9

Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization

1 code implementation13 Apr 2022 Zhixi Cai, Kalin Stefanov, Abhinav Dhall, Munawar Hayat

Our baseline method for benchmarking the proposed dataset is a 3DCNN model, termed as Boundary Aware Temporal Forgery Detection (BA-TFD), which is guided via contrastive, boundary matching, and frame classification loss functions.

Benchmarking DeepFake Detection +1

Cannot find the paper you are looking for? You can Submit a new open access paper.