no code implementations • 19 Sep 2024 • Boying Li, Zhixi Cai, Yuan-Fang Li, Ian Reid, Hamid Rezatofighi
To address this problem, we introduce a novel hierarchical representation that encodes semantic information in a compact form into 3D Gaussian Splatting, leveraging the capabilities of large language models (LLMs).
no code implementations • 16 Sep 2024 • Zhixi Cai, Cristian Rojas Cardenas, Kevin Leo, Chenyuan Zhang, Kal Backman, Hanbing Li, Boying Li, Mahsa Ghorbanali, Stavya Datta, Lizhen Qu, Julian Gutierrez Santiago, Alexey Ignatiev, Yuan-Fang Li, Mor Vered, Peter J Stuckey, Maria Garcia de la Banda, Hamid Rezatofighi
This paper addresses the problem of autonomous UAV search missions, where a UAV must locate specific Entities of Interest (EOIs) within a time limit, based on brief descriptions in large, hazard-prone environments with keep-out zones.
no code implementations • 11 Sep 2024 • Shreya Ghosh, Zhixi Cai, Abhinav Dhall, Dimitrios Kollias, Roland Goecke, Tom Gedeon
With the rapid advancements in multimodal generative technology, Affective Computing research has provoked discussion about the potential consequences of AI systems equipped with emotional intelligence.
1 code implementation • 11 Sep 2024 • Zhixi Cai, Abhinav Dhall, Shreya Ghosh, Munawar Hayat, Dimitrios Kollias, Kalin Stefanov, Usman Tariq
The detection and localization of deepfake content, particularly when small fake segments are seamlessly mixed with real videos, remains a significant challenge in the field of digital media security.
no code implementations • CVPR 2024 • Simindokht Jahangard, Zhixi Cai, Shiki Wen, Hamid Rezatofighi
Understanding human social behaviour is crucial in computer vision and robotics.
1 code implementation • 19 Mar 2024 • Fucai Ke, Zhixi Cai, Simindokht Jahangard, Weiqing Wang, Pari Delir Haghighi, Hamid Rezatofighi
Recent advances in visual reasoning (VR), particularly with the aid of Large Vision-Language Models (VLMs), show promise but require access to large-scale datasets and face challenges such as high computational costs and limited generalization capabilities.
Ranked #1 on Visual Grounding on RefCOCO+ testA (IoU metric)
1 code implementation • 26 Nov 2023 • Zhixi Cai, Shreya Ghosh, Aman Pankaj Adatia, Munawar Hayat, Abhinav Dhall, Tom Gedeon, Kalin Stefanov
The comprehensive benchmark of the proposed dataset utilizing state-of-the-art deepfake detection and localization methods indicates a significant drop in performance compared to previous datasets.
1 code implementation • 10 May 2023 • Md Rakibul Hasan, Shreya Ghosh, Pradyumna Agrawal, Zhixi Cai, Abhinav Dhall, Tom Gedeon
This paper proposes a feedback mechanism to change behavioural patterns using the Pavlok device.
1 code implementation • 3 May 2023 • Zhixi Cai, Shreya Ghosh, Abhinav Dhall, Tom Gedeon, Kalin Stefanov, Munawar Hayat
The proposed baseline method, Boundary Aware Temporal Forgery Detection (BA-TFD), is a 3D Convolutional Neural Network-based architecture which effectively captures multimodal manipulations.
Ranked #1 on Temporal Forgery Localization on ForgeryNet
1 code implementation • CVPR 2023 • Zhixi Cai, Shreya Ghosh, Kalin Stefanov, Abhinav Dhall, Jianfei Cai, Hamid Rezatofighi, Reza Haffari, Munawar Hayat
This paper proposes a self-supervised approach to learn universal facial representations from videos, that can transfer across a variety of facial analysis tasks such as Facial Attribute Recognition (FAR), Facial Expression Recognition (FER), DeepFake Detection (DFD), and Lip Synchronization (LS).
Ranked #1 on Emotion Classification on CMU-MOSEI
1 code implementation • 13 Apr 2022 • Zhixi Cai, Kalin Stefanov, Abhinav Dhall, Munawar Hayat
Our baseline method for benchmarking the proposed dataset is a 3DCNN model, termed as Boundary Aware Temporal Forgery Detection (BA-TFD), which is guided via contrastive, boundary matching, and frame classification loss functions.
Ranked #1 on DeepFake Detection on LAV-DF