no code implementations • 21 Jan 2025 • Pha Nguyen, Sailik Sengupta, Girik Malik, Arshit Gupta, Bonan Min
The improved competence of generative models can help build multi-modal virtual assistants that leverage modalities beyond language.
no code implementations • 27 Nov 2024 • Trong-Thuan Nguyen, Pha Nguyen, Jackson Cothren, Alper Yilmaz, Khoa Luu
To this end, we propose Multimodal LLMs on a Scene HyperGraph (HyperGLM), promoting reasoning about multi-way interactions and higher-order relationships.
no code implementations • 14 Oct 2024 • Pha Nguyen, Ngan Le, Jackson Cothren, Alper Yilmaz, Khoa Luu
However, existing diffusion models rely on extensive and unnecessary mapping to a Gaussian noise domain, which can be replaced by a more efficient and stable interpolation process.
no code implementations • 3 Jun 2024 • Trong-Thuan Nguyen, Pha Nguyen, Xin Li, Jackson Cothren, Alper Yilmaz, Khoa Luu
In this paper, we introduce the new AeroEye dataset that focuses on multi-object relationship modeling in aerial videos.
no code implementations • CVPR 2024 • Trong-Thuan Nguyen, Pha Nguyen, Khoa Luu
In this paper, we delve into the understanding of interactivities within visual content by deriving scene graph representations from dense interactivities among humans and objects.
no code implementations • 28 Nov 2023 • Naga VS Raviteja Chappa, Pha Nguyen, Thi Hoang Ngan Le, Khoa Luu
Flow-Attention incorporates flow conservation principles, fostering competition for sources and allocation for sinks, effectively preventing the generation of trivial attention.
no code implementations • 27 Nov 2023 • Naga VS Raviteja Chappa, Pha Nguyen, Page Daniel Dobbs, Khoa Luu
Group Activity Recognition (GAR) is a fundamental problem in computer vision, with diverse applications in sports video analysis, video surveillance, and social scene understanding.
no code implementations • 16 Jun 2023 • Pha Nguyen, Kha Gia Quach, John Gauch, Samee U. Khan, Bhiksha Raj, Khoa Luu
Then, a new cross-domain MOT adaptation from existing datasets is proposed without any pre-defined human knowledge in understanding and modeling objects.
no code implementations • 28 May 2023 • Kim Hoang Tran, Anh Duy Le Dinh, Tien Phat Nguyen, Thinh Phan, Pha Nguyen, Khoa Luu, Donald Adjeroh, Gianfranco Doretto, Ngan Hoang Le
Our contributions are benchmarked through extensive experiments conducted on the Referring GMOT dataset for the GMOT task.
no code implementations • NeurIPS 2023 • Pha Nguyen, Kha Gia Quach, Kris Kitani, Khoa Luu
This paper introduces a novel paradigm for Multiple Object Tracking called Type-to-Track, which allows users to track objects in videos by typing natural language descriptions.
Grounded Multiple Object Tracking
Multiple Object Tracking
no code implementations • 27 Apr 2023 • Naga VS Raviteja Chappa, Pha Nguyen, Alexander H Nelson, Han-Seok Seo, Xin Li, Page Daniel Dobbs, Khoa Luu
This paper introduces a novel approach to Social Group Activity Recognition (SoGAR) using a self-supervised Transformers network that can effectively utilize unlabeled video data.
1 code implementation • 6 Mar 2023 • Naga VS Raviteja Chappa, Pha Nguyen, Alexander H Nelson, Han-Seok Seo, Xin Li, Page Daniel Dobbs, Khoa Luu
In this paper, we propose a new, simple, and effective Self-supervised Spatio-temporal Transformers (SPARTAN) approach to Group Activity Recognition (GAR) using unlabeled video data.
no code implementations • 17 Nov 2022 • Pha Nguyen, Kha Gia Quach, Chi Nhan Duong, Son Lam Phung, Ngan Le, Khoa Luu
The development of autonomous vehicles generates a tremendous demand for a low-cost solution with a complete set of camera sensors capturing the environment around the car.
no code implementations • 10 Jul 2022 • Kha Gia Quach, Huu Le, Pha Nguyen, Chi Nhan Duong, Tien Dai Bui, Khoa Luu
This paper aims to tackle Multiple Object Tracking (MOT), an important problem in computer vision that remains challenging due to many practical issues, especially occlusions.
no code implementations • 7 Jun 2022 • Pha Nguyen, Thanh-Dat Truong, Miaoqing Huang, Yi Liang, Ngan Le, Khoa Luu
Self-training for crowd counting has not been thoroughly explored, although it is one of the important challenges in computer vision.
no code implementations • 19 Apr 2022 • Pha Nguyen, Kha Gia Quach, Chi Nhan Duong, Ngan Le, Xuan-Bac Nguyen, Khoa Luu
The experimental results on the nuScenes dataset demonstrate that the proposed method achieves SOTA performance on the existing vision-based tracking dataset.
1 code implementation • CVPR 2021 • Kha Gia Quach, Pha Nguyen, Huu Le, Thanh-Dat Truong, Chi Nhan Duong, Minh-Triet Tran, Khoa Luu
Multi-Camera Multiple Object Tracking (MC-MOT) is a significant computer vision problem due to its emerging applicability in several real-world settings.