no code implementations • 27 Oct 2024 • Meng Wei, Qianyi Wu, Jianmin Zheng, Hamid Rezatofighi, Jianfei Cai
Previous attempts to regularize 3D Gaussian normals often degrade rendering quality due to the fundamental disconnect between normal vectors and the rendering pipeline in 3DGS-based methods.
1 code implementation • 26 Sep 2024 • Sandika Biswas, Qianyi Wu, Biplab Banerjee, Hamid Rezatofighi
Despite advancements in Neural Implicit models for 3D surface reconstruction, handling dynamic environments with interactions between arbitrary rigid, non-rigid, or deformable entities remains challenging.
no code implementations • 19 Sep 2024 • Boying Li, Zhixi Cai, Yuan-Fang Li, Ian Reid, Hamid Rezatofighi
To address this problem, we introduce a novel hierarchical representation that encodes semantic information in a compact form into 3D Gaussian Splatting, leveraging the capabilities of large language models (LLMs).
no code implementations • 16 Sep 2024 • Zhixi Cai, Cristian Rojas Cardenas, Kevin Leo, Chenyuan Zhang, Kal Backman, Hanbing Li, Boying Li, Mahsa Ghorbanali, Stavya Datta, Lizhen Qu, Julian Gutierrez Santiago, Alexey Ignatiev, Yuan-Fang Li, Mor Vered, Peter J Stuckey, Maria Garcia de la Banda, Hamid Rezatofighi
This paper addresses the problem of autonomous UAV search missions, where a UAV must locate specific Entities of Interest (EOIs) from brief descriptions within a time limit, in large, hazard-prone environments with keep-out zones.
no code implementations • 7 Aug 2024 • Chenhui Gou, Abdulwahab Felemban, Faizan Farooq Khan, Deyao Zhu, Jianfei Cai, Hamid Rezatofighi, Mohamed Elhoseiny
In our study, we introduce a pixel value prediction task (PVP) to explore "How Well Can Vision Language Models See Image Details?"
no code implementations • 18 Jun 2024 • Ziyu Ma, Chenhui Gou, Hengcan Shi, Bin Sun, Shutao Li, Hamid Rezatofighi, Jianfei Cai
Specifically, DrVideo first transforms a long video into a coarse text-based long document to initially retrieve key frames and then updates the document with the augmented key frame information.
no code implementations • 8 Apr 2024 • Mahsa Ehsanpour, Ian Reid, Hamid Rezatofighi
The framework uses masked modeling to pre-train the encoder to reconstruct masked human joint trajectories, enabling it to learn generalizable and data-efficient representations of motion in crowded human scenes.
no code implementations • CVPR 2024 • Simindokht Jahangard, Zhixi Cai, Shiki Wen, Hamid Rezatofighi
Understanding human social behaviour is crucial in computer vision and robotics.
no code implementations • 6 Apr 2024 • Duy-Tho Le, Hengcan Shi, Jianfei Cai, Hamid Rezatofighi
Diffusion models have recently gained prominence as powerful deep generative models, demonstrating unmatched performance across various domains.
no code implementations • CVPR 2024 • Duy-Tho Le, Chenhui Gou, Stavya Datta, Hengcan Shi, Ian Reid, Jianfei Cai, Hamid Rezatofighi
JRDB-PanoTrack includes (1) various data involving indoor and outdoor crowded scenes, as well as comprehensive 2D and 3D synchronized data modalities; (2) high-quality 2D spatial panoptic segmentation and temporal tracking annotations, with additional 3D label projections for further spatial understanding; (3) diverse object classes for closed- and open-world recognition benchmarks, with OSPA-based metrics for evaluation.
1 code implementation • 19 Mar 2024 • Fucai Ke, Zhixi Cai, Simindokht Jahangard, Weiqing Wang, Pari Delir Haghighi, Hamid Rezatofighi
Recent advances in visual reasoning (VR), particularly with the aid of Large Vision-Language Models (VLMs), show promise but require access to large-scale datasets and face challenges such as high computational costs and limited generalization capabilities.
Ranked #1 on Visual Grounding on RefCOCO+ testA (IoU metric)
no code implementations • 4 Mar 2024 • Wangjie Zhong, Leimin Tian, Duy Tho Le, Hamid Rezatofighi
Social robots often rely on visual perception to understand their users and the environment.
1 code implementation • 7 Dec 2023 • Navid Mohammadi Foumani, Chang Wei Tan, Geoffrey I. Webb, Hamid Rezatofighi, Mahsa Salehi
Our evaluation of Series2Vec on nine large real-world datasets, along with the UCR/UEA archive, shows enhanced performance compared to current state-of-the-art self-supervised techniques for time series.
1 code implementation • 5 Nov 2023 • Saeed Saadatnejad, Yang Gao, Hamid Rezatofighi, Alexandre Alahi
To address this, we introduce a novel dataset for end-to-end trajectory forecasting, facilitating the evaluation of models in scenarios involving less-than-ideal preceding modules such as tracking.
no code implementations • 27 Jul 2023 • Sandika Biswas, Kejie Li, Biplab Banerjee, Subhasis Chaudhuri, Hamid Rezatofighi
This paper proposes using an implicit feature representation of the scene elements to distinguish a physically plausible alignment of humans and objects from an implausible one.
1 code implementation • 12 Apr 2023 • Simindokht Jahangard, Munawar Hayat, Hamid Rezatofighi
These results demonstrate that our proposed method is suitable for real-time robotic applications.
1 code implementation • CVPR 2023 • Tianyu Zhu, Bryce Ferenczi, Pulak Purkait, Tom Drummond, Hamid Rezatofighi, Anton Van Den Hengel
Annotating rotated bounding boxes is such a laborious process that they are not provided in many detection datasets where axis-aligned annotations are used instead.
no code implementations • CVPR 2023 • Islam Nassar, Munawar Hayat, Ehsan Abbasnejad, Hamid Rezatofighi, Gholamreza Haffari
Finally, ProtoCon addresses the poor training signal in the initial phase of training (due to fewer confident predictions) by introducing an auxiliary self-supervised loss.
no code implementations • 25 Jan 2023 • Chamath Abeysinghe, Chris Reid, Hamid Rezatofighi, Bernd Meyer
This approach builds on a joint detection-and-tracking framework, extended with a set of domain-discriminator modules that add an adversarial training strategy alongside the tracking loss.
no code implementations • ICCV 2023 • Samitha Herath, Basura Fernando, Ehsan Abbasnejad, Munawar Hayat, Shahram Khadivi, Mehrtash Harandi, Hamid Rezatofighi, Gholamreza Haffari
EBL can be used to improve instance selection for a self-training task on the unlabelled target domain, and aligning and normalizing energy scores can learn domain-invariant representations.
no code implementations • 23 Nov 2022 • Huangying Zhan, Jiyang Zheng, Yi Xu, Ian Reid, Hamid Rezatofighi
We present, for the first time, an RGB-only active vision framework that uses a radiance field representation for online active 3D reconstruction and planning.
no code implementations • 23 Nov 2022 • Huangying Zhan, Hamid Rezatofighi, Ian Reid
We propose a robotic learning system for autonomous exploration and navigation in unexplored environments.
1 code implementation • CVPR 2023 • Zhixi Cai, Shreya Ghosh, Kalin Stefanov, Abhinav Dhall, Jianfei Cai, Hamid Rezatofighi, Reza Haffari, Munawar Hayat
This paper proposes a self-supervised approach to learn universal facial representations from videos, that can transfer across a variety of facial analysis tasks such as Facial Attribute Recognition (FAR), Facial Expression Recognition (FER), DeepFake Detection (DFD), and Lip Synchronization (LS).
Ranked #1 on Emotion Classification on CMU-MOSEI
1 code implementation • 10 Nov 2022 • Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, Fisher Yu, DaCheng Tao, Andreas Geiger
We present a unified formulation and model for three motion and 3D perception tasks: optical flow, rectified stereo matching and unrectified stereo depth estimation from posed images.
Ranked #1 on Optical Flow Estimation on Sintel-clean
no code implementations • CVPR 2023 • Edward Vendrow, Duy Tho Le, Jianfei Cai, Hamid Rezatofighi
In crowded human scenes with close-up human-robot interaction and robot navigation, a deep understanding requires reasoning about human motion and body dynamics over time, using human body pose estimation and tracking.
1 code implementation • 19 Oct 2022 • Islam Nassar, Munawar Hayat, Ehsan Abbasnejad, Hamid Rezatofighi, Mehrtash Harandi, Gholamreza Haffari
We present LAVA, a simple yet effective method for multi-domain visual transfer learning with limited data.
1 code implementation • 30 Aug 2022 • Edward Vendrow, Satyajit Kumar, Ehsan Adeli, Hamid Rezatofighi
Although there are several previous works targeting the problem of multi-person dynamic pose forecasting, they often model the entire pose sequence as a time series (ignoring the underlying relationship between joints) or only output the future pose sequence of one person at a time.
no code implementations • CVPR 2022 • Shuai Li, Yu Kong, Hamid Rezatofighi
This paper concerns the problem of multi-object tracking based on the min-cost flow (MCF) formulation, which is conventionally studied as an instance of a linear program.
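For context, the standard min-cost-flow formulation of tracking (written here in generic notation rather than the paper's) introduces binary flow variables for starting, keeping, linking, and terminating detections, and minimizes the total cost subject to flow conservation:

```latex
% f^{in}_i, f_i, f_{i,j}, f^{out}_i are binary ({0,1}) flow indicators for entering,
% keeping, linking, and exiting detection i; the costs c come from detector
% confidences and pairwise affinities.
\[
\min_{f}\;
\sum_{i} c^{\mathrm{in}}_{i} f^{\mathrm{in}}_{i}
+ \sum_{i} c^{\mathrm{det}}_{i} f_{i}
+ \sum_{i,j} c^{\mathrm{link}}_{i,j} f_{i,j}
+ \sum_{i} c^{\mathrm{out}}_{i} f^{\mathrm{out}}_{i}
\quad \text{s.t.}\quad
f^{\mathrm{in}}_{i} + \sum_{j} f_{j,i} \;=\; f_{i} \;=\; f^{\mathrm{out}}_{i} + \sum_{j} f_{i,j}.
\]
```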
1 code implementation • 31 Dec 2021 • Duy-Tho Le, Hengcan Shi, Hamid Rezatofighi, Jianfei Cai
Efficiently and accurately detecting people from 3D point cloud data is of great importance in many robotic and autonomous driving applications.
4 code implementations • CVPR 2022 • Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, DaCheng Tao
Learning-based optical flow estimation has been dominated by a pipeline that regresses flow from a cost volume with convolutions, which is inherently limited to local correlations and thus struggles to address the long-standing challenge of large displacements.
Ranked #8 on Optical Flow Estimation on Spring
no code implementations • 12 Oct 2021 • Alireza Abedin, Hamid Rezatofighi, Damith C. Ranasinghe
Human activity recognition (HAR) is an important research field in ubiquitous computing where the acquisition of large-scale labeled sensor data is tedious, labor-intensive and time-consuming.
Tasks: Generative Adversarial Network, Human Activity Recognition +1
1 code implementation • ICCV 2021 • Kejie Li, Daniel DeTone, Steven Chen, Minh Vo, Ian Reid, Hamid Rezatofighi, Chris Sweeney, Julian Straub, Richard Newcombe
Localizing objects and estimating their extent in 3D is an important step towards high-level 3D scene understanding, which has many applications in Augmented Reality and Robotics.
no code implementations • 1 Jul 2021 • S. Ehsan Mirsadeghi, Ali Royat, Hamid Rezatofighi
In this paper, we propose a novel fully unsupervised semantic segmentation method, the so-called Information Maximization and Adversarial Regularization Segmentation (InMARS).
Ranked #2 on Unsupervised Semantic Segmentation on COCO-Stuff-15
no code implementations • CVPR 2022 • Mahsa Ehsanpour, Fatemeh Saleh, Silvio Savarese, Ian Reid, Hamid Rezatofighi
However, learning to recognise human actions and their social interactions in an unconstrained real-world environment comprising numerous people, with potentially highly unbalanced and long-tailed action label distributions, from a stream of sensory data captured from a mobile robot platform remains a significant challenge, not least owing to the lack of a large-scale dataset that reflects such settings.
no code implementations • ICCV 2021 • Vida Adeli, Mahsa Ehsanpour, Ian Reid, Juan Carlos Niebles, Silvio Savarese, Ehsan Adeli, Hamid Rezatofighi
Joint forecasting of human trajectory and pose dynamics is a fundamental building block of various applications ranging from robotics and autonomous driving to surveillance systems.
1 code implementation • 27 Mar 2021 • Tianyu Zhu, Markus Hiller, Mahsa Ehsanpour, Rongkai Ma, Tom Drummond, Ian Reid, Hamid Rezatofighi
Tracking a time-varying indefinite number of objects in a video sequence over time remains a challenge despite recent advances in the field.
no code implementations • 9 Dec 2020 • Kejie Li, Hamid Rezatofighi, Ian Reid
Given a new RGB frame, MOLTR first applies a monocular 3D detector to localise objects of interest and extract their shape codes that represent the object shapes in a learned embedding space.
no code implementations • CVPR 2021 • Fatemeh Saleh, Sadegh Aliakbarian, Hamid Rezatofighi, Mathieu Salzmann, Stephen Gould
Despite the recent advances in multiple object tracking (MOT), achieved by joint detection and tracking, dealing with long occlusions remains a challenge.
no code implementations • 8 Aug 2020 • Tran Thien Dat Nguyen, Hamid Rezatofighi, Ba-Ngu Vo, Ba-Tuong Vo, Silvio Savarese, Ian Reid
This paper examines performance evaluation criteria for basic vision tasks involving sets of objects, namely object detection, instance-level segmentation and multi-object tracking.
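One widely used family of set-based criteria is the OSPA distance; for reference, the classical OSPA metric of order p with cutoff c between finite sets X = {x_1, ..., x_m} and Y = {y_1, ..., y_n}, m ≤ n, is written below in its standard notation (the paper's evaluation criteria build on this idea, but the notation here is not necessarily the paper's):

```latex
% d^{(c)}(x, y) = \min(c, d(x, y)) is the cutoff base distance and
% \Pi_n is the set of permutations of {1, ..., n}.
\[
\bar{d}_p^{(c)}(X, Y) =
\left(
  \frac{1}{n}
  \left(
    \min_{\pi \in \Pi_n} \sum_{i=1}^{m} d^{(c)}\!\left(x_i, y_{\pi(i)}\right)^{p}
    + c^{p}\,(n - m)
  \right)
\right)^{1/p}
\]
```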
no code implementations • 14 Jul 2020 • Vida Adeli, Ehsan Adeli, Ian Reid, Juan Carlos Niebles, Hamid Rezatofighi
In this paper, we propose a novel framework to tackle both tasks of human motion (or trajectory) and body skeleton pose forecasting in a unified end-to-end pipeline.
no code implementations • 14 Jul 2020 • Alireza Abedin, Mahsa Ehsanpour, Qinfeng Shi, Hamid Rezatofighi, Damith C. Ranasinghe
Wearables are fundamental to improving our understanding of human activities, especially for an increasing number of healthcare applications from rehabilitation to fine-grained gait analysis.
no code implementations • ECCV 2020 • Mahsa Ehsanpour, Alireza Abedin, Fatemeh Saleh, Javen Shi, Ian Reid, Hamid Rezatofighi
In this paper, we solve the problem of simultaneously grouping people by their social interactions, predicting their individual actions and the social activity of each social group, which we call the social task.
1 code implementation • 19 Mar 2020 • Patrick Dendorfer, Hamid Rezatofighi, Anton Milan, Javen Shi, Daniel Cremers, Ian Reid, Stefan Roth, Konrad Schindler, Laura Leal-Taixé
The benchmark for Multiple Object Tracking, MOTChallenge, was launched with the goal of establishing a standardized evaluation of multiple object tracking methods.
Tasks: Multi-Object Tracking, Multiple Object Tracking with Transformer +2
1 code implementation • 19 Feb 2020 • Abhijeet Shenoi, Mihir Patel, JunYoung Gwak, Patrick Goebel, Amir Sadeghian, Hamid Rezatofighi, Roberto Martín-Martín, Silvio Savarese
In this work we present JRMOT, a novel 3D MOT system that integrates information from RGB images and 3D point clouds to achieve real-time, state-of-the-art tracking performance.
Ranked #17 on Multiple Object Tracking on KITTI Test (Online Methods)
no code implementations • 30 Jan 2020 • Hamid Rezatofighi, Tianyu Zhu, Roman Kaskman, Farbod T. Motlagh, Qinfeng Shi, Anton Milan, Daniel Cremers, Laura Leal-Taixé, Ian Reid
In our formulation we define a likelihood for a set distribution represented by a) two discrete distributions defining the set cardinality and permutation variables, and b) a joint distribution over set elements with a fixed cardinality.
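A schematic way to write the factorization described above (this is a reading of the sentence, not necessarily the paper's exact likelihood) combines a cardinality term, a permutation term, and a fixed-cardinality joint density over the set elements:

```latex
% m: set cardinality, \pi: permutation of the m elements, x: input.
\[
p(\mathcal{Y} \mid x) \;=\;
\underbrace{p(m \mid x)}_{\text{cardinality}}
\sum_{\pi}
\underbrace{p(\pi \mid x)}_{\text{permutation}}\,
\underbrace{p\!\left(y_{\pi(1)}, \dots, y_{\pi(m)} \mid m, x\right)}_{\text{joint over elements}}
\]
```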
1 code implementation • NeurIPS 2019 • Jonathan Kuck, Tri Dao, Hamid Rezatofighi, Ashish Sabharwal, Stefano Ermon
Computing the permanent of a non-negative matrix is a core problem with practical applications ranging from target tracking to statistical thermodynamics.
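For reference, the permanent is defined like the determinant but with all signs positive, which makes exact computation intractable in general; a minimal brute-force sketch of the definition (the paper is about estimating this quantity efficiently, not computing it this way):

```python
from itertools import permutations
from math import prod

def permanent(a):
    # Sum over all permutations sigma of prod_i a[i][sigma[i]].
    # Exponential in n; shown only to fix the definition being approximated.
    n = len(a)
    return sum(prod(a[i][sigma[i]] for i in range(n))
               for sigma in permutations(range(n)))

# Example: permanent of [[1, 2], [3, 4]] = 1*4 + 2*3 = 10
assert permanent([[1, 2], [3, 4]]) == 10
```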
1 code implementation • 25 Oct 2019 • Roberto Martín-Martín, Mihir Patel, Hamid Rezatofighi, Abhijeet Shenoi, JunYoung Gwak, Eric Frankel, Amir Sadeghian, Silvio Savarese
We present JRDB, a novel egocentric dataset collected from our social mobile manipulator JackRabbot.
no code implementations • 28 Sep 2019 • Yu Liu, Lingqiao Liu, Haokui Zhang, Hamid Rezatofighi, Ian Reid
This paper tackles the problem of video object segmentation.
no code implementations • 10 Jun 2019 • Patrick Dendorfer, Hamid Rezatofighi, Anton Milan, Javen Shi, Daniel Cremers, Ian Reid, Stefan Roth, Konrad Schindler, Laura Leal-Taixe
Standardized benchmarks are crucial for the majority of computer vision applications.
10 code implementations • CVPR 2019 • Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, Silvio Savarese
By incorporating this generalized $IoU$ ($GIoU$) as a loss into state-of-the-art object detection frameworks, we show a consistent improvement in their performance using both the standard, $IoU$-based, and the new, $GIoU$-based, performance measures on popular object detection benchmarks such as PASCAL VOC and MS COCO.
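GIoU extends IoU by penalizing the part of the smallest enclosing box C not covered by the union of the two boxes, i.e. GIoU = IoU − |C \ (A ∪ B)| / |C|. A minimal sketch for axis-aligned boxes (the (x1, y1, x2, y2) format and function name are illustrative, not the released code):

```python
def giou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Areas of the two boxes
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection and union
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest enclosing axis-aligned box C
    cx1, cy1 = min(ax1, bx1), min(ay1, by1)
    cx2, cy2 = max(ax2, bx2), max(ay2, by2)
    area_c = (cx2 - cx1) * (cy2 - cy1)

    # GIoU = IoU - |C \ (A U B)| / |C|
    return iou - (area_c - union) / area_c
```

The corresponding loss is simply 1 − GIoU, which, unlike 1 − IoU, still provides a gradient signal when the boxes do not overlap.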
no code implementations • 12 Jan 2019 • Yu Liu, Lingqiao Liu, Hamid Rezatofighi, Thanh-Toan Do, Qinfeng Shi, Ian Reid
As the post-processing step for object detection, non-maximum suppression (GreedyNMS) has been widely used in most detectors for many years.
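For context, GreedyNMS is the standard procedure of repeatedly keeping the highest-scoring remaining detection and discarding boxes that overlap it above an IoU threshold; a minimal sketch (box format, threshold value, and names are illustrative):

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, iou_threshold=0.5):
    # Visit detections in descending score order; suppress heavy overlaps.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best, order = order[0], order[1:]
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```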
1 code implementation • 1 Dec 2018 • Hoa Van Nguyen, Hamid Rezatofighi, David Taggart, Bertram Ostendorf, Damith C. Ranasinghe
We investigate the problem of tracking and planning for a UAV in a task to locate multiple radio-tagged wildlife in a three-dimensional (3D) setting in the context of our TrackerBots research project.
no code implementations • 20 Nov 2018 • Alireza Abedin Varamin, Ehsan Abbasnejad, Qinfeng Shi, Damith Ranasinghe, Hamid Rezatofighi
Automatic recognition of human activities from time-series sensor data (referred to as HAR) is a growing area of research in ubiquitous computing.