Search Results for author: Hamid Rezatofighi

Found 53 papers, 20 papers with code

Normal-GS: 3D Gaussian Splatting with Normal-Involved Rendering

no code implementations 27 Oct 2024 Meng Wei, Qianyi Wu, Jianmin Zheng, Hamid Rezatofighi, Jianfei Cai

Previous attempts to regularize 3D Gaussian normals often degrade rendering quality due to the fundamental disconnect between normal vectors and the rendering pipeline in 3DGS-based methods.

Novel View Synthesis

TFS-NeRF: Template-Free NeRF for Semantic 3D Reconstruction of Dynamic Scene

1 code implementation 26 Sep 2024 Sandika Biswas, Qianyi Wu, Biplab Banerjee, Hamid Rezatofighi

Despite advancements in Neural Implicit models for 3D surface reconstruction, handling dynamic environments with interactions between arbitrary rigid, non-rigid, or deformable entities remains challenging.

3D Reconstruction Optical Flow Estimation +1

Hi-SLAM: Scaling-up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting

no code implementations 19 Sep 2024 Boying Li, Zhixi Cai, Yuan-Fang Li, Ian Reid, Hamid Rezatofighi

To address this problem, we introduce a novel hierarchical representation that encodes semantic information in a compact form into 3D Gaussian Splatting, leveraging the capabilities of large language models (LLMs).

Scene Understanding Semantic Segmentation +1

How Well Can Vision Language Models See Image Details?

no code implementations 7 Aug 2024 Chenhui Gou, Abdulwahab Felemban, Faizan Farooq Khan, Deyao Zhu, Jianfei Cai, Hamid Rezatofighi, Mohamed Elhoseiny

In our study, we introduce a pixel value prediction task (PVP) to explore "How Well Can Vision Language Models See Image Details?"

Decision Making Image Segmentation +5

DrVideo: Document Retrieval Based Long Video Understanding

no code implementations 18 Jun 2024 Ziyu Ma, Chenhui Gou, Hengcan Shi, Bin Sun, Shutao Li, Hamid Rezatofighi, Jianfei Cai

Specifically, DrVideo first transforms a long video into a coarse text-based long document to initially retrieve key frames and then updates the documents with the augmented key frame information.

document understanding EgoSchema +4

Social-MAE: Social Masked Autoencoder for Multi-person Motion Representation Learning

no code implementations 8 Apr 2024 Mahsa Ehsanpour, Ian Reid, Hamid Rezatofighi

The framework uses masked modeling to pre-train the encoder to reconstruct masked human joint trajectories, enabling it to learn generalizable and data-efficient representations of motion in human crowded scenes.

Action Understanding Decoder +2

DifFUSER: Diffusion Model for Robust Multi-Sensor Fusion in 3D Object Detection and BEV Segmentation

no code implementations 6 Apr 2024 Duy-Tho Le, Hengcan Shi, Jianfei Cai, Hamid Rezatofighi

Diffusion models have recently gained prominence as powerful deep generative models, demonstrating unmatched performance across various domains.

3D Object Detection Denoising +2

JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments

no code implementations CVPR 2024 Duy-Tho Le, Chenhui Gou, Stavya Datta, Hengcan Shi, Ian Reid, Jianfei Cai, Hamid Rezatofighi

JRDB-PanoTrack includes (1) various data involving indoor and outdoor crowded scenes, as well as comprehensive 2D and 3D synchronized data modalities; (2) high-quality 2D spatial panoptic segmentation and temporal tracking annotations, with additional 3D label projections for further spatial understanding; (3) diverse object classes for closed- and open-world recognition benchmarks, with OSPA-based metrics for evaluation.

Decision Making Panoptic Segmentation +1

HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning

1 code implementation 19 Mar 2024 Fucai Ke, Zhixi Cai, Simindokht Jahangard, Weiqing Wang, Pari Delir Haghighi, Hamid Rezatofighi

Recent advances in visual reasoning (VR), particularly with the aid of Large Vision-Language Models (VLMs), show promise but require access to large-scale datasets and face challenges such as high computational costs and limited generalization capabilities.

Ranked #1 on Visual Grounding on RefCOCO+ testA (IoU metric)

Reinforcement Learning (RL) Visual Grounding +2

Series2Vec: Similarity-based Self-supervised Representation Learning for Time Series Classification

1 code implementation 7 Dec 2023 Navid Mohammadi Foumani, Chang Wei Tan, Geoffrey I. Webb, Hamid Rezatofighi, Mahsa Salehi

Our evaluation of Series2Vec on nine large real-world datasets, along with the UCR/UEA archive, shows enhanced performance compared to current state-of-the-art self-supervised techniques for time series.

Data Augmentation Representation Learning +4

JRDB-Traj: A Dataset and Benchmark for Trajectory Forecasting in Crowds

1 code implementation 5 Nov 2023 Saeed Saadatnejad, Yang Gao, Hamid Rezatofighi, Alexandre Alahi

To address this, we introduce a novel dataset for end-to-end trajectory forecasting, facilitating the evaluation of models in scenarios involving less-than-ideal preceding modules such as tracking.

Autonomous Navigation Benchmarking +1

Physically Plausible 3D Human-Scene Reconstruction from Monocular RGB Image using an Adversarial Learning Approach

no code implementations 27 Jul 2023 Sandika Biswas, Kejie Li, Biplab Banerjee, Subhasis Chaudhuri, Hamid Rezatofighi

This paper proposes using an implicit feature representation of the scene elements to distinguish a physically plausible alignment of humans and objects from an implausible one.

3D Reconstruction Robot Navigation

Real-time Trajectory-based Social Group Detection

1 code implementation 12 Apr 2023 Simindokht Jahangard, Munawar Hayat, Hamid Rezatofighi

These results demonstrate that our proposed method is suitable for real-time robotic applications.

Graph Clustering Robot Navigation

Knowledge Combination to Learn Rotated Detection Without Rotated Annotation

1 code implementation CVPR 2023 Tianyu Zhu, Bryce Ferenczi, Pulak Purkait, Tom Drummond, Hamid Rezatofighi, Anton Van Den Hengel

Annotating rotated bounding boxes is such a laborious process that they are not provided in many detection datasets where axis-aligned annotations are used instead.

ProtoCon: Pseudo-label Refinement via Online Clustering and Prototypical Consistency for Efficient Semi-supervised Learning

no code implementations CVPR 2023 Islam Nassar, Munawar Hayat, Ehsan Abbasnejad, Hamid Rezatofighi, Gholamreza Haffari

Finally, ProtoCon addresses the poor training signal in the initial phase of training (due to fewer confident predictions) by introducing an auxiliary self-supervised loss.

Online Clustering Pseudo Label

Tracking Different Ant Species: An Unsupervised Domain Adaptation Framework and a Dataset for Multi-object Tracking

no code implementations 25 Jan 2023 Chamath Abeysinghe, Chris Reid, Hamid Rezatofighi, Bernd Meyer

This approach is built upon a joint-detection-and-tracking framework that is extended by a set of domain discriminator modules integrating an adversarial training strategy in addition to the tracking loss.

Multi-Object Tracking Unsupervised Domain Adaptation

Energy-based Self-Training and Normalization for Unsupervised Domain Adaptation

no code implementations ICCV 2023 Samitha Herath, Basura Fernando, Ehsan Abbasnejad, Munawar Hayat, Shahram Khadivi, Mehrtash Harandi, Hamid Rezatofighi, Gholamreza Haffari

EBL can be used to improve the instance selection for a self-training task on the unlabelled target domain, and 2. alignment and normalizing energy scores can learn domain-invariant representations.

Unsupervised Domain Adaptation

ActiveRMAP: Radiance Field for Active Mapping And Planning

no code implementations 23 Nov 2022 Huangying Zhan, Jiyang Zheng, Yi Xu, Ian Reid, Hamid Rezatofighi

We, for the first time, present an RGB-only active vision framework using radiance field representation for active 3D reconstruction and planning in an online manner.

3D Reconstruction Active 3D Reconstruction

Predicting Topological Maps for Visual Navigation in Unexplored Environments

no code implementations 23 Nov 2022 Huangying Zhan, Hamid Rezatofighi, Ian Reid

We propose a robotic learning system for autonomous exploration and navigation in unexplored environments.

Visual Navigation

MARLIN: Masked Autoencoder for facial video Representation LearnINg

1 code implementation CVPR 2023 Zhixi Cai, Shreya Ghosh, Kalin Stefanov, Abhinav Dhall, Jianfei Cai, Hamid Rezatofighi, Reza Haffari, Munawar Hayat

This paper proposes a self-supervised approach to learn universal facial representations from videos, that can transfer across a variety of facial analysis tasks such as Facial Attribute Recognition (FAR), Facial Expression Recognition (FER), DeepFake Detection (DFD), and Lip Synchronization (LS).

Action Classification Attribute +9

Unifying Flow, Stereo and Depth Estimation

1 code implementation 10 Nov 2022 Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, Fisher Yu, DaCheng Tao, Andreas Geiger

We present a unified formulation and model for three motion and 3D perception tasks: optical flow, rectified stereo matching and unrectified stereo depth estimation from posed images.

Optical Flow Estimation Stereo Depth Estimation +1

JRDB-Pose: A Large-scale Dataset for Multi-Person Pose Estimation and Tracking

no code implementations CVPR 2023 Edward Vendrow, Duy Tho Le, Jianfei Cai, Hamid Rezatofighi

In crowded human scenes with close-up human-robot interaction and robot navigation, a deep understanding requires reasoning about human motion and body dynamics over time with human body pose estimation and tracking.

Diversity Multi-Person Pose Estimation +2

SoMoFormer: Multi-Person Pose Forecasting with Transformers

1 code implementation 30 Aug 2022 Edward Vendrow, Satyajit Kumar, Ehsan Adeli, Hamid Rezatofighi

Although there are several previous works targeting the problem of multi-person dynamic pose forecasting, they often model the entire pose sequence as time series (ignoring the underlying relationship between joints) or only output the future pose sequence of one person at a time.

Human Pose Forecasting motion prediction +2

Learning of Global Objective for Network Flow in Multi-Object Tracking

no code implementations CVPR 2022 Shuai Li, Yu Kong, Hamid Rezatofighi

This paper concerns the problem of multi-object tracking based on the min-cost flow (MCF) formulation, which is conventionally studied as an instance of linear program.

Multi-Object Tracking

GMFlow: Learning Optical Flow via Global Matching

4 code implementations CVPR 2022 Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, DaCheng Tao

Learning-based optical flow estimation has been dominated by the pipeline of cost volume with convolutions for flow regression, which is inherently limited to local correlations and thus struggles with the long-standing challenge of large displacements.

Optical Flow Estimation regression

Guided-GAN: Adversarial Representation Learning for Activity Recognition with Wearables

no code implementations 12 Oct 2021 Alireza Abedin, Hamid Rezatofighi, Damith C. Ranasinghe

Human activity recognition (HAR) is an important research field in ubiquitous computing where the acquisition of large-scale labeled sensor data is tedious, labor-intensive and time consuming.

Generative Adversarial Network Human Activity Recognition +1

ODAM: Object Detection, Association, and Mapping using Posed RGB Video

1 code implementation ICCV 2021 Kejie Li, Daniel DeTone, Steven Chen, Minh Vo, Ian Reid, Hamid Rezatofighi, Chris Sweeney, Julian Straub, Richard Newcombe

Localizing objects and estimating their extent in 3D is an important step towards high-level 3D scene understanding, which has many applications in Augmented Reality and Robotics.

3D Object Detection Graph Neural Network +3

Unsupervised Image Segmentation by Mutual Information Maximization and Adversarial Regularization

no code implementations 1 Jul 2021 S. Ehsan Mirsadeghi, Ali Royat, Hamid Rezatofighi

In this paper, we propose a novel fully unsupervised semantic segmentation method, the so-called Information Maximization and Adversarial Regularization Segmentation (InMARS).

Image Segmentation Scene Understanding +4

JRDB-Act: A Large-scale Dataset for Spatio-temporal Action, Social Group and Activity Detection

no code implementations CVPR 2022 Mahsa Ehsanpour, Fatemeh Saleh, Silvio Savarese, Ian Reid, Hamid Rezatofighi

However, learning to recognise human actions and their social interactions in an unconstrained real-world environment comprising numerous people, with potentially highly unbalanced and long-tailed distributed action labels from a stream of sensory data captured from a mobile robot platform remains a significant challenge, not least owing to the lack of a reflective large-scale dataset.

Action Detection Action Understanding +1

TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild

no code implementations ICCV 2021 Vida Adeli, Mahsa Ehsanpour, Ian Reid, Juan Carlos Niebles, Silvio Savarese, Ehsan Adeli, Hamid Rezatofighi

Joint forecasting of human trajectory and pose dynamics is a fundamental building block of various applications ranging from robotics and autonomous driving to surveillance systems.

Autonomous Driving Human-Object Interaction Detection

Looking Beyond Two Frames: End-to-End Multi-Object Tracking Using Spatial and Temporal Transformers

1 code implementation 27 Mar 2021 Tianyu Zhu, Markus Hiller, Mahsa Ehsanpour, Rongkai Ma, Tom Drummond, Ian Reid, Hamid Rezatofighi

Tracking a time-varying indefinite number of objects in a video sequence over time remains a challenge despite recent advances in the field.

Multi-Object Tracking Object +1

MOLTR: Multiple Object Localisation, Tracking, and Reconstruction from Monocular RGB Videos

no code implementations 9 Dec 2020 Kejie Li, Hamid Rezatofighi, Ian Reid

Given a new RGB frame, MOLTR firstly applies a monocular 3D detector to localise objects of interest and extract their shape codes that represent the object shapes in a learned embedding space.

Benchmarking Object +1

Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking

no code implementations CVPR 2021 Fatemeh Saleh, Sadegh Aliakbarian, Hamid Rezatofighi, Mathieu Salzmann, Stephen Gould

Despite the recent advances in multiple object tracking (MOT), achieved by joint detection and tracking, dealing with long occlusions remains a challenge.

Multiple Object Tracking

How Trustworthy are Performance Evaluations for Basic Vision Tasks?

no code implementations 8 Aug 2020 Tran Thien Dat Nguyen, Hamid Rezatofighi, Ba-Ngu Vo, Ba-Tuong Vo, Silvio Savarese, Ian Reid

This paper examines performance evaluation criteria for basic vision tasks involving sets of objects, namely object detection, instance-level segmentation and multi-object tracking.

Multi-Object Tracking object-detection +1

Socially and Contextually Aware Human Motion and Pose Forecasting

no code implementations 14 Jul 2020 Vida Adeli, Ehsan Adeli, Ian Reid, Juan Carlos Niebles, Hamid Rezatofighi

In this paper, we propose a novel framework to tackle both tasks of human motion (or trajectory) and body skeleton pose forecasting in a unified end-to-end pipeline.

Decoder Human Dynamics +1

Attend And Discriminate: Beyond the State-of-the-Art for Human Activity Recognition using Wearable Sensors

no code implementations 14 Jul 2020 Alireza Abedin, Mahsa Ehsanpour, Qinfeng Shi, Hamid Rezatofighi, Damith C. Ranasinghe

Wearables are fundamental to improving our understanding of human activities, especially for an increasing number of healthcare applications from rehabilitation to fine-grained gait analysis.

Human Activity Recognition

Joint Learning of Social Groups, Individuals Action and Sub-group Activities in Videos

no code implementations ECCV 2020 Mahsa Ehsanpour, Alireza Abedin, Fatemeh Saleh, Javen Shi, Ian Reid, Hamid Rezatofighi

In this paper, we solve the problem of simultaneously grouping people by their social interactions, predicting their individual actions and the social activity of each social group, which we call the social task.

Group Activity Recognition

MOT20: A benchmark for multi object tracking in crowded scenes

1 code implementation 19 Mar 2020 Patrick Dendorfer, Hamid Rezatofighi, Anton Milan, Javen Shi, Daniel Cremers, Ian Reid, Stefan Roth, Konrad Schindler, Laura Leal-Taixé

The benchmark for Multiple Object Tracking, MOTChallenge, was launched with the goal to establish a standardized evaluation of multiple object tracking methods.

Multi-Object Tracking Multiple Object Tracking with Transformer +2

JRMOT: A Real-Time 3D Multi-Object Tracker and a New Large-Scale Dataset

1 code implementation 19 Feb 2020 Abhijeet Shenoi, Mihir Patel, JunYoung Gwak, Patrick Goebel, Amir Sadeghian, Hamid Rezatofighi, Roberto Martín-Martín, Silvio Savarese

In this work we present JRMOT, a novel 3D MOT system that integrates information from RGB images and 3D point clouds to achieve real-time, state-of-the-art tracking performance.

Autonomous Navigation Motion Planning +2

Learn to Predict Sets Using Feed-Forward Neural Networks

no code implementations 30 Jan 2020 Hamid Rezatofighi, Tianyu Zhu, Roman Kaskman, Farbod T. Motlagh, Qinfeng Shi, Anton Milan, Daniel Cremers, Laura Leal-Taixé, Ian Reid

In our formulation we define a likelihood for a set distribution represented by a) two discrete distributions defining the set cardinality and permutation variables, and b) a joint distribution over set elements with a fixed cardinality.

Multi-Label Image Classification object-detection +1

Approximating the Permanent by Sampling from Adaptive Partitions

1 code implementation NeurIPS 2019 Jonathan Kuck, Tri Dao, Hamid Rezatofighi, Ashish Sabharwal, Stefano Ermon

Computing the permanent of a non-negative matrix is a core problem with practical applications ranging from target tracking to statistical thermodynamics.

Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression

10 code implementations CVPR 2019 Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, Silvio Savarese

By incorporating this generalized $IoU$ ($GIoU$) as a loss into the state-of-the-art object detection frameworks, we show a consistent improvement on their performance using both the standard, $IoU$ based, and new, $GIoU$ based, performance measures on popular object detection benchmarks such as PASCAL VOC and MS COCO.

Object object-detection +2

Learning Pairwise Relationship for Multi-object Detection in Crowded Scenes

no code implementations 12 Jan 2019 Yu Liu, Lingqiao Liu, Hamid Rezatofighi, Thanh-Toan Do, Qinfeng Shi, Ian Reid

As the post-processing step for object detection, non-maximum suppression (GreedyNMS) has been widely used in most detectors for many years.

object-detection Object Detection

TrackerBots: Software in the Loop Study of Quad-Copter Robots for Locating Radio-tags in a 3D Space

1 code implementation 1 Dec 2018 Hoa Van Nguyen, Hamid Rezatofighi, David Taggart, Bertram Ostendorf, Damith C. Ranasinghe

We investigate the problem of tracking and planning for a UAV in a task to locate multiple radio-tagged wildlife in a three-dimensional (3D) setting in the context of our TrackerBots research project.

Management TAG
