no code implementations • 4 Dec 2024 • Fawad Javed Fateh, Umer Ahmed, Hamza Khan, M. Zeeshan Zia, Quoc-Huy Tran
This paper introduces TemporalVLM, a video large language model capable of effective temporal reasoning and fine-grained understanding in long videos.
no code implementations • 12 Sep 2023 • Syed Waleed Hyder, Muhammad Usama, Anas Zafar, Muhammad Naufil, Fawad Javed Fateh, Andrey Konin, M. Zeeshan Zia, Quoc-Huy Tran
This paper presents a 2D skeleton-based action segmentation method with applications in fine-grained human activity recognition.
no code implementations • 31 May 2023 • Quoc-Huy Tran, Muhammad Ahmed, Murad Popattia, M. Hassan Ahmed, Andrey Konin, M. Zeeshan Zia
This paper presents a self-supervised temporal video alignment framework that is useful for several fine-grained human activity understanding applications.
1 code implementation • 31 May 2023 • Quoc-Huy Tran, Ahmed Mehmood, Muhammad Ahmed, Muhammad Naufil, Anas Zafar, Andrey Konin, M. Zeeshan Zia
The frame-level prediction module is trained in an unsupervised manner via temporal optimal transport.
Ranked #3 on Unsupervised Action Segmentation on Breakfast
no code implementations • 30 Jun 2022 • Hamza Khan, Sanjay Haresh, Awais Ahmed, Shakeeb Siddiqui, Andrey Konin, M. Zeeshan Zia, Quoc-Huy Tran
We introduce a novel approach for temporal activity segmentation with timestamp supervision.
1 code implementation • CVPR 2022 • Sateesh Kumar, Sanjay Haresh, Awais Ahmed, Andrey Konin, M. Zeeshan Zia, Quoc-Huy Tran
The temporal optimal transport module enables our approach to learn effective representations for unsupervised activity segmentation.
Ranked #1 on Unsupervised Action Segmentation on 50 Salads
no code implementations • 11 Apr 2020 • Sanjay Haresh, Sateesh Kumar, M. Zeeshan Zia, Quoc-Huy Tran
We apply (i) a one-class classification loss and (ii) a reconstruction-based loss for anomaly detection on RetroTrucks as well as on existing static-camera datasets.
no code implementations • ECCV 2018 • Mohammed E. Fathy, Quoc-Huy Tran, M. Zeeshan Zia, Paul Vernaza, Manmohan Chandraker
Further, we propose to use activation maps at different layers of a CNN as an effective and principled replacement for the multi-resolution image pyramids often used for matching tasks.
no code implementations • 8 Jan 2018 • Chi Li, M. Zeeshan Zia, Quoc-Huy Tran, Xiang Yu, Gregory D. Hager, Manmohan Chandraker
In this work, we explore an approach for injecting prior domain structure into neural network training by supervising hidden layers of a CNN with intermediate concepts that normally are not observed in practice.
no code implementations • CVPR 2017 • Chi Li, M. Zeeshan Zia, Quoc-Huy Tran, Xiang Yu, Gregory D. Hager, Manmohan Chandraker
Monocular 3D object parsing is highly desirable in various scenarios including occlusion reasoning and holistic scene interpretation.
no code implementations • 15 Sep 2015 • M. Zeeshan Zia, Luigi Nardi, Andrew Jack, Emanuele Vespa, Bruno Bodin, Paul H. J. Kelly, Andrew J. Davison
SLAM has matured significantly over the past few years, and is beginning to appear in serious commercial products.
no code implementations • 18 Nov 2014 • M. Zeeshan Zia, Michael Stark, Konrad Schindler
An object class (in our case, cars) is modeled as a deformable 3D wireframe, which enables fine-grained modeling at the level of individual vertices and faces.
3 code implementations • 8 Oct 2014 • Luigi Nardi, Bruno Bodin, M. Zeeshan Zia, John Mawer, Andy Nisbet, Paul H. J. Kelly, Andrew J. Davison, Mikel Luján, Michael F. P. O'Boyle, Graham Riley, Nigel Topham, Steve Furber
Real-time dense computer vision and SLAM offer great potential for a new level of scene modelling, tracking and real environmental interaction for many types of robot, but their high computational requirements mean that use on mass market embedded platforms is challenging.
no code implementations • CVPR 2013 • M. Zeeshan Zia, Michael Stark, Konrad Schindler
In this paper, we tackle the challenge of modeling occlusion in the context of a 3D geometric object class model that is capable of fine-grained, part-level 3D object reconstruction.