no code implementations • 22nd International Conference on Pattern Recognition (ICPR) 2014 • Georgios Evangelidis, Gurkirt Singh, Radu Horaud
Further, the use of a Fisher kernel representation is suggested to describe the skeletal quads contained in a (sub)action.
Ranked #119 on Skeleton Based Action Recognition on NTU RGB+D
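As a rough sketch of the Fisher kernel representation mentioned above: a common form of this encoding is the gradient of the descriptors' log-likelihood under a diagonal GMM, restricted here to the gradient with respect to the means. The function name and the restriction to first-order (mean) statistics are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def fisher_vector_means(X, pi, mu, sigma):
    """First-order Fisher vector of descriptors X (N, D) under a diagonal
    GMM with weights pi (K,), means mu (K, D) and std devs sigma (K, D),
    keeping only the gradient w.r.t. the means."""
    N, D = X.shape
    K = len(pi)
    # log-probability of each descriptor under each component
    log_p = np.stack([
        -0.5 * np.sum(((X - mu[k]) / sigma[k]) ** 2
                      + np.log(2 * np.pi * sigma[k] ** 2), axis=1)
        + np.log(pi[k])
        for k in range(K)], axis=1)
    # soft assignments gamma[n, k] via a numerically stable softmax
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # normalised gradient w.r.t. each component mean, concatenated
    fv = np.stack([
        gamma[:, k:k + 1].T @ ((X - mu[k]) / sigma[k]) / (N * np.sqrt(pi[k]))
        for k in range(K)])
    return fv.reshape(-1)  # (K * D,) encoding of the descriptor set
```

In practice each (sub)action's skeletal quad descriptors would be pooled this way into a single fixed-length vector for classification.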
1 code implementation • 7 Jul 2016 • Gurkirt Singh, Fabio Cuzzolin
We propose a simple, yet effective, method for the temporal detection of activities in temporally untrimmed videos with the help of untrimmed classification.
no code implementations • 4 Aug 2016 • Suman Saha, Gurkirt Singh, Michael Sapienza, Philip H. S. Torr, Fabio Cuzzolin
In stage 2, the appearance network detections are boosted by combining them with the motion detection scores, in proportion to their respective spatial overlap.
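The overlap-weighted boosting described above can be sketched as follows. The function names and the exact fusion rule (adding the motion score scaled by IoU) are illustrative assumptions; the paper's stage-2 scheme may differ in detail.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def boost_scores(app_boxes, app_scores, motion_boxes, motion_scores):
    """Boost each appearance detection with its best-overlapping motion
    detection, weighting the motion score by their spatial overlap."""
    boosted = []
    for box, s in zip(app_boxes, app_scores):
        overlaps = [iou(box, mb) for mb in motion_boxes]
        j = int(np.argmax(overlaps))
        boosted.append(s + overlaps[j] * motion_scores[j])
    return boosted
```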
4 code implementations • ICCV 2017 • Gurkirt Singh, Suman Saha, Michael Sapienza, Philip Torr, Fabio Cuzzolin
To the best of our knowledge, ours is the first real-time (up to 40fps) system able to perform online S/T action localisation and early action prediction on the untrimmed videos of UCF101-24.
1 code implementation • 5 Apr 2017 • Harkirat Singh Behl, Michael Sapienza, Gurkirt Singh, Suman Saha, Fabio Cuzzolin, Philip H. S. Torr
In this work, we introduce a real-time and online joint-labelling and association algorithm for action detection that can incrementally construct space-time action tubes on the most challenging action videos in which different action categories occur concurrently.
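Incremental tube construction of this kind can be illustrated with a greedy frame-by-frame linker: each existing tube claims the best-overlapping detection in the new frame, and unclaimed detections start new tubes. This is a minimal sketch under assumed names and a simple IoU-threshold matching rule, not the paper's joint-labelling algorithm.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def link_online(tubes, frame_boxes, frame_scores, iou_thresh=0.3):
    """Greedily extend existing tubes with the current frame's detections.
    A tube is a list of (box, score); unmatched detections seed new tubes,
    so tubes for concurrent actions grow side by side."""
    used = set()
    for tube in tubes:
        last_box = tube[-1][0]
        best_j, best_ov = -1, iou_thresh
        for j, box in enumerate(frame_boxes):
            if j in used:
                continue
            ov = iou(last_box, box)
            if ov > best_ov:
                best_j, best_ov = j, ov
        if best_j >= 0:
            used.add(best_j)
            tube.append((frame_boxes[best_j], frame_scores[best_j]))
    for j, box in enumerate(frame_boxes):
        if j not in used:
            tubes.append([(box, frame_scores[j])])
    return tubes
```

Because only the current frame's detections are consulted, such a linker runs online without waiting for the whole video.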
no code implementations • ICCV 2017 • Suman Saha, Gurkirt Singh, Fabio Cuzzolin
As such, our 3D-RPN net is able to effectively encode the temporal aspect of actions by purely exploiting appearance, as opposed to methods which heavily rely on expensive flow maps.
no code implementations • 22 Jul 2017 • Suman Saha, Gurkirt Singh, Michael Sapienza, Philip H. S. Torr, Fabio Cuzzolin
Current state-of-the-art human action recognition is focused on the classification of temporally trimmed videos in which only one action occurs per frame.
no code implementations • 30 Jul 2018 • Valentina Fontana, Gurkirt Singh, Stephen Akrigg, Manuele Di Maio, Suman Saha, Fabio Cuzzolin
We present the new Road Event and Activity Detection (READ) dataset, designed and created from an autonomous vehicle perspective to take action detection challenges to autonomous driving.
1 code implementation • 1 Aug 2018 • Gurkirt Singh, Suman Saha, Fabio Cuzzolin
At training time, transitions are specific to cell locations of the feature maps, so that a sparse (efficient) transition matrix is used to train the network.
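One plausible reading of a cell-location-specific sparse transition matrix is a mask that permits transitions only between a feature-map cell and its spatial neighbours, so most entries are zero and never trained. The function and the fixed-radius neighbourhood rule below are assumptions for illustration only.

```python
import numpy as np

def cell_transition_mask(h, w, radius=1):
    """Boolean (h*w, h*w) mask allowing transitions only from a cell to
    cells within `radius` on an h-by-w feature map; all other entries
    stay zero, yielding a sparse transition matrix."""
    mask = np.zeros((h * w, h * w), dtype=bool)
    for y in range(h):
        for x in range(w):
            i = y * w + x
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        mask[i, ny * w + nx] = True
    return mask
```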
no code implementations • 23 Aug 2018 • Gurkirt Singh, Suman Saha, Fabio Cuzzolin
In this work, we present a method to predict an entire `action tube' (a set of temporally linked bounding boxes) in a trimmed video just by observing a smaller subset of it.
no code implementations • 17 Nov 2018 • Gurkirt Singh, Fabio Cuzzolin
Recently, three dimensional (3D) convolutional neural networks (CNNs) have emerged as dominant methods to capture spatiotemporal representations in videos, by adding to pre-existing 2D CNNs a third, temporal dimension.
no code implementations • 4 Apr 2019 • Silvio Olivastri, Gurkirt Singh, Fabio Cuzzolin
The decoder is then optimised on such static features to generate the video's description.
1 code implementation • 3 Apr 2020 • Suman Saha, Gurkirt Singh, Fabio Cuzzolin
This is achieved by augmenting the previous Action Micro-Tube (AMTnet) action detection framework in three distinct ways: (1) by adding a parallel motion stream to the original appearance one in AMTnet; (2) in opposition to state-of-the-art action detectors which train appearance and motion streams separately and use a test-time late fusion scheme to fuse RGB and flow cues, by jointly training both streams in an end-to-end fashion and merging RGB and optical flow features at training time; (3) by introducing an online action tube generation algorithm which works at video level and in real time (when exploiting only appearance features).
no code implementations • 12 Jun 2020 • Vivek Singh Bawa, Gurkirt Singh, Francis KapingA, Inna Skarga-Bandurova, Alice Leporini, Carmela Landolfo, Armando Stabile, Francesco Setti, Riccardo Muradore, Elettra Oleari, Fabio Cuzzolin
In this work, we aim to increase the effectiveness of surgical assistant robots.
1 code implementation • 31 Aug 2020 • Gurkirt Singh
In this thesis, we focus on video action understanding problems from an online and real-time processing point of view.
2 code implementations • 23 Feb 2021 • Gurkirt Singh, Stephen Akrigg, Manuele Di Maio, Valentina Fontana, Reza Javanmard Alitappeh, Suman Saha, Kossar Jeddisaravi, Farzad Yousefi, Jacob Culley, Tom Nicholson, Jordan Omokeowa, Salman Khan, Stanislao Grazioso, Andrew Bradley, Giuseppe Di Gironimo, Fabio Cuzzolin
We also report the performance on the ROAD tasks of Slowfast and YOLOv5 detectors, as well as that of the winners of the ICCV2021 ROAD challenge, which highlight the challenges faced by situation awareness in autonomous driving.
no code implementations • 7 Apr 2021 • Vivek Singh Bawa, Gurkirt Singh, Francis KapingA, Inna Skarga-Bandurova, Elettra Oleari, Alice Leporini, Carmela Landolfo, Pengfei Zhao, Xi Xiang, Gongning Luo, Kuanquan Wang, Liangzhi Li, Bowen Wang, Shang Zhao, Li Li, Armando Stabile, Francesco Setti, Riccardo Muradore, Fabio Cuzzolin
For an autonomous robotic system, monitoring surgeon actions and assisting the main surgeon during a procedure can be very challenging.
no code implementations • 6 Sep 2022 • Gurkirt Singh, Vasileios Choutas, Suman Saha, Fisher Yu, Luc van Gool
Current methods for spatiotemporal action tube detection often extend a bounding box proposal at a given keyframe into a 3D temporal cuboid and pool features from nearby frames.
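The cuboid pooling described above can be sketched as follows: the keyframe box is simply replicated at every frame and the features inside the resulting 3D cuboid are averaged. The function name and average-pooling choice are illustrative assumptions about the baseline being described, not this paper's proposed method.

```python
import numpy as np

def pool_cuboid(feature_maps, box):
    """Extend a keyframe box (x1, y1, x2, y2), given in feature-map
    coordinates, into a temporal cuboid spanning all frames and
    average-pool the features inside it.
    feature_maps: (T, C, H, W) array; returns a (C,) descriptor."""
    x1, y1, x2, y2 = box
    region = feature_maps[:, :, y1:y2, x1:x2]  # same box at every frame
    return region.mean(axis=(0, 2, 3))
```

The limitation the abstract points at is visible here: a moving actor drifts out of the fixed box at frames far from the keyframe, so the pooled features mix actor and background.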
1 code implementation • 28 Sep 2022 • Yifan Lu, Gurkirt Singh, Suman Saha, Luc van Gool
We propose a novel domain adaptive action detection approach and a new adaptation protocol that leverages the recent advancements in image-level unsupervised domain adaptation (UDA) techniques and handles the vagaries of instance-level video data.