Search Results for author: Suman Saha

Found 22 papers, 10 papers with code

Online Real-time Multiple Spatiotemporal Action Localisation and Prediction

4 code implementations ICCV 2017 Gurkirt Singh, Suman Saha, Michael Sapienza, Philip Torr, Fabio Cuzzolin

To the best of our knowledge, ours is the first real-time (up to 40fps) system able to perform online S/T action localisation and early action prediction on the untrimmed videos of UCF101-24.

Early Action Prediction

Three Ways to Improve Semantic Segmentation with Self-Supervised Depth Estimation

1 code implementation CVPR 2021 Lukas Hoyer, Dengxin Dai, Yuhua Chen, Adrian Köring, Suman Saha, Luc van Gool

Training deep networks for semantic segmentation requires large amounts of labeled training data, which presents a major challenge in practice, as labeling segmentation masks is a highly labor-intensive process.

Data Augmentation Monocular Depth Estimation +2

ROAD: The ROad event Awareness Dataset for Autonomous Driving

2 code implementations 23 Feb 2021 Gurkirt Singh, Stephen Akrigg, Manuele Di Maio, Valentina Fontana, Reza Javanmard Alitappeh, Suman Saha, Kossar Jeddisaravi, Farzad Yousefi, Jacob Culley, Tom Nicholson, Jordan Omokeowa, Salman Khan, Stanislao Grazioso, Andrew Bradley, Giuseppe Di Gironimo, Fabio Cuzzolin

We also report the performance on the ROAD tasks of Slowfast and YOLOv5 detectors, as well as that of the winners of the ICCV2021 ROAD challenge, which highlight the challenges faced by situation awareness in autonomous driving.

Action Detection Activity Detection +4

Learning to Relate Depth and Semantics for Unsupervised Domain Adaptation

1 code implementation CVPR 2021 Suman Saha, Anton Obukhov, Danda Pani Paudel, Menelaos Kanakis, Yuhua Chen, Stamatios Georgoulis, Luc van Gool

Specifically, we show that: (1) our approach improves performance on all tasks when they are complementary and mutually dependent; (2) the CTRL helps to improve the performance of both the semantic segmentation and depth estimation tasks in the challenging UDA setting; (3) the proposed ISL training scheme further improves the semantic segmentation performance.

Monocular Depth Estimation Multi-Task Learning +4

Reparameterizing Convolutions for Incremental Multi-Task Learning without Task Interference

1 code implementation ECCV 2020 Menelaos Kanakis, David Bruggemann, Suman Saha, Stamatios Georgoulis, Anton Obukhov, Luc van Gool

First, enabling the model to be inherently incremental, continuously incorporating information from new tasks without forgetting the previously learned ones (incremental learning).

Incremental Learning Multi-Task Learning

EDAPS: Enhanced Domain-Adaptive Panoptic Segmentation

1 code implementation ICCV 2023 Suman Saha, Lukas Hoyer, Anton Obukhov, Dengxin Dai, Luc van Gool

EDAPS significantly improves the state-of-the-art performance for panoptic segmentation UDA by a large margin of 20% on SYNTHIA-to-Cityscapes and even 72% on the more challenging SYNTHIA-to-Mapillary Vistas.

Domain Adaptation Instance Segmentation +2

Exploiting Instance-based Mixed Sampling via Auxiliary Source Domain Supervision for Domain-adaptive Action Detection

1 code implementation 28 Sep 2022 Yifan Lu, Gurkirt Singh, Suman Saha, Luc van Gool

We propose a novel domain adaptive action detection approach and a new adaptation protocol that leverages the recent advancements in image-level unsupervised domain adaptation (UDA) techniques and handles the vagaries of instance-level video data.

Action Detection Pseudo Label +2

Incremental Tube Construction for Human Action Detection

1 code implementation 5 Apr 2017 Harkirat Singh Behl, Michael Sapienza, Gurkirt Singh, Suman Saha, Fabio Cuzzolin, Philip H. S. Torr

In this work, we introduce a real-time and online joint-labelling and association algorithm for action detection that can incrementally construct space-time action tubes on the most challenging action videos in which different action categories occur concurrently.

Action Detection

TraMNet - Transition Matrix Network for Efficient Action Tube Proposals

1 code implementation 1 Aug 2018 Gurkirt Singh, Suman Saha, Fabio Cuzzolin

At training time, transitions are specific to cell locations of the feature maps, so that a sparse (efficient) transition matrix is used to train the network.

Two-Stream AMTnet for Action Detection

1 code implementation 3 Apr 2020 Suman Saha, Gurkirt Singh, Fabio Cuzzolin

In this paper, we propose a new deep neural network architecture for online action detection. This is achieved by augmenting the previous Action Micro-Tube (AMTnet) action detection framework in three distinct ways: (1) by adding a parallel motion stream to the original appearance one in AMTnet; (2) in opposition to state-of-the-art action detectors which train appearance and motion streams separately, and use a test time late fusion scheme to fuse RGB and flow cues, by jointly training both streams in an end-to-end fashion and merging RGB and optical flow features at training time; (3) by introducing an online action tube generation algorithm which works at video-level, and in real-time (when exploiting only appearance features).

Autonomous Driving Online Action Detection +2

Unsupervised Deep Representations for Learning Audience Facial Behaviors

no code implementations 10 May 2018 Suman Saha, Rajitha Navarathna, Leonhard Helminger, Romann Weber

In this paper, we present an unsupervised learning approach for analyzing facial behavior based on a deep generative model combined with a convolutional neural network (CNN).

Generative Adversarial Network

Spatio-temporal Human Action Localisation and Instance Segmentation in Temporally Untrimmed Videos

no code implementations 22 Jul 2017 Suman Saha, Gurkirt Singh, Michael Sapienza, Philip H. S. Torr, Fabio Cuzzolin

Current state-of-the-art human action recognition is focused on the classification of temporally trimmed videos in which only one action occurs per frame.

Action Recognition Instance Segmentation +2

AMTnet: Action-Micro-Tube Regression by End-to-end Trainable Deep Architecture

no code implementations ICCV 2017 Suman Saha, Gurkirt Singh, Fabio Cuzzolin

As such, our 3D-RPN net is able to effectively encode the temporal aspect of actions by purely exploiting appearance, as opposed to methods which heavily rely on expensive flow maps.

Action Detection Region Proposal +1

Deep Learning for Detecting Multiple Space-Time Action Tubes in Videos

no code implementations 4 Aug 2016 Suman Saha, Gurkirt Singh, Michael Sapienza, Philip H. S. Torr, Fabio Cuzzolin

In stage 2, the appearance network detections are boosted by combining them with the motion detection scores, in proportion to their respective spatial overlap.

Action Detection Motion Detection +1

Action Detection from a Robot-Car Perspective

no code implementations 30 Jul 2018 Valentina Fontana, Gurkirt Singh, Stephen Akrigg, Manuele Di Maio, Suman Saha, Fabio Cuzzolin

We present the new Road Event and Activity Detection (READ) dataset, designed and created from an autonomous vehicle perspective to take action detection challenges to autonomous driving.

Action Detection Activity Detection +3

Predicting Action Tubes

no code implementations 23 Aug 2018 Gurkirt Singh, Suman Saha, Fabio Cuzzolin

In this work, we present a method to predict an entire 'action tube' (a set of temporally linked bounding boxes) in a trimmed video just by observing a smaller subset of it.

Action Classification Action Detection +1

Domain Agnostic Feature Learning for Image and Video Based Face Anti-spoofing

no code implementations 15 Dec 2019 Suman Saha, Wen-Hao Xu, Menelaos Kanakis, Stamatios Georgoulis, Yu-Hua Chen, Danda Pani Paudel, Luc van Gool

Face anti-spoofing is a measure towards this direction for bio-metric user authentication, and in particular face recognition, that tries to prevent spoof attacks.

Face Anti-Spoofing Face Recognition

Spatio-Temporal Action Detection Under Large Motion

no code implementations 6 Sep 2022 Gurkirt Singh, Vasileios Choutas, Suman Saha, Fisher Yu, Luc van Gool

Current methods for spatiotemporal action tube detection often extend a bounding box proposal at a given keyframe into a 3D temporal cuboid and pool features from nearby frames.

Action Detection

CS-Mixer: A Cross-Scale Vision MLP Model with Spatial-Channel Mixing

no code implementations 25 Aug 2023 Jonathan Cui, David A. Araujo, Suman Saha, Md. Faisal Kabir

Despite their simpler information fusion designs compared with Vision Transformers and Convolutional Neural Networks, Vision MLP architectures have demonstrated strong performance and high data efficiency in recent research.

Three Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding

no code implementations 8 Sep 2023 Ozan Unal, Christos Sakaridis, Suman Saha, Fisher Yu, Luc van Gool

A common formulation to tackle 3D visual grounding is grounding-by-detection, where localization is done via bounding boxes.

3D Instance Segmentation Object +3

Language-Guided Instance-Aware Domain-Adaptive Panoptic Segmentation

no code implementations 4 Apr 2024 Elham Amin Mansour, Ozan Unal, Suman Saha, Benjamin Bejar, Luc van Gool

A key challenge in panoptic UDA is reducing the domain gap between a labeled source and an unlabeled target domain while harmonizing the subtasks of semantic and instance segmentation to limit catastrophic interference.

Autonomous Driving Instance Segmentation +3
