no code implementations • ECCV 2020 • Jin Xie, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao, Mubarak Shah
We further introduce a count-and-similarity branch within the two-stage detection framework, which predicts pedestrian count as well as proposal similarity.
1 code implementation • ECCV 2020 • Shruti Vyas, Yogesh S Rawat, Mubarak Shah
We evaluate the effectiveness of the learned representation for multi-view video action recognition in a supervised approach.
1 code implementation • CVPR 2023 • Aisha Urooj Khan, Hilde Kuehne, Bo Wu, Kim Chheu, Walid Bousselham, Chuang Gan, Niels Lobo, Mubarak Shah
The proposed method is trained in an end-to-end manner and optimized by a VQA loss with the cross-entropy function and a Hungarian matching loss for the situation graph prediction.
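A Hungarian matching loss pairs each ground-truth element with exactly one prediction so that the total matching cost is minimal, and the loss is computed on that pairing. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. `pair_cost` is a hypothetical squared-error cost (real situation-graph losses combine class and structure terms). The exhaustive search over permutations solves the same assignment problem that the Hungarian algorithm solves efficiently, so it only suits tiny inputs.

```python
import math
from itertools import permutations


def pair_cost(pred, target):
    # Hypothetical per-pair cost: squared error between predicted and target score vectors.
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def hungarian_matching_loss(preds, targets):
    """Return (minimum total cost, assignment) over one-to-one matchings.

    assignment[j] is the index of the prediction matched to target j.
    Brute force over permutations: it solves the same assignment problem as the
    Hungarian algorithm, but is only practical for a handful of elements.
    Assumes len(preds) >= len(targets).
    """
    best_cost, best_assignment = math.inf, None
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(pair_cost(preds[i], targets[j]) for j, i in enumerate(perm))
        if cost < best_cost:
            best_cost, best_assignment = cost, perm
    return best_cost, best_assignment


preds = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
targets = [[0.0, 1.0], [1.0, 0.0]]
loss, assignment = hungarian_matching_loss(preds, targets)
print(round(loss, 4), assignment)  # 0.1 (1, 0)
```

The unmatched third prediction contributes nothing here. In practice it would typically be supervised toward a "no object" class.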
1 code implementation • CVPR 2023 • Syed Talal Wasim, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah
Through this prompting scheme, we can achieve state-of-the-art zero-shot performance on Kinetics-600, HMDB51 and UCF101 while remaining competitive in the supervised setting.
no code implementations • 6 Apr 2023 • Sijie Zhu, Linjie Yang, Chen Chen, Mubarak Shah, Xiaohui Shen, Heng Wang
Visual Place Recognition (VPR) estimates the location of query images by matching them with images in a reference database.
1 code implementation • 3 Apr 2023 • Omkar Thawakar, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan
Open-world formulation relaxes the closed-world static-learning assumption as follows: (a) first, it distinguishes a set of known categories and labels an unknown object as 'unknown', and then (b) it incrementally learns the class of an unknown as and when the corresponding semantic labels become available.
no code implementations • 31 Mar 2023 • Daochang Liu, Qiyue Li, AnhDung Dinh, Tingting Jiang, Mubarak Shah, Chang Xu
Temporal action segmentation is crucial for understanding long-form videos.
Ranked #1 on Action Segmentation on Breakfast
no code implementations • CVPR 2023 • Ishan Rajendrakumar Dave, Mamshad Nayeem Rizve, Chen Chen, Mubarak Shah
We observe that these representations complement each other depending on the nature of the action.
1 code implementation • 21 Mar 2023 • Omkar Thawakar, Rao Muhammad Anwer, Jorma Laaksonen, Orly Reiner, Mubarak Shah, Fahad Shahbaz Khan
Accurate 3D mitochondria instance segmentation in electron microscopy (EM) is a challenging problem and serves as a prerequisite to empirically analyze their distributions and morphology.
no code implementations • CVPR 2023 • Brandon Clark, Alec Kerrigan, Parth Parag Kulkarni, Vicente Vivanco Cepeda, Mubarak Shah
To this end, we introduce an end-to-end transformer-based architecture that exploits the relationship between different geographic levels (which we refer to as hierarchies) and the corresponding visual scene information in an image through hierarchical cross-attention.
no code implementations • CVPR 2023 • Mamshad Nayeem Rizve, Gaurav Mittal, Ye Yu, Matthew Hall, Sandra Sajeev, Mubarak Shah, Mei Chen
To address this, we present PivoTAL, Prior-driven Supervision for Weakly-supervised Temporal Action Localization, to approach WTAL from a localization-by-localization perspective by learning to localize the action snippets directly.
Weakly Supervised Action Localization
Weakly Supervised Temporal Action Localization
no code implementations • CVPR 2023 • Rohit Gupta, Anirban Roy, Claire Christensen, Sujeong Kim, Sarah Gerard, Madeline Cincebeaux, Ajay Divakaran, Todd Grindal, Mubarak Shah
We learn a prototype for each class, and a loss function is employed to minimize the distances between a class prototype and the samples from that class.
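The prototype loss described above can be sketched as the mean squared distance between each embedding and the prototype of its class. This is a minimal illustration, assuming Euclidean distance and a plain dictionary of prototype vectors; it is not the authors' code. In training, the prototypes would be learnable parameters updated alongside the encoder, whereas here they are fixed.

```python
def prototype_loss(embeddings, labels, prototypes):
    """Mean squared Euclidean distance between each sample and its class prototype.

    prototypes maps a class label to its prototype vector; in training these would
    be learnable parameters, here they are fixed for illustration.
    """
    total = 0.0
    for z, y in zip(embeddings, labels):
        total += sum((a - b) ** 2 for a, b in zip(z, prototypes[y]))
    return total / len(embeddings)


embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
labels = [0, 1, 0]
prototypes = {0: [1.0, 0.5], 1: [0.0, 1.0]}
print(prototype_loss(embeddings, labels, prototypes))  # 0.16666666666666666
```

This term only pulls samples toward their own prototype. Prototype-based methods often add a term that keeps different prototypes apart, which the snippet does not mention.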
1 code implementation • CVPR 2023 • Sijie Zhu, Linjie Yang, Chen Chen, Mubarak Shah, Xiaohui Shen, Heng Wang
Visual Place Recognition (VPR) estimates the location of query images by matching them with images in a reference database.
no code implementations • 24 Dec 2022 • Saeed Vahidian, Sreevatsank Kadaveru, Woonjoon Baek, Weijia Wang, Vyacheslav Kungurtsev, Chen Chen, Mubarak Shah, Bill Lin
Specifically, we aim to investigate how ordered learning principles can contribute to alleviating the heterogeneity effects in FL.
no code implementations • 28 Nov 2022 • Nicolae-Catalin Ristea, Florinel-Alin Croitoru, Dana Dascalescu, Radu Tudor Ionescu, Fahad Shahbaz Khan, Mubarak Shah
We propose a very fast frame-level model for anomaly detection in video, which learns to detect anomalies by distilling knowledge from multiple highly accurate object-level teacher models.
no code implementations • 23 Nov 2022 • Rohit Gupta, Naveed Akhtar, Gaurav Kumar Nayak, Ajmal Mian, Mubarak Shah
By using a nearly disjoint dataset to train the substitute model, our method removes the requirement that the substitute model be trained using the same dataset as the target model, and leverages queries to the target model to retain the fooling rate benefits provided by query-based methods.
1 code implementation • CVPR 2023 • Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan
In this work, we show how denoising diffusion models can be applied for high-fidelity person image synthesis with strong sample diversity and enhanced mode coverage of the learnt data distribution.
no code implementations • 1 Nov 2022 • Jyoti Kini, Ajmal Mian, Mubarak Shah
We propose a method for joint detection and tracking of multiple objects in 3D point clouds, a task conventionally treated as a two-step process comprising object detection followed by data association.
no code implementations • 23 Oct 2022 • Guo-Jun Qi, Mubarak Shah
In this paper, we review adversarial pretraining of self-supervised deep networks including both convolutional neural networks and vision transformers.
1 code implementation • 16 Oct 2022 • Tushar Sangam, Ishan Rajendrakumar Dave, Waqas Sultani, Mubarak Shah
Drone-to-drone detection using visual feed has crucial applications like avoiding collision with other drones/airborne objects, tackling a drone attack or coordinating flight with other drones.
1 code implementation • 30 Sep 2022 • Mahdi Morafah, Saeed Vahidian, Chen Chen, Mubarak Shah, Bill Lin
Though successful, federated learning presents new challenges for machine learning, especially when the issue of data heterogeneity, also known as Non-IID data, arises.
1 code implementation • 25 Sep 2022 • Neelu Madan, Nicolae-Catalin Ristea, Radu Tudor Ionescu, Kamal Nasrollahi, Fahad Shahbaz Khan, Thomas B. Moeslund, Mubarak Shah
In this work, we extend our previous self-supervised predictive convolutional attentive block (SSPCAB) with a 3D masked convolutional layer, as well as a transformer for channel-wise attention.
Ranked #4 on Anomaly Detection on CUHK Avenue
1 code implementation • 21 Sep 2022 • Saeed Vahidian, Mahdi Morafah, Weijia Wang, Vyacheslav Kungurtsev, Chen Chen, Mubarak Shah, Bill Lin
This small set of principal vectors is provided to the server so that the server can directly identify distribution similarities among the clients to form clusters.
1 code implementation • 10 Sep 2022 • Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Mubarak Shah
Denoising diffusion models represent a recently emerging topic in computer vision, demonstrating remarkable results in the area of generative modeling.
no code implementations • 22 Jul 2022 • Rohit Gupta, Naveed Akhtar, Ajmal Mian, Mubarak Shah
We establish that this is a result of the presence of false negative pairs in the training process, which increases model sensitivity to input perturbations.
no code implementations • 16 Jul 2022 • Antonio Barbalau, Radu Tudor Ionescu, Mariana-Iuliana Georgescu, Jacob Dueholm, Bharathkumar Ramachandra, Kamal Nasrollahi, Fahad Shahbaz Khan, Thomas B. Moeslund, Mubarak Shah
A self-supervised multi-task learning (SSMTL) framework for video anomaly detection was recently introduced in the literature.
Ranked #2 on Anomaly Detection on CUHK Avenue
1 code implementation • 6 Jul 2022 • Shruti Vyas, Chen Chen, Mubarak Shah
There are no existing datasets for this problem; therefore, we propose the GAMa dataset, a large-scale dataset with ground videos and corresponding aerial images.
1 code implementation • 5 Jul 2022 • Mamshad Nayeem Rizve, Navid Kardan, Mubarak Shah
We also highlight the flexibility of our approach in solving the novel class discovery task, demonstrate its stability in dealing with imbalanced data, and complement our approach with a technique to estimate the number of novel classes.
Ranked #1 on Open-World Semi-Supervised Learning on ImageNet-100 (Novel accuracy (10% Labeled) metric)
Novel Class Discovery
Open-World Semi-Supervised Learning
1 code implementation • 5 Jul 2022 • Aisha Urooj Khan, Hilde Kuehne, Chuang Gan, Niels da Vitoria Lobo, Mubarak Shah
Transformers for visual-language representation learning have attracted a lot of interest and shown tremendous performance on visual question answering (VQA) and grounding.
1 code implementation • 5 Jul 2022 • Mamshad Nayeem Rizve, Navid Kardan, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah
In the open-world SSL problem, the objective is to recognize samples of known classes, and simultaneously detect and cluster samples belonging to novel classes present in unlabeled data.
Ranked #1 on Open-World Semi-Supervised Learning on CIFAR-10
no code implementations • 18 Jun 2022 • Madeline C. Schiappa, Yogesh S. Rawat, Mubarak Shah
In this survey, we provide a review of existing approaches on self-supervised learning focusing on the video domain.
no code implementations • 6 Jun 2022 • Fabio De Sousa Ribeiro, Kevin Duarte, Miles Everett, Georgios Leontidis, Mubarak Shah
The aim of this survey is to provide a comprehensive overview of the capsule network research landscape, which will serve as a valuable resource for the community going forward.
1 code implementation • 24 May 2022 • Mitch Hill, Jonathan Mitchell, Chu Chen, Yuan Du, Mubarak Shah, Song-Chun Zhu
This work presents strategies to learn an Energy-Based Model (EBM) according to the desired length of its MCMC sampling trajectories.
no code implementations • 22 Apr 2022 • Jyoti Kini, Mubarak Shah
Video Instance Segmentation is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence.
no code implementations • 22 Apr 2022 • Jyoti Kini, Fahad Shahbaz Khan, Salman Khan, Mubarak Shah
We propose a novel self-supervised Video Object Segmentation (VOS) approach that strives to achieve better object-background discriminability for accurate object segmentation.
no code implementations • 17 Apr 2022 • Rajat Modi, Aayush Jung Rana, Akash Kumar, Praveen Tirupattur, Shruti Vyas, Yogesh Singh Rawat, Mubarak Shah
Beyond possessing a large enough size to feed data-hungry machines (e.g., transformers), what attributes measure the quality of a dataset?
1 code implementation • CVPR 2022 • Jiale Cao, Yanwei Pang, Rao Muhammad Anwer, Hisham Cholakkal, Jin Xie, Mubarak Shah, Fahad Shahbaz Khan
We propose a novel one-step transformer-based person search framework, PSTR, that jointly performs person detection and re-identification (re-id) in a single architecture.
1 code implementation • CVPR 2022 • Sijie Zhu, Mubarak Shah, Chen Chen
The method does not rely on the polar transform and infers faster than CNN-based methods.
1 code implementation • CVPR 2022 • Ishan Rajendrakumar Dave, Chen Chen, Mubarak Shah
Existing approaches for mitigating privacy leakage in action recognition require privacy labels along with the action labels from the video dataset.
Ranked #1 on Action Classification on UCF101
1 code implementation • CVPR 2022 • Nazmul Karim, Mamshad Nayeem Rizve, Nazanin Rahnavard, Ajmal Mian, Mubarak Shah
To combat label noise, recent state-of-the-art methods employ some sort of sample selection mechanism to select a possibly clean subset of data.
2 code implementations • 3 Dec 2021 • Huan Lei, Naveed Akhtar, Mubarak Shah, Ajmal Mian
In this paper, we propose a series of modular operations for effective geometric feature learning from 3D triangle meshes.
2 code implementations • CVPR 2022 • Akshita Gupta, Sanath Narayan, K J Joseph, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah
In the case of incremental object detection, OW-DETR outperforms the state-of-the-art for all settings on PASCAL VOC.
no code implementations • 1 Dec 2021 • Kevin Duarte, Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Samuel Thomas, Alexander Liu, David Harwath, James Glass, Hilde Kuehne, Mubarak Shah
We present a new multimodal capsule network that allows us to leverage the strength of capsules in the context of a multimodal learning framework on large amounts of video data.
no code implementations • NeurIPS 2021 • Alec Kerrigan, Kevin Duarte, Yogesh Rawat, Mubarak Shah
Given a video and a set of action classes, our method predicts a set of confidence scores for each class independently.
4 code implementations • CVPR 2022 • Nicolae-Catalin Ristea, Neelu Madan, Radu Tudor Ionescu, Kamal Nasrollahi, Fahad Shahbaz Khan, Thomas B. Moeslund, Mubarak Shah
Our block is equipped with a loss that minimizes the reconstruction error with respect to the masked area in the receptive field.
Ranked #1 on Anomaly Detection on CUHK Avenue (TBDC metric)
1 code implementation • CVPR 2022 • Andra Acsintoae, Andrei Florescu, Mariana-Iuliana Georgescu, Tudor Mare, Paul Sumedrea, Radu Tudor Ionescu, Fahad Shahbaz Khan, Mubarak Shah
This is a closed-set scenario that fails to test the capability of systems at detecting new anomaly types.
Ranked #5 on Anomaly Detection on CUHK Avenue (using extra training data)
no code implementations • 14 Oct 2021 • Ishan Dave, Naman Biyani, Brandon Clark, Rohit Gupta, Yogesh Rawat, Mubarak Shah
This technical report presents our approach "Knights" to solve the action recognition task on a small subset of Kinetics-400, i.e., Kinetics400ViPriors, without using any extra data.
no code implementations • ICLR 2022 • Navid Kardan, Mubarak Shah, Mitch Hill
A supervised learning problem is often formulated using an i.i.d.
1 code implementation • ICCV 2021 • Sanath Narayan, Akshita Gupta, Salman Khan, Fahad Shahbaz Khan, Ling Shao, Mubarak Shah
We note that the best existing multi-label ZSL method takes a shared approach towards attending to region features with a common set of attention maps for all the classes.
Ranked #2 on Multi-label zero-shot learning on Open Images V4
no code implementations • 1 Aug 2021 • Naveed Akhtar, Ajmal Mian, Navid Kardan, Mubarak Shah
In [2], we reviewed the contributions made by the computer vision community in adversarial attacks on deep learning (and their defenses) until the advent of 2018.
no code implementations • 29 Jul 2021 • Amir Mazaheri, Mubarak Shah
To the best of our knowledge, this is the very first work on text (free-form sentence) to video generation on more realistic video datasets such as the Actor and Action Dataset (A2D) or UCF101.
1 code implementation • 24 Jul 2021 • Praveen Tirupattur, Aayush J Rana, Tushar Sangam, Shruti Vyas, Yogesh S Rawat, Mubarak Shah
While various approaches have been shown to be effective for the recognition task in recent works, they often do not deal with low-resolution videos where the action happens in a tiny region.
1 code implementation • 19 Jul 2021 • Dawei Du, Longyin Wen, Pengfei Zhu, Heng Fan, QinGhua Hu, Haibin Ling, Mubarak Shah, Junwen Pan, Ali Al-Ali, Amr Mohamed, Bakour Imene, Bin Dong, Binyu Zhang, Bouchali Hadia Nesma, Chenfeng Xu, Chenzhen Duan, Ciro Castiello, Corrado Mencar, Dingkang Liang, Florian Krüger, Gennaro Vessio, Giovanna Castellano, Jieru Wang, Junyu Gao, Khalid Abualsaud, Laihui Ding, Lei Zhao, Marco Cianciotta, Muhammad Saqib, Noor Almaadeed, Omar Elharrouss, Pei Lyu, Qi Wang, Shidong Liu, Shuang Qiu, Siyang Pan, Somaya Al-Maadeed, Sultan Daud Khan, Tamer Khattab, Tao Han, Thomas Golda, Wei Xu, Xiang Bai, Xiaoqing Xu, Xuelong Li, Yanyun Zhao, Ye Tian, Yingnan Lin, Yongchao Xu, Yuehan Yao, Zhenyu Xu, Zhijian Zhao, Zhipeng Luo, Zhiwei Wei, Zhiyuan Zhao
Crowd counting on the drone platform is an interesting topic in computer vision, which brings new challenges such as small object inference, background clutter and wide viewpoint.
no code implementations • 7 Jul 2021 • Nayyer Aafaq, Naveed Akhtar, Wei Liu, Mubarak Shah, Ajmal Mian
In contrast, we propose a GAN-based algorithm for crafting adversarial examples for neural image captioning that mimics the internal representation of the CNN such that the resulting deep features of the input image enable a controlled incorrect caption generation through the recurrent network.
no code implementations • 23 Jun 2021 • Crystal Gagne, Jyoti Kini, Daniel Smith, Mubarak Shah
Trail camera imagery has increasingly gained popularity amongst biologists for conservation and ecological research.
2 code implementations • CVPR 2021 • Alireza Zaeemzadeh, Niccolo Bisagno, Zeno Sambugaro, Nicola Conci, Nazanin Rahnavard, Mubarak Shah
In this paper, we argue that OOD samples can be detected more easily if the training data is embedded into a low-dimensional space, such that the embedded training samples lie on a union of 1-dimensional subspaces.
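If training embeddings lie on a union of 1-dimensional subspaces (one line per class), a natural out-of-distribution score is how far a test embedding's direction is from every class line. The sketch below is a minimal illustration, assuming one known direction vector per class and an absolute-cosine scoring rule; both are assumptions for illustration, not the paper's exact procedure.

```python
import math


def _unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def subspace_ood_score(embedding, class_directions):
    """OOD score = 1 - max |cosine| between the embedding and each class's 1-D subspace.

    The absolute value is used because a 1-D subspace contains both v and -v.
    Scores near 0 mean the sample lies on some class line; larger means more likely OOD.
    """
    z = _unit(embedding)
    best = max(abs(sum(a * b for a, b in zip(z, _unit(d)))) for d in class_directions)
    return 1.0 - best


directions = [[1.0, 0.0], [0.0, 1.0]]   # one hypothetical direction per class
print(subspace_ood_score([2.0, 0.0], directions))   # 0.0, lies on class 0's line
print(subspace_ood_score([1.0, 1.0], directions))   # ~0.293, between the two lines
```

Because the score depends only on direction, the embedding's norm plays no role, which is what makes the low-dimensional, line-shaped embedding convenient for this kind of test.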
no code implementations • 7 Jun 2021 • Sarah Shiraz, Krishna Regmi, Shruti Vyas, Yogesh S. Rawat, Mubarak Shah
We address the problem of novel view video prediction; given a set of input video clips from a single/multiple views, our network is able to predict the video from a novel view.
no code implementations • 3 Jun 2021 • Aakash Kumar, Jyoti Kini, Mubarak Shah, Ajmal Mian
In recent times, the scope of LIDAR (Light Detection and Ranging) sensor-based technology has spread across numerous fields.
no code implementations • 22 May 2021 • Kevin Duarte, Yogesh S. Rawat, Mubarak Shah
By stochastically masking labels during loss computation, the method balances this ratio for each class, leading to improved recall on minority classes and improved precision on frequent classes.
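One way to realize stochastic label masking is to drop, at random, entries of whichever label value is over-represented for a class, so that the expected positive-to-negative ratio of the entries that enter the loss matches a target. The sketch below assumes that "this ratio" is the per-class positive:negative label ratio and uses a hypothetical `target_ratio` parameter. It is an illustrative approximation of the idea, not the authors' method.

```python
import math
import random


def label_mask(labels, target_ratio=1.0, seed=0):
    """Per-entry 0/1 mask that rebalances each class's positive:negative ratio.

    labels: list of rows, each a list of 0/1 class labels (multi-label).
    For a class with ratio r = pos/neg, entries of the over-represented kind are
    dropped at random so that the expected kept ratio equals target_ratio.
    """
    rng = random.Random(seed)
    n, c = len(labels), len(labels[0])
    mask = [[1] * c for _ in range(n)]
    for k in range(c):
        pos = sum(row[k] for row in labels)
        neg = n - pos
        if pos == 0 or neg == 0:
            continue  # ratio undefined; leave this class unmasked
        ratio = pos / neg
        if ratio > target_ratio:
            drop_prob, drop_label = 1.0 - target_ratio / ratio, 1  # too many positives
        else:
            drop_prob, drop_label = 1.0 - ratio / target_ratio, 0  # too many negatives
        for i in range(n):
            if labels[i][k] == drop_label and rng.random() < drop_prob:
                mask[i][k] = 0
    return mask


def masked_bce(probs, labels, mask, eps=1e-7):
    """Binary cross-entropy averaged over the unmasked entries only."""
    total, count = 0.0, 0
    for p_row, y_row, m_row in zip(probs, labels, mask):
        for p, y, m in zip(p_row, y_row, m_row):
            if m:
                p = min(max(p, eps), 1.0 - eps)
                total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
                count += 1
    return total / max(count, 1)


# Class 0 is rare (1 positive, 9 negatives); class 1 is balanced (5 and 5).
labels = [[1 if i == 0 else 0, 1 if i < 5 else 0] for i in range(10)]
mask = label_mask(labels)
loss = masked_bce([[0.5, 0.5]] * 10, labels, mask)
```

For the rare class, most negatives are masked out, which raises that class's weight on positives (favoring recall); for a class with too many positives, positives are masked instead (favoring precision).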
1 code implementation • 14 May 2021 • Taojiannan Yang, Sijie Zhu, Matias Mendieta, Pu Wang, Ravikumar Balakrishnan, Minwoo Lee, Tao Han, Mubarak Shah, Chen Chen
MutualNet is a general training methodology that can be applied to various network structures (e.g., 2D networks: MobileNets, ResNet; 3D networks: SlowFast, X3D) and various tasks (e.g., image classification, object detection, segmentation, and action recognition), and is demonstrated to achieve consistent improvements on a variety of datasets.
1 code implementation • CVPR 2021 • Aisha Urooj Khan, Hilde Kuehne, Kevin Duarte, Chuang Gan, Niels Lobo, Mubarak Shah
In this paper, we focus on a more relaxed setting: the grounding of relevant visual entities in a weakly supervised manner by training on the VQA task alone.
no code implementations • 30 Apr 2021 • Sirnam Swetha, Hilde Kuehne, Yogesh S Rawat, Mubarak Shah
This paper proposes a novel approach for unsupervised sub-action learning in complex activities.
Ranked #24 on Action Segmentation on Breakfast
1 code implementation • ICCV 2021 • Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Mubarak Shah
We propose a novel transformer-based styled handwritten text image generation approach, HWT, that strives to learn style-content entanglement as well as global and local writing style patterns.
1 code implementation • CVPR 2021 • Muhammad Waseem Ashraf, Waqas Sultani, Mubarak Shah
The erratic movement of the source and target drones, small size, arbitrary shape, large intensity variations, and occlusion make this problem quite challenging.
no code implementations • 19 Mar 2021 • Ashkan Esmaeili, Marzieh Edraki, Nazanin Rahnavard, Mubarak Shah, Ajmal Mian
It is set forth that the proposed sparse perturbation is the most aligned sparse perturbation with the shortest path from the input sample to the decision boundary for some initial adversarial sample (the best sparse approximation of shortest path, likely to fool the model).
1 code implementation • CVPR 2021 • Praveen Tirupattur, Kevin Duarte, Yogesh Rawat, Mubarak Shah
We propose to improve action localization performance by modeling these action dependencies in a novel attention-based Multi-Label Action Dependency (MLAD) layer.
Ranked #1 on Action Detection on Multi-THUMOS
1 code implementation • CVPR 2021 • Mamshad Nayeem Rizve, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah
Equivariance or invariance has been employed standalone in the previous works; however, to the best of our knowledge, they have not been used jointly.
1 code implementation • 20 Jan 2021 • Ishan Dave, Rohit Gupta, Mamshad Nayeem Rizve, Mubarak Shah
However, prior work on contrastive learning for video data has not explored the effect of explicitly encouraging the features to be distinct across the temporal dimension.
Ranked #9 on Self-supervised Video Retrieval on UCF101
1 code implementation • ICLR 2021 • Mamshad Nayeem Rizve, Kevin Duarte, Yogesh S Rawat, Mubarak Shah
Recent research in semi-supervised learning (SSL) is mostly dominated by consistency-regularization-based methods, which achieve strong performance.
no code implementations • 4 Jan 2021 • Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, Mubarak Shah
Astounding results from Transformer models on natural language tasks have intrigued the vision community to study their application to computer vision problems.
1 code implementation • ICCV 2021 • Krishna Regmi, Mubarak Shah
In this paper, we address the problem of video geo-localization by proposing a Geo-Temporal Feature Learning (GTFL) Network to simultaneously learn the discriminative features between the query videos and gallery images for estimating the geo-spatial trajectory of a query video.
no code implementations • ICCV 2021 • Alireza Zaeemzadeh, Shabnam Ghadar, Baldo Faieta, Zhe Lin, Nazanin Rahnavard, Mubarak Shah, Ratheesh Kalarot
For example, a user can ask to retrieve images similar to a query image, but with a different hair color and no preference for the absence or presence of eyeglasses in the results.
no code implementations • 1 Jan 2021 • Saeed Vahidian, Mohsen Joneidi, Ashkan Esmaeili, Siavash Khodadadeh, Sharare Zehtabian, Ladislau Boloni, Nazanin Rahnavard, Bill Lin, Mubarak Shah
The approach is based on the concept of self-rank, defined as the minimum number of samples needed to reconstruct all samples with an accuracy proportional to the rank-K approximation.
1 code implementation • 24 Dec 2020 • Ce Zheng, Wenhan Wu, Chen Chen, Taojiannan Yang, Sijie Zhu, Ju Shen, Nasser Kehtarnavaz, Mubarak Shah
Furthermore, 2D and 3D human pose estimation datasets and evaluation metrics are included.
1 code implementation • 25 Nov 2020 • Simone Palazzo, Concetto Spampinato, Joseph Schmidt, Isaak Kavasidis, Daniela Giordano, Mubarak Shah
We argue that the reason why Li et al. [1] observe such high correlation in EEG data is their unconventional experimental design and settings that violate the basic cognitive neuroscience design recommendations, first and foremost the one of limiting the experiments' duration, as instead done in [2].
1 code implementation • CVPR 2021 • Mariana-Iuliana Georgescu, Antonio Barbalau, Radu Tudor Ionescu, Fahad Shahbaz Khan, Marius Popescu, Mubarak Shah
To the best of our knowledge, we are the first to approach anomalous event detection in video as a multi-task learning problem, integrating multiple self-supervised and knowledge distillation proxy tasks in a single architecture.
Ranked #2 on Anomaly Detection on UCSD Peds2
Abnormal Event Detection In Video
Anomaly Detection In Surveillance Videos
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Aisha Urooj Khan, Amir Mazaheri, Niels da Vitoria Lobo, Mubarak Shah
We present MMFT-BERT (MultiModal Fusion Transformer with BERT encodings) to solve Visual Question Answering (VQA), ensuring individual and combined processing of multiple input modalities.
no code implementations • 19 Oct 2020 • Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah
This demonstrates their ability to acquire transferable knowledge, a capability that is central to human learning.
1 code implementation • 30 Sep 2020 • Viresh Ranjan, Boyu Wang, Mubarak Shah, Minh Hoai
We present sample selection strategies which make use of the density and uncertainty of predictions from the networks trained on one domain to select the informative images from a target domain of interest to acquire human annotation.
2 code implementations • 27 Aug 2020 • Mariana-Iuliana Georgescu, Radu Tudor Ionescu, Fahad Shahbaz Khan, Marius Popescu, Mubarak Shah
Following the standard formulation of abnormal event detection as outlier detection, we propose a background-agnostic framework that learns from training videos containing only normal events.
Abnormal Event Detection In Video
Anomaly Detection In Surveillance Videos
3 code implementations • ECCV 2020 • Shi-Jie Sun, Naveed Akhtar, Xiang-Yu Song, HuanSheng Song, Ajmal Mian, Mubarak Shah
Deep learning-based Multiple Object Tracking (MOT) currently relies on off-the-shelf detectors for tracking-by-detection. This results in deep models that are detector biased and evaluations that are detector influenced.
no code implementations • 3 Aug 2020 • Aaron Ott, Amir Mazaheri, Niels D. Lobo, Mubarak Shah
In the photo enhancer, we employ super-resolution to increase the number of pixels in the embedded image and reduce the effect of stretching and distortion of pixels.
no code implementations • 28 Jul 2020 • Xiao-Yu Zhang, Ajmal Mian, Rohit Gupta, Nazanin Rahnavard, Mubarak Shah
We also propose an anomaly detection method to identify the target class in a Trojaned network.
Ranked #1 on Adversarial Defense on TrojAI Round 1
1 code implementation • 16 Jul 2020 • Marzieh Edraki, Nazmul Karim, Nazanin Rahnavard, Ajmal Mian, Mubarak Shah
We propose a detector that is based on an analysis of intrinsic DNN properties that are affected by the Trojaning process.
1 code implementation • 14 Jul 2020 • Ugur Demir, Yogesh S Rawat, Mubarak Shah
In real-world surveillance environments, the actions in videos are captured at a wide range of resolutions.
1 code implementation • 17 Jun 2020 • Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah
Our experiments show that, even in the first stage, self-supervision can outperform current state-of-the-art methods, with further gains achieved by our second stage distillation process.
Ranked #12 on Few-Shot Image Classification on FC100 5-way (5-shot)
no code implementations • 8 May 2020 • Aidean Sharghi, Niels da Vitoria Lobo, Mubarak Shah
We input a set of video shots, and the network generates a text description for each shot.
no code implementations • 23 Apr 2020 • Mamshad Nayeem Rizve, Ugur Demir, Praveen Tirupattur, Aayush Jung Rana, Kevin Duarte, Ishan Dave, Yogesh Singh Rawat, Mubarak Shah
For tubelet extraction, we propose a localization network which takes a video clip as input and spatio-temporally detects potential foreground regions at multiple scales to generate action tubelets.
no code implementations • 15 Apr 2020 • Rohit Gupta, Mubarak Shah
Accurate and fine-grained information about the extent of damage to buildings is essential for directing Humanitarian Aid and Disaster Response (HADR) operations in the immediate aftermath of any natural calamity.
1 code implementation • 1 Apr 2020 • Erik Quintanilla, Yogesh Rawat, Andrey Sakryukin, Mubarak Shah, Mohan Kankanhalli
We demonstrate the effectiveness of the proposed model on two different large-scale and publicly available datasets, YFCC100M and NUS-WIDE.
1 code implementation • CVPR 2020 • Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah
In this paper, we hypothesize this problem can be avoided by learning a set of generalized parameters, that are neither specific to old nor new tasks.
1 code implementation • 7 Feb 2020 • Marzieh Edraki, Nazanin Rahnavard, Mubarak Shah
In this paper, we propose the SubSpace Capsule Network (SCN) that exploits the idea of capsule networks to model possible variations in the appearance or implicitly defined properties of an entity through a group of capsule subspaces instead of simply grouping neurons to create capsules.
no code implementations • 23 Nov 2019 • Mahdi M. Kalayeh, Mubarak Shah
In SSG, the same idea is applied to the intermediate layers of the network.
no code implementations • 22 Oct 2019 • Waqas Sultani, Mubarak Shah
However, using deep neural networks for automatic aerial action recognition is difficult due to the need for a large number of training aerial human action videos.
1 code implementation • ICCV 2019 • Kevin Duarte, Yogesh S Rawat, Mubarak Shah
In this work we propose a capsule-based approach for semi-supervised video object segmentation.
One-shot visual object segmentation
Optical Flow Estimation
no code implementations • 21 Jul 2019 • Rui Hou, Chen Chen, Rahul Sukthankar, Mubarak Shah
Convolutional Neural Network (CNN) based image segmentation has made great progress in recent years.
Ranked #62 on Semi-Supervised Video Object Segmentation on DAVIS 2016
1 code implementation • ICCV 2019 • Leulseged Tesfaye Alemu, Marcello Pelillo, Mubarak Shah
By optimizing the constrained clustering in an end-to-end manner, we naturally leverage the contextual knowledge of a set of images corresponding to the given person-images.
Ranked #2 on Person Re-Identification on CUHK03 (Rank-5 metric)
1 code implementation • ICCV 2019 • Krishna Regmi, Mubarak Shah
Our Feature Fusion method combines the complementary features from a synthesized aerial image with the corresponding ground features to obtain a robust query representation.
no code implementations • 4 Apr 2019 • Viresh Ranjan, Mubarak Shah, Minh Hoai Nguyen
Most of the existing crowd counting approaches rely on local features for estimating the crowd density map.
no code implementations • 22 Dec 2018 • Emrah Basaran, Yonatan Tariku Tesfaye, Mubarak Shah
In recent years, the performance of video-based person Re-Identification (ReID) methods has improved considerably.
no code implementations • 2 Dec 2018 • Bruce McIntosh, Kevin Duarte, Yogesh S Rawat, Mubarak Shah
The existing works on actor-action localization are mainly focused on localization in a single frame instead of the full video.
2 code implementations • CVPR 2019 • Mohsen Joneidi, Alireza Zaeemzadeh, Nazanin Rahnavard, Mubarak Shah
In our algorithm, at each iteration, the maximum information from the structure of the data is captured by one selected sample, and the captured information is neglected in the next iterations by projection on the null-space of previously selected samples.
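The selection loop described above can be sketched as a greedy procedure: pick one sample, then remove what it explains by projecting every residual onto the null-space (orthogonal complement) of the picked direction. This is a simplified sketch, assuming "maximum information" is measured by the largest residual norm. That scoring rule is an assumption for illustration and is not the paper's exact criterion.

```python
import math


def select_by_nullspace_projection(samples, k):
    """Greedily pick k sample indices; after each pick, project every residual onto the
    orthogonal complement (null-space) of the picked sample's residual direction."""
    residuals = [[float(x) for x in s] for s in samples]
    selected = []
    for _ in range(min(k, len(samples))):
        norms = [math.sqrt(sum(x * x for x in r)) for r in residuals]
        candidates = [i for i in range(len(residuals)) if i not in selected]
        best = max(candidates, key=lambda i: norms[i])
        if norms[best] < 1e-12:
            break  # every remaining sample is already in the span of the selected ones
        u = [x / norms[best] for x in residuals[best]]
        for r in residuals:
            dot = sum(a * b for a, b in zip(r, u))
            for j in range(len(r)):
                r[j] -= dot * u[j]
        selected.append(best)
    return selected


samples = [[3, 0, 0], [0, 1, 0], [2.9, 0.1, 0], [0, 0, 2]]
print(select_by_nullspace_projection(samples, 3))  # [0, 3, 1]
```

Each picked residual is already orthogonal to all earlier picks, so the successive rank-one projections compose into a projection onto the null-space of every selected sample. The near-duplicate `[2.9, 0.1, 0]` is skipped because almost all of its information was captured by the first pick.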
no code implementations • NeurIPS 2019 • Siavash Khodadadeh, Ladislau Bölöni, Mubarak Shah
In this paper, we propose UMTRA, an algorithm that performs unsupervised, model-agnostic meta-learning for classification tasks.
no code implementations • 26 Nov 2018 • Shruti Vyas, Yogesh S Rawat, Mubarak Shah
We demonstrate the effectiveness of the proposed method in rendering view-aware as well as time-aware video clips on two different real-world datasets including UCF-101 and NTU-RGB+D.
2 code implementations • 28 Oct 2018 • Shi-Jie Sun, Naveed Akhtar, HuanSheng Song, Ajmal Mian, Mubarak Shah
In this paper, we harness the power of deep learning for data association in tracking by jointly modelling object appearances and their affinities between different frames in an end-to-end fashion.
no code implementations • 25 Oct 2018 • Simone Palazzo, Concetto Spampinato, Isaak Kavasidis, Daniela Giordano, Joseph Schmidt, Mubarak Shah
After verifying that visual information can be extracted from EEG data, we introduce a multimodal approach that uses deep image and EEG encoders, trained in a siamese configuration, for learning a joint manifold that maximizes a compatibility measure between visual features and brain representations.
no code implementations • 26 Sep 2018 • Pooya Abolghasemi, Amir Mazaheri, Mubarak Shah, Ladislau Bölöni
In this paper, we propose an approach for augmenting a deep visuomotor policy trained through demonstrations with Task Focused visual Attention (TFA).
no code implementations • ECCV 2018 • Haroon Idrees, Muhmmad Tayyab, Kishan Athrey, Dong Zhang, Somaya Al-Maadeed, Nasir Rajpoot, Mubarak Shah
With multiple crowd gatherings of millions of people every year in events ranging from pilgrimages to protests, concerts to marathons, and festivals to funerals, visual crowd analysis is emerging as a new frontier in computer vision.
Ranked #7 on Crowd Counting on UCF-QNRF
no code implementations • 7 Jun 2018 • Mahdi M. Kalayeh, Mubarak Shah
We show that, assuming samples within a mini-batch are drawn from the same probability density function, BN is identical to the Fisher vector of a Gaussian distribution.
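A one-line check for a single univariate Gaussian makes the correspondence concrete. This is our own worked illustration, not the paper's proof: it keeps only the mean component of the Fisher vector and uses the standard normalization by the inverse square root of the Fisher information.

```latex
\begin{aligned}
\frac{\partial \log p(x;\mu,\sigma^2)}{\partial \mu} &= \frac{x-\mu}{\sigma^2},
\qquad
F_\mu = \mathbb{E}\!\left[\left(\frac{x-\mu}{\sigma^2}\right)^{2}\right] = \frac{\sigma^2}{\sigma^4} = \frac{1}{\sigma^2},\\
F_\mu^{-1/2}\,\frac{\partial \log p}{\partial \mu} &= \sigma \cdot \frac{x-\mu}{\sigma^2} = \frac{x-\mu}{\sigma}.
\end{aligned}
```

With $\mu$ and $\sigma^2$ estimated from the mini-batch, the last expression is the BN normalization step, before the learned scale and shift and ignoring the small constant usually added to $\sigma^2$.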
no code implementations • 1 Jun 2018 • Nayyer Aafaq, Ajmal Mian, Wei Liu, Syed Zulqarnain Gilani, Mubarak Shah
Video description is the automatic generation of natural language sentences that describe the contents of a given video.
no code implementations • NeurIPS 2018 • Kevin Duarte, Yogesh S Rawat, Mubarak Shah
In this work, we present a more elegant solution for action detection based on the recently developed capsule network.
no code implementations • 20 May 2018 • Muhammad Abdullah Jamal, Guo-Jun Qi, Mubarak Shah
Meta-learning approaches have been proposed to tackle the few-shot learning problem. Typically, a meta-learner is trained on a variety of tasks in the hope that it generalizes to new tasks.
1 code implementation • 18 May 2018 • Alireza Zaeemzadeh, Nazanin Rahnavard, Mubarak Shah
We prove that the skip connections in the residual blocks facilitate preserving the norm of the gradient and lead to stable back-propagation, which is desirable from an optimization perspective.
no code implementations • CVPR 2018 • Mahdi M. Kalayeh, Emrah Basaran, Muhittin Gokmen, Mustafa E. Kamasak, Mubarak Shah
In this paper, we propose to adopt human semantic parsing which, due to its pixel-level accuracy and capability of modeling arbitrary contours, is naturally a better alternative.
Ranked #73 on Person Re-Identification on Market-1501
7 code implementations • CVPR 2018 • Waqas Sultani, Chen Chen, Mubarak Shah
To avoid annotating the anomalous segments or clips in training videos, which is very time-consuming, we propose to learn anomaly through a deep multiple instance ranking framework by leveraging weakly labeled training videos, i.e., the training labels (anomalous or normal) are at the video level instead of the clip level.
Ranked #2 on Abnormal Event Detection In Video on UBI-Fights
Activity Recognition
Anomaly Detection In Surveillance Videos
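A minimal sketch of a multiple-instance ranking hinge loss of the kind described above, assuming each video is split into segments that a model scores in [0, 1]: because only video-level labels exist, the highest-scoring segment of an anomalous bag is pushed above the highest-scoring segment of a normal bag. The margin of 1 and the optional smoothness and sparsity terms (`lam_smooth`, `lam_sparse`) are our assumptions for illustration; consult the paper for its exact objective.

```python
from typing import Sequence


def mil_ranking_loss(
    anomalous_scores: Sequence[float],
    normal_scores: Sequence[float],
    margin: float = 1.0,
    lam_smooth: float = 0.0,
    lam_sparse: float = 0.0,
) -> float:
    """Hinge ranking loss between two bags of segment scores (video-level labels only)."""
    if not anomalous_scores or not normal_scores:
        raise ValueError("each bag needs at least one segment score")
    # Compare the most anomalous-looking segment of each bag.
    hinge = max(0.0, margin - max(anomalous_scores) + max(normal_scores))
    # Optional (assumed) regularizers on the anomalous bag:
    # neighboring segments should score similarly, and few segments should be anomalous.
    smooth = sum((a - b) ** 2 for a, b in zip(anomalous_scores, anomalous_scores[1:]))
    sparse = sum(anomalous_scores)
    return hinge + lam_smooth * smooth + lam_sparse * sparse
```

For instance, `mil_ranking_loss([0.1, 0.9, 0.2], [0.3, 0.1])` is about 0.4: the best anomalous segment (0.9) beats the best normal one (0.3), but not yet by the full margin.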
1 code implementation • ECCV 2018 • Amir Mazaheri, Mubarak Shah
A semantic inconsistency between the sentence and the video or between the words of a sentence can result in an inaccurate description.
no code implementations • 30 Nov 2017 • Rui Hou, Chen Chen, Mubarak Shah
A video is first divided into equal-length clips, and then a set of tube proposals is generated for each clip based on 3D CNN features.
no code implementations • ICCV 2017 • Simone Palazzo, Concetto Spampinato, Isaak Kavasidis, Daniela Giordano, Mubarak Shah
In this work, we build on the latter class of approaches and investigate the possibility of driving and conditioning the image generation process by means of brain signals, recorded through an electroencephalograph (EEG) while users look at images from a set of 40 ImageNet object categories, with the objective of generating the seen images.
no code implementations • ICCV 2017 • Khurram Soomro, Mubarak Shah
Once classes are discovered, training videos within each cluster are selected to perform automatic spatio-temporal annotations, by first oversegmenting videos in each discovered class into supervoxels and then constructing a directed graph to apply a variant of the knapsack problem with temporal constraints.
no code implementations • ICCV 2017 • Nasim Souly, Concetto Spampinato, Mubarak Shah
Semantic segmentation has been a long-standing challenging task in computer vision.
no code implementations • 19 Jun 2017 • Yonatan Tariku Tesfaye, Eyasu Zemene, Andrea Prati, Marcello Pelillo, Mubarak Shah
In this paper, a unified three-layer hierarchical approach for solving tracking problems across multiple non-overlapping cameras is proposed.
no code implementations • CVPR 2017 • Mahdi M. Kalayeh, Boqing Gong, Mubarak Shah
We build our facial attribute prediction model jointly with a deep semantic segmentation network.
1 code implementation • ICCV 2017 • Amir Mazaheri, Dong Zhang, Mubarak Shah
Since the source sentence is broken into two fragments, the sentence's left fragment (before the blank) and its right fragment (after the blank), traditional Recurrent Neural Networks cannot encode this structure accurately, because the missing word can vary widely in both its location and its type within the source sentence.
no code implementations • CVPR 2018 • Rodney LaLonde, Dong Zhang, Mubarak Shah
To reduce the large search space, the first stage (ClusterNet) takes in a set of extremely large video frames, combines the motion and appearance information within the convolutional architecture, and proposes regions of objects of interest (ROOBI).
no code implementations • 3 Apr 2017 • Waqas Sultani, Dong Zhang, Mubarak Shah
Given the action proposals in a video, the goal of the proposed work is to generate a few better action proposals that are ranked properly.
1 code implementation • ICCV 2017 • Rui Hou, Chen Chen, Mubarak Shah
A video is first divided into equal-length clips, and then, for each clip, a set of tube proposals is generated based on 3D Convolutional Network (ConvNet) features.
Ranked #3 on Action Detection on UCF101-24
no code implementations • 28 Mar 2017 • Nasim Souly, Concetto Spampinato, Mubarak Shah
Semantic segmentation has been a long-standing challenging task in computer vision.
1 code implementation • CVPR 2017 • Yicong Tian, Chen Chen, Mubarak Shah
Next, for each building in the query image, we retrieve the $k$ nearest neighbors from the reference buildings using a Siamese network trained on both positive matching image pairs and negative pairs.
Cross-View Image-to-Image Translation
Image Classification
no code implementations • 4 Feb 2017 • Eyasu Zemene, Yonatan Tariku, Haroon Idrees, Andrea Prati, Marcello Pelillo, Mubarak Shah
We cast the geo-localization as a clustering problem on local image features.
no code implementations • 7 Dec 2016 • Shayan Modiri Assari, Haroon Idrees, Mubarak Shah
This paper addresses the problem of human re-identification across non-overlapping cameras in crowds. Re-identification in crowded scenes is a challenging problem due to the large number of people and frequent occlusions, coupled with changes in their appearance caused by the different properties and exposure settings of the cameras.
no code implementations • 4 Dec 2016 • Khurram Soomro, Haroon Idrees, Mubarak Shah
For online prediction of action (interaction) confidences, we propose an approach based on Structural SVM that operates on short video segments, and is trained with the objective that confidence of an action or interaction increases as time progresses.
no code implementations • 14 Oct 2016 • Yicong Tian, Mubarak Shah
For segmentation, a multi-label Conditional Random Field (CRF) is applied to a superpixel-based spatio-temporal graph in a segment of video to assign background or target labels to every superpixel.
no code implementations • 13 Oct 2016 • Amir Mazaheri, Dong Zhang, Mubarak Shah
In the experiments, we demonstrate the superior performance of the proposed method on the challenging "Movie Fill-in-the-Blank" dataset.
2 code implementations • CVPR 2017 • Concetto Spampinato, Simone Palazzo, Isaak Kavasidis, Daniela Giordano, Mubarak Shah, Nasim Souly
In particular, we employ EEG data evoked by visual object stimuli combined with Recurrent Neural Networks (RNN) to learn a discriminative brain activity manifold of visual categories.
no code implementations • 17 Aug 2016 • Nasim Souly, Mubarak Shah
In this paper, we propose to use high-level knowledge, in the form of rules, during inference to incorporate dependencies among image regions and thereby improve classification scores.
no code implementations • 18 Jul 2016 • Aidean Sharghi, Boqing Gong, Mubarak Shah
The decision to include a shot in the summary depends on the shot's relevance to the user query and importance in the context of the video, jointly.
no code implementations • 16 Jun 2016 • Subhabrata Bhattacharya, Nasim Souly, Mubarak Shah
Using an over-complete dictionary of the covariance based descriptors built from labeled training samples, we formulate low-level event recognition as a sparse linear approximation problem.
no code implementations • CVPR 2016 • Khurram Soomro, Haroon Idrees, Mubarak Shah
This paper proposes a novel approach to tackle the challenging problem of 'online action localization' which entails predicting actions and their locations as they happen in a video.
no code implementations • CVPR 2016 • Waqas Sultani, Mubarak Shah
We reconstruct video action proposals from image action proposals while enforcing consistency across coefficient vectors of multiple frames by consensus regularization.
Optical Flow Estimation
Spatio-Temporal Action Localization
no code implementations • CVPR 2016 • Nasim Souly, Mubarak Shah
To do this, we formulate the problem as an energy minimization over a graph, whose structure is captured by applying a sparsity constraint to the elements of the precision matrix.
no code implementations • CVPR 2016 • Yang Zhang, Boqing Gong, Mubarak Shah
The well-known word analogy experiments show that recent word vectors capture fine-grained linguistic regularities in words by linear vector offsets, but it is unclear how well the simple vector offsets can encode visual regularities over words.
Ranked #5 on Multi-label zero-shot learning on Open Images V4
no code implementations • 26 May 2016 • Waqas Sultani, Mubarak Shah
The output of our method is the most action-representative proposals from each video.
no code implementations • 26 Apr 2016 • Dong Zhang, Mubarak Shah
A sequence of the best poses is inferred from the abstract body part tracklets through the tree-based optimization.
no code implementations • 21 Apr 2016 • Haroon Idrees, Amir R. Zamir, Yu-Gang Jiang, Alex Gorban, Ivan Laptev, Rahul Sukthankar, Mubarak Shah
Additionally, we include a comprehensive empirical study evaluating the differences in action recognition between trimmed and untrimmed videos, and how well methods trained on trimmed videos generalize to untrimmed videos.
no code implementations • 30 Mar 2016 • Afshin Dehghan, Mubarak Shah
In this paper, we propose a tracker that addresses the aforementioned problems and is capable of tracking hundreds of people efficiently.
no code implementations • 25 Feb 2016 • Thomas Castelli, Aidean Sharghi, Don Harper, Alain Tremeau, Mubarak Shah
In recent years, consumer Unmanned Aerial Vehicles have become very popular: anyone can buy and fly a drone without previous experience, which raises concerns regarding regulations and public safety.
no code implementations • 2 Feb 2016 • Hossein Rahmani, Ajmal Mian, Mubarak Shah
The strength of our technique is that we learn a single R-NKTM for all actions and all viewpoints for knowledge transfer of any real human action video without the need for re-training or fine-tuning the model.
no code implementations • ICCV 2015 • Dong Zhang, Mubarak Shah
Using the idea of 'Association', the optimal tracklets are generated for each abstract body part, in order to enforce the spatiotemporal constraints between body parts in adjacent frames.
no code implementations • ICCV 2015 • Khurram Soomro, Haroon Idrees, Mubarak Shah
Context relations are learned during training which capture displacements from all the supervoxels in a video to those belonging to foreground actions.
no code implementations • CVPR 2015 • Afshin Dehghan, Yicong Tian, Philip H. S. Torr, Mubarak Shah
In this paper, we show that multiple object tracking (MOT) can be formulated in a framework in which detection and data association are performed simultaneously.
1 code implementation • CVPR 2015 • Shervin Ardeshir, Kofi Malcolm Collins-Sibley, Mubarak Shah
In this paper, we propose a method that leverages information acquired from GIS databases to perform semantic segmentation of the image alongside geo-referencing each semantic segment with its address and geo-location.
no code implementations • CVPR 2015 • Afshin Dehghan, Shayan Modiri Assari, Mubarak Shah
Data association is the backbone of many multiple object tracking (MOT) methods.
no code implementations • 4 Jan 2015 • Mahdi M. Kalayeh, Stephen Mussmann, Alla Petrakova, Niels da Vitoria Lobo, Mubarak Shah
In the second phase, using k-means clustering, we create motion components by clustering the flow vectors with respect to their location and velocity.
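The clustering step can be illustrated with plain Lloyd's k-means over 4-D vectors (x, y, u, v) that join each flow vector's location and velocity. This sketch assumes flow vectors arrive as such tuples and initializes centers from the first k vectors for reproducibility; the function name `kmeans_flow` is hypothetical, and the paper may weight or normalize location and velocity differently.

```python
from typing import List, Sequence, Tuple

Flow = Tuple[float, float, float, float]  # (x, y, u, v): location and flow velocity


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans_flow(vectors: Sequence[Flow], k: int, iters: int = 50) -> List[int]:
    """Cluster flow vectors into k motion components with Lloyd's k-means."""
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    centers = [list(v) for v in vectors[:k]]  # deterministic initialization (assumption)
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: nearest center in the joint location-velocity space.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centers[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each center to its members' mean; empty clusters keep their center.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

With unscaled features, pixel coordinates tend to dominate the distance; rescaling the velocity components is one way to make motion direction matter more.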
no code implementations • CVPR 2014 • Afshin Dehghan, Haroon Idrees, Mubarak Shah
A video captures a sequence and interactions of concepts that can be static, for instance, objects or scenes, or dynamic, such as actions.
no code implementations • CVPR 2014 • Amir Roshan Zamir, Shervin Ardeshir, Mubarak Shah
We develop a robust method for identification and refinement of this subset using the rest of the images in the dataset.
no code implementations • CVPR 2014 • Mahdi M. Kalayeh, Haroon Idrees, Mubarak Shah
Such models become obsolete and require relearning when new images and tags are added to database.
no code implementations • CVPR 2014 • Afshin Dehghan, Enrique G. Ortiz, Ruben Villegas, Mubarak Shah
Recent years have seen a major push for face recognition technology due to the large expansion of image sharing on social networks.
no code implementations • CVPR 2014 • Subhabrata Bhattacharya, Mahdi M. Kalayeh, Rahul Sukthankar, Mubarak Shah
While approaches based on bags of features excel at low-level action classification, they are ill-suited for recognizing complex events in video, where concept-based temporal representations currently dominate.
no code implementations • CVPR 2014 • Shayan Modiri Assari, Amir Roshan Zamir, Mubarak Shah
We address the problem of classifying complex videos based on their content.
no code implementations • 28 Sep 2013 • Dong Zhang, Omar Oreifej, Mubarak Shah
In contrast, we propose to extract cross-image features, i.e., features across the pair of images, which, as we demonstrate, are more discriminative of the similarity and dissimilarity of faces.
no code implementations • CVPR 2013 • Guang Shu, Afshin Dehghan, Mubarak Shah
In general, our method takes the detection bounding boxes of a generic detector as input and generates detection output with higher average precision and more precise object regions.
no code implementations • CVPR 2013 • Yicong Tian, Rahul Sukthankar, Mubarak Shah
Deformable part models have achieved impressive performance for object detection, even on difficult image datasets.
no code implementations • CVPR 2013 • Enrique G. Ortiz, Alan Wright, Mubarak Shah
A straightforward application of the popular ℓ1-minimization for face recognition on a frame-by-frame basis is prohibitively expensive, so we propose a novel algorithm, Mean Sequence SRC (MSSRC), that performs video face recognition using a joint optimization leveraging all of the available video data and the knowledge that the face track frames belong to the same individual.
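The sketch below illustrates sparse representation-based classification (SRC) applied once per face track instead of once per frame. Several parts are our assumptions, not the authors' code: the track's frame features are averaged into one vector (our reading of the "mean sequence" idea), the ℓ1-regularized least-squares problem is solved with plain ISTA (iterative soft-thresholding, a swapped-in solver), and the helper names are hypothetical. The track is assigned to the identity whose dictionary atoms reconstruct the mean with the smallest residual.

```python
import math
from typing import Dict, List, Sequence


def soft_threshold(x: float, t: float) -> float:
    # Proximal operator of t * |x|: shrink toward zero by t.
    return math.copysign(max(abs(x) - t, 0.0), x)


def ista(atoms: Sequence[Sequence[float]], y: Sequence[float], lam: float, iters: int = 500) -> List[float]:
    """Approximately minimize 0.5 * ||D x - y||^2 + lam * ||x||_1, where D's columns are `atoms`."""
    # 1 / ||D||_F^2 is a safe step size because ||D||_2^2 <= ||D||_F^2.
    step = 1.0 / sum(a * a for col in atoms for a in col)
    x = [0.0] * len(atoms)
    for _ in range(iters):
        residual = [sum(col[i] * xj for col, xj in zip(atoms, x)) - y[i] for i in range(len(y))]
        grad = [sum(c * r for c, r in zip(col, residual)) for col in atoms]  # D^T (D x - y)
        x = [soft_threshold(xj - step * g, step * lam) for xj, g in zip(x, grad)]
    return x


def classify_track(
    frames: Sequence[Sequence[float]],
    atoms: Sequence[Sequence[float]],
    labels: Sequence[str],
    lam: float = 0.01,
) -> str:
    """Assign a whole face track to one identity (hypothetical helper)."""
    mean = [sum(col) / len(frames) for col in zip(*frames)]  # assumed: one code per track
    x = ista(atoms, mean, lam)
    errors: Dict[str, float] = {}
    for who in set(labels):
        # Reconstruct using only this identity's atoms and measure the squared residual.
        approx = [
            sum(col[i] * xj for col, xj, lab in zip(atoms, x, labels) if lab == who)
            for i in range(len(mean))
        ]
        errors[who] = sum((m - a) ** 2 for m, a in zip(mean, approx))
    return min(errors, key=errors.get)
```

Solving one ℓ1 problem per track rather than per frame is where the savings come from in this sketch; a real system would use a faster solver and normalized feature columns.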
no code implementations • CVPR 2013 • Haroon Idrees, Imran Saleemi, Cody Seibert, Mubarak Shah
Instead, our approach relies on multiple sources such as low confidence head detections, repetition of texture elements (using SIFT), and frequency-domain analysis to estimate counts, along with confidence associated with observing individuals, in an image region.
Ranked #14 on Crowd Counting on UCF-QNRF
no code implementations • CVPR 2013 • Dong Zhang, Omar Javed, Mubarak Shah
The proposed approach has several contributions: First, a novel layered Directed Acyclic Graph (DAG) based framework is presented for detection and segmentation of the primary object in video.
no code implementations • CVPR 2013 • Yang Yang, Guang Shu, Mubarak Shah
In order to learn discriminative and compact features, we propose a new feature learning method using a deep neural network based on auto encoders.
7 code implementations • 3 Dec 2012 • Khurram Soomro, Amir Roshan Zamir, Mubarak Shah
To the best of our knowledge, UCF101 is currently the most challenging dataset of actions due to its large number of classes, large number of clips and also unconstrained nature of such clips.
Ranked #5 on Action Recognition In Videos on UCF101
Action Recognition In Videos
Skeleton Based Action Recognition