Search Results for author: Mubarak Shah

Found 213 papers, 94 papers with code

Count- and Similarity-aware R-CNN for Pedestrian Detection

no code implementations ECCV 2020 Jin Xie, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao, Mubarak Shah

We further introduce a count-and-similarity branch within the two-stage detection framework, which predicts pedestrian count as well as proposal similarity.

Human Instance Segmentation Pedestrian Detection +1

Multi-view Action Recognition using Cross-view Video Prediction

1 code implementation ECCV 2020 Shruti Vyas, Yogesh S Rawat, Mubarak Shah

We evaluate the effectiveness of the learned representation for multi-view video action recognition in a supervised approach.

Action Recognition Representation Learning +2

Exploring Local Memorization in Diffusion Models via Bright Ending Attention

no code implementations 29 Oct 2024 Chen Chen, Daochang Liu, Mubarak Shah, Chang Xu

Furthermore, driven by our observation that local memorization significantly underperforms in existing tasks of measuring, detecting, and mitigating memorization in diffusion models compared to global memorization, we propose a simple yet effective method to integrate BE and the results of the new localization task into these existing frameworks.

Memorization

Investigating Memorization in Video Diffusion Models

no code implementations 29 Oct 2024 Chen Chen, Enhuai Liu, Daochang Liu, Mubarak Shah, Chang Xu

Diffusion models, widely used for image and video generation, face a significant limitation: the risk of memorizing and reproducing training data during inference, potentially generating unauthorized copyrighted content.

Memorization Video Generation

Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning

1 code implementation 30 Sep 2024 Weitai Kang, Haifeng Huang, Yuzhang Shang, Mubarak Shah, Yan Yan

RIG generates two key instruction data: 1) the Adversarial Instruction-following data, which features mixed negative and positive samples to enhance the model's discriminative understanding.

Instruction Following Language Modelling +1

Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets

no code implementations 2 Sep 2024 Ishan Rajendrakumar Dave, Fabian Caba Heilbron, Mubarak Shah, Simon Jenni

Temporal video alignment aims to synchronize the key events like object interactions or action phase transitions in two videos.

Video Alignment Video Editing +1

GAReT: Cross-view Video Geolocalization with Adapters and Auto-Regressive Transformers

1 code implementation 5 Aug 2024 Manu S Pillai, Mamshad Nayeem Rizve, Mubarak Shah

We introduce GeoAdapter, a transformer-adapter module designed to efficiently aggregate image-level representations and adapt them for video inputs.

geo-localization

X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs

no code implementations 18 Jul 2024 Sirnam Swetha, Jinyu Yang, Tal Neiman, Mamshad Nayeem Rizve, Son Tran, Benjamin Yao, Trishul Chilimbi, Mubarak Shah

In this work, we focus on enhancing the visual representations for MLLMs by combining high-frequency and detailed visual representations, obtained through masked image modeling (MIM), with semantically-enriched low-frequency representations captured by CL.

Contrastive Learning Representation Learning +1

Open Vocabulary Multi-Label Video Classification

no code implementations 12 Jul 2024 Rohit Gupta, Mamshad Nayeem Rizve, Jayakrishnan Unnikrishnan, Ashish Tawari, Son Tran, Mubarak Shah, Benjamin Yao, Trishul Chilimbi

Pre-trained vision-language models (VLMs) have enabled significant progress in open vocabulary computer vision tasks such as image classification, object detection and image segmentation.

Action Classification Classification +7

Regulating Model Reliance on Non-Robust Features by Smoothing Input Marginal Density

1 code implementation 5 Jul 2024 Peiyu Yang, Naveed Akhtar, Mubarak Shah, Ajmal Mian

Trustworthy machine learning necessitates meticulous regulation of model reliance on non-robust features.

SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding

1 code implementation 3 Jul 2024 Weitai Kang, Gaowen Liu, Mubarak Shah, Yan Yan

Specifically, we propose the Multi-layer Multi-task Encoder-Decoder as the target grounding stage, where we learn a regression query and multiple segmentation queries to ground the target by regression and segmentation of the box in each decoding layer, respectively.

object-detection Object Detection +3

Surgical Triplet Recognition via Diffusion Model

no code implementations 19 Jun 2024 Daochang Liu, Axel Hu, Mubarak Shah, Chang Xu

In this paper, we propose DiffTriplet, a new generative framework for surgical triplet recognition employing the diffusion model, which predicts surgical triplets via iterative denoising.

Action Triplet Recognition Denoising +1

Xi-Net: Transformer Based Seismic Waveform Reconstructor

1 code implementation 14 Jun 2024 Anshuman Gaharwar, Parth Parag Kulkarni, Joshua Dickey, Mubarak Shah

To the best of our knowledge, this is the first transformer-based deep learning model for seismic waveform reconstruction.

Decoder

Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention

no code implementations 28 May 2024 Weitai Kang, Mengxue Qu, Jyoti Kini, Yunchao Wei, Mubarak Shah, Yan Yan

To achieve detection based on human intention, it relies on humans to observe the scene, reason out the target that aligns with their intention ("pillow" in this case), and finally provide a reference to the AI system, such as "A pillow on the couch".

3D Object Detection 3D visual grounding +2

PTQ4DiT: Post-training Quantization for Diffusion Transformers

1 code implementation 25 May 2024 Junyi Wu, Haoxuan Wang, Yuzhang Shang, Mubarak Shah, Yan Yan

SSC extends this approach by dynamically adjusting the balanced salience to capture the temporal variations in activation.

Image Generation Quantization

Curriculum Direct Preference Optimization for Diffusion and Consistency Models

no code implementations 22 May 2024 Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Nicu Sebe, Mubarak Shah

Our approach, Curriculum DPO, is compared against state-of-the-art fine-tuning approaches on three benchmarks, outperforming the competing methods in terms of text alignment, aesthetics and human preference.

Text-to-Image Generation

Sparse Points to Dense Clouds: Enhancing 3D Detection with Limited LiDAR Data

no code implementations 10 Apr 2024 Aakash Kumar, Chen Chen, Ajmal Mian, Neils Lobo, Mubarak Shah

Our method requires only a small number of 3D points, which can be obtained from a low-cost, low-resolution sensor.

3D Object Detection Autonomous Driving +1

Single Stage Adaptive Multi-Attention Network for Image Restoration

1 code implementation IEEE Transactions on Image Processing 2024 Anas Zafar, Danyal Aftab, Rizwan Qureshi, Xinqi Fan, Pingjun Chen, Jia Wu, Hazrat Ali, Shah Nawaz, Sheheryar Khan, Mubarak Shah

In this paper, we propose a novel and computationally efficient architecture Single Stage Adaptive Multi-Attention Network (SSAMAN) for image restoration tasks, particularly for image denoising and image deblurring.

Deblurring Image Deblurring +2

Diffexplainer: Towards Cross-modal Global Explanations with Diffusion Models

no code implementations 3 Apr 2024 Matteo Pennisi, Giovanni Bellitto, Simone Palazzo, Mubarak Shah, Concetto Spampinato

We present DiffExplainer, a novel framework that, leveraging language-vision models, enables multimodal global explainability.

Temporally Consistent Referring Video Object Segmentation with Hybrid Memory

1 code implementation 28 Mar 2024 Bo Miao, Mohammed Bennamoun, Yongsheng Gao, Mubarak Shah, Ajmal Mian

Referring Video Object Segmentation (R-VOS) methods face challenges in maintaining consistent object segmentation due to temporal context variability and the presence of other visually similar objects.

HTR Object +6

Composed Video Retrieval via Enriched Context and Discriminative Embeddings

1 code implementation CVPR 2024 Omkar Thawakar, Muzammal Naseer, Rao Muhammad Anwer, Salman Khan, Michael Felsberg, Mubarak Shah, Fahad Shahbaz Khan

Composed video retrieval (CoVR) is a challenging problem in computer vision which has recently highlighted the integration of modification text with visual queries for more sophisticated video search in large databases.

Composed Video Retrieval (CoVR) Retrieval

VidLA: Video-Language Alignment at Scale

no code implementations CVPR 2024 Mamshad Nayeem Rizve, Fan Fei, Jayakrishnan Unnikrishnan, Son Tran, Benjamin Z. Yao, Belinda Zeng, Mubarak Shah, Trishul Chilimbi

To effectively address this limitation, we instead keep the network architecture simple and use a set of data tokens that operate at different temporal resolutions in a hierarchical manner, accounting for the temporally hierarchical nature of videos.

Language Modelling Visual Grounding

AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation

1 code implementation 21 Mar 2024 Yuning Cui, Syed Waqas Zamir, Salman Khan, Alois Knoll, Mubarak Shah, Fahad Shahbaz Khan

Our approach is motivated by the observation that different degradation types impact the image content on different frequency subbands, thereby requiring different treatments for each restoration task.

Deblurring Denoising +3

FSViewFusion: Few-Shots View Generation of Novel Objects

no code implementations 11 Mar 2024 Rukhshanda Hussain, Hui Xian Grace Lim, BorChun Chen, Mubarak Shah, Ser Nam Lim

Second, we establish that the concept of a view can be disentangled and transferred to a novel object irrespective of the identity of the original object from which the views are learnt.

Novel View Synthesis

Möbius Transform for Mitigating Perspective Distortions in Representation Learning

no code implementations 7 Mar 2024 Prakash Chandra Chhipa, Meenakshi Subhash Chippa, Kanjar De, Rajkumar Saini, Marcus Liwicki, Mubarak Shah

In this work, we propose mitigating perspective distortion (MPD) by employing a fine-grained parameter control on a specific family of Möbius transform to model real-world distortion without estimating camera intrinsic and extrinsic parameters and without the need for actual distorted data.

Crowd Counting object-detection +3

CodaMal: Contrastive Domain Adaptation for Malaria Detection in Low-Cost Microscopes

1 code implementation 16 Feb 2024 Ishan Rajendrakumar Dave, Tristan de Blegiers, Chen Chen, Mubarak Shah

On the publicly available large-scale M5-dataset, our proposed method shows a significant improvement of 16% over the state-of-the-art methods in terms of the mean average precision metric (mAP), provides 21x speed improvement during inference and requires only half of the learnable parameters used in prior methods.

Domain Adaptation object-detection +1

No More Shortcuts: Realizing the Potential of Temporal Self-Supervision

no code implementations 20 Dec 2023 Ishan Rajendrakumar Dave, Simon Jenni, Mubarak Shah

To address these issues, we propose 1) a more challenging reformulation of temporal self-supervision as frame-level (rather than clip-level) recognition tasks and 2) an effective augmentation strategy to mitigate shortcuts.

Action Classification Attribute +7

DVANet: Disentangling View and Action Features for Multi-View Action Recognition

1 code implementation 10 Dec 2023 Nyle Siddiqui, Praveen Tirupattur, Mubarak Shah

In this work, we present a novel approach to multi-view action recognition where we guide learned action representations to be separated from view-relevant information in a video.

Action Recognition In Videos Decoder

Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve Aerial Visual Perception?

no code implementations CVPR 2024 Aritra Dutta, Srijan Das, Jacob Nielsen, Rajatsubhra Chakraborty, Mubarak Shah

Despite the commercial abundance of UAVs, aerial data acquisition remains challenging, and the existing Asia and North America-centric open-source UAV datasets are small-scale or low-resolution and lack diversity in scene contextuality.

Benchmarking Diversity +3

PG-Video-LLaVA: Pixel Grounding Large Video-Language Models

1 code implementation 22 Nov 2023 Shehan Munasinghe, Rusiru Thushara, Muhammad Maaz, Hanoona Abdul Rasheed, Salman Khan, Mubarak Shah, Fahad Khan

Extending image-based Large Multimodal Models (LMMs) to videos is challenging due to the inherent complexity of video data.

Benchmarking Phrase Grounding +4

Egocentric RGB+Depth Action Recognition in Industry-Like Settings

1 code implementation 25 Sep 2023 Jyoti Kini, Sarah Fleischer, Ishan Dave, Mubarak Shah

Our work focuses on recognizing actions from egocentric RGB and Depth modalities in an industry-like environment.

Action Recognition

Dual Student Networks for Data-Free Model Stealing

no code implementations 18 Sep 2023 James Beetham, Navid Kardan, Ajmal Mian, Mubarak Shah

To this end, the two main challenges are estimating gradients of the target model without access to its parameters, and generating a diverse set of training samples that thoroughly explores the input space.

CDFSL-V: Cross-Domain Few-Shot Learning for Videos

1 code implementation ICCV 2023 Sarinda Samarasinghe, Mamshad Nayeem Rizve, Navid Kardan, Mubarak Shah

To address this issue, in this work, we propose a novel cross-domain few-shot video action recognition method that leverages self-supervised learning and curriculum learning to balance the information from the source and target domains.

cross-domain few-shot learning Few-Shot action recognition +3

EventTransAct: A video transformer-based framework for Event-camera based action recognition

no code implementations 25 Aug 2023 Tristan de Blegiers, Ishan Rajendrakumar Dave, Adeel Yousaf, Mubarak Shah

Recognizing and comprehending human actions and gestures is a crucial perception requirement for robots to interact with humans and carry out tasks in diverse domains, including service robotics, healthcare, and manufacturing.

Action Recognition

Preserving Modality Structure Improves Multi-Modal Learning

1 code implementation ICCV 2023 Swetha Sirnam, Mamshad Nayeem Rizve, Nina Shvetsova, Hilde Kuehne, Mubarak Shah

Self-supervised learning on large-scale multi-modal datasets allows learning semantically meaningful embeddings in a joint multi-modal representation space without relying on human annotations.

Retrieval Self-Supervised Learning

TeD-SPAD: Temporal Distinctiveness for Self-supervised Privacy-preservation for video Anomaly Detection

1 code implementation ICCV 2023 Joseph Fioresi, Ishan Rajendrakumar Dave, Mubarak Shah

In this paper, we propose TeD-SPAD, a privacy-aware video anomaly detection framework that destroys visual private information in a self-supervised manner.

Anomaly Detection Attribute +4

Ensemble Modeling for Multimodal Visual Action Recognition

1 code implementation 10 Aug 2023 Jyoti Kini, Sarah Fleischer, Ishan Dave, Mubarak Shah

In this work, we propose an ensemble modeling approach for multimodal action recognition.

Action Recognition

Reverse Stable Diffusion: What prompt was used to generate this image?

1 code implementation 2 Aug 2023 Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Mubarak Shah

To this end, we study the task of predicting the prompt embedding given an image generated by a generative diffusion model.

Text-to-Image Generation

Foundational Models Defining a New Era in Vision: A Survey and Outlook

1 code implementation 25 Jul 2023 Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Fahad Shahbaz Khan

Vision systems to see and reason about the compositional nature of visual scenes are fundamental to understanding our world.

Benchmarking

Frequency Domain Adversarial Training for Robust Volumetric Medical Segmentation

2 code implementations 14 Jul 2023 Asif Hanif, Muzammal Naseer, Salman Khan, Mubarak Shah, Fahad Shahbaz Khan

While recent advances in deep learning have improved the performance of volumetric medical image segmentation models, these models cannot be deployed for real-world applications immediately due to their vulnerability to adversarial attacks.

Adversarial Attack Deep Learning +4

Exploiting the Brain's Network Structure for Automatic Identification of ADHD Subjects

no code implementations 15 Jun 2023 Soumyabrata Dey, Ravishankar Rao, Mubarak Shah

The concatenation of the network features of all the voxels in a brain serves as the feature vector.

Learning Situation Hyper-Graphs for Video Question Answering

1 code implementation CVPR 2023 Aisha Urooj Khan, Hilde Kuehne, Bo Wu, Kim Chheu, Walid Bousselham, Chuang Gan, Niels Lobo, Mubarak Shah

The proposed method is trained in an end-to-end manner and optimized by a VQA loss with the cross-entropy function and a Hungarian matching loss for the situation graph prediction.

Decoder Question Answering +2

R²Former: Unified Retrieval and Reranking Transformer for Place Recognition

no code implementations 6 Apr 2023 Sijie Zhu, Linjie Yang, Chen Chen, Mubarak Shah, Xiaohui Shen, Heng Wang

Visual Place Recognition (VPR) estimates the location of query images by matching them with images in a reference database.

Feature Correlation Retrieval +1

Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting

1 code implementation CVPR 2023 Syed Talal Wasim, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah

Through this prompting scheme, we can achieve state-of-the-art zero-shot performance on Kinetics-600, HMDB51 and UCF101 while remaining competitive in the supervised setting.

Action Recognition Video Classification +2

Video Instance Segmentation in an Open-World

1 code implementation 3 Apr 2023 Omkar Thawakar, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan

The open-world formulation relaxes the closed-world static-learning assumption as follows: (a) first, it distinguishes a set of known categories and labels an unknown object as 'unknown', and then (b) it incrementally learns the class of an unknown as and when the corresponding semantic labels become available.

Instance Segmentation Semantic Segmentation +1

3D Mitochondria Instance Segmentation with Spatio-Temporal Transformers

1 code implementation 21 Mar 2023 Omkar Thawakar, Rao Muhammad Anwer, Jorma Laaksonen, Orly Reiner, Mubarak Shah, Fahad Shahbaz Khan

Accurate 3D mitochondria instance segmentation in electron microscopy (EM) is a challenging problem and serves as a prerequisite to empirically analyze their distributions and morphology.

Decoder Instance Segmentation +1

Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes

no code implementations CVPR 2023 Brandon Clark, Alec Kerrigan, Parth Parag Kulkarni, Vicente Vivanco Cepeda, Mubarak Shah

To this end, we introduce an end-to-end transformer-based architecture that exploits the relationship between different geographic levels (which we refer to as hierarchies) and the corresponding visual scene information in an image through hierarchical cross-attention.

geo-localization Image-Based Localization +2

PivoTAL: Prior-Driven Supervision for Weakly-Supervised Temporal Action Localization

no code implementations CVPR 2023 Mamshad Nayeem Rizve, Gaurav Mittal, Ye Yu, Matthew Hall, Sandra Sajeev, Mubarak Shah, Mei Chen

To address this, we present PivoTAL, Prior-driven Supervision for Weakly-supervised Temporal Action Localization, to approach WTAL from a localization-by-localization perspective by learning to localize the action snippets directly.

Weakly Supervised Action Localization Weakly Supervised Temporal Action Localization

When Do Curricula Work in Federated Learning?

no code implementations ICCV 2023 Saeed Vahidian, Sreevatsank Kadaveru, Woonjoon Baek, Weijia Wang, Vyacheslav Kungurtsev, Chen Chen, Mubarak Shah, Bill Lin

Specifically, we aim to investigate how ordered learning principles can contribute to alleviating the heterogeneity effects in FL.

Federated Learning

Lightning Fast Video Anomaly Detection via Adversarial Knowledge Distillation

1 code implementation 28 Nov 2022 Florinel-Alin Croitoru, Nicolae-Catalin Ristea, Dana Dascalescu, Radu Tudor Ionescu, Fahad Shahbaz Khan, Mubarak Shah

We propose a very fast frame-level model for anomaly detection in video, which learns to detect anomalies by distilling knowledge from multiple highly accurate object-level teacher models.

Anomaly Detection Knowledge Distillation +1

Query Efficient Cross-Dataset Transferable Black-Box Attack on Action Recognition

no code implementations 23 Nov 2022 Rohit Gupta, Naveed Akhtar, Gaurav Kumar Nayak, Ajmal Mian, Mubarak Shah

By using a nearly disjoint dataset to train the substitute model, our method removes the requirement that the substitute model be trained using the same dataset as the target model, and leverages queries to the target model to retain the fooling rate benefits provided by query-based methods.

Action Recognition

Person Image Synthesis via Denoising Diffusion Model

1 code implementation CVPR 2023 Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan

In this work, we show how denoising diffusion models can be applied for high-fidelity person image synthesis with strong sample diversity and enhanced mode coverage of the learnt data distribution.

Denoising Diversity +1

3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D Point Clouds

no code implementations 1 Nov 2022 Jyoti Kini, Ajmal Mian, Mubarak Shah

We propose a method for joint detection and tracking of multiple objects in 3D point clouds, a task conventionally treated as a two-step process comprising object detection followed by data association.

object-detection Object Detection +1

Adversarial Pretraining of Self-Supervised Deep Networks: Past, Present and Future

no code implementations 23 Oct 2022 Guo-Jun Qi, Mubarak Shah

In this paper, we review adversarial pretraining of self-supervised deep networks including both convolutional neural networks and vision transformers.

Contrastive Learning Miscellaneous

TransVisDrone: Spatio-Temporal Transformer for Vision-based Drone-to-Drone Detection in Aerial Videos

3 code implementations 16 Oct 2022 Tushar Sangam, Ishan Rajendrakumar Dave, Waqas Sultani, Mubarak Shah

Drone-to-drone detection using visual feed has crucial applications, such as detecting drone collisions, detecting drone attacks, or coordinating flight with other drones.

Computational Efficiency Edge-computing

Rethinking Data Heterogeneity in Federated Learning: Introducing a New Notion and Standard Benchmarks

1 code implementation 30 Sep 2022 Mahdi Morafah, Saeed Vahidian, Chen Chen, Mubarak Shah, Bill Lin

Though successful, federated learning presents new challenges for machine learning, especially when the issue of data heterogeneity, also known as Non-IID data, arises.

Federated Learning

Self-Supervised Masked Convolutional Transformer Block for Anomaly Detection

1 code implementation 25 Sep 2022 Neelu Madan, Nicolae-Catalin Ristea, Radu Tudor Ionescu, Kamal Nasrollahi, Fahad Shahbaz Khan, Thomas B. Moeslund, Mubarak Shah

In this work, we extend our previous self-supervised predictive convolutional attentive block (SSPCAB) with a 3D masked convolutional layer, a transformer for channel-wise attention, as well as a novel self-supervised objective based on Huber loss.

Event Detection Fault Detection +1

Efficient Distribution Similarity Identification in Clustered Federated Learning via Principal Angles Between Client Data Subspaces

1 code implementation 21 Sep 2022 Saeed Vahidian, Mahdi Morafah, Weijia Wang, Vyacheslav Kungurtsev, Chen Chen, Mubarak Shah, Bill Lin

This small set of principal vectors is provided to the server so that the server can directly identify distribution similarities among the clients to form clusters.

Federated Learning

Diffusion Models in Vision: A Survey

1 code implementation 10 Sep 2022 Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Mubarak Shah

Denoising diffusion models represent a recent emerging topic in computer vision, demonstrating remarkable results in the area of generative modeling.

Denoising Survey

Contrastive Self-Supervised Learning Leads to Higher Adversarial Susceptibility

no code implementations 22 Jul 2022 Rohit Gupta, Naveed Akhtar, Ajmal Mian, Mubarak Shah

We establish that this is a result of the presence of false negative pairs in the training process, which increases model sensitivity to input perturbations.

Adversarial Robustness Self-Supervised Learning +1

GAMa: Cross-view Video Geo-localization

1 code implementation 6 Jul 2022 Shruti Vyas, Chen Chen, Mubarak Shah

There are no existing datasets for this problem; therefore, we propose the GAMa dataset, a large-scale dataset with ground videos and corresponding aerial images.

geo-localization

Towards Realistic Semi-Supervised Learning

1 code implementation 5 Jul 2022 Mamshad Nayeem Rizve, Navid Kardan, Mubarak Shah

We also highlight the flexibility of our approach in solving the novel class discovery task, demonstrate its stability in dealing with imbalanced data, and complement our approach with a technique to estimate the number of novel classes.

Novel Class Discovery Open-World Semi-Supervised Learning +1

OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning

1 code implementation 5 Jul 2022 Mamshad Nayeem Rizve, Navid Kardan, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah

In the open-world SSL problem, the objective is to recognize samples of known classes, and simultaneously detect and cluster samples belonging to novel classes present in unlabeled data.

Open-World Semi-Supervised Learning

Weakly Supervised Grounding for VQA in Vision-Language Transformers

1 code implementation 5 Jul 2022 Aisha Urooj Khan, Hilde Kuehne, Chuang Gan, Niels da Vitoria Lobo, Mubarak Shah

Transformers for visual-language representation learning have attracted a lot of interest and have shown tremendous performance on visual question answering (VQA) and grounding.

Question Answering Representation Learning +1

Self-Supervised Learning for Videos: A Survey

1 code implementation 18 Jun 2022 Madeline C. Schiappa, Yogesh S. Rawat, Mubarak Shah

In this survey, we provide a review of existing approaches on self-supervised learning focusing on the video domain.

Contrastive Learning Domain Generalization +3

Learning with Capsules: A Survey

no code implementations 6 Jun 2022 Fabio De Sousa Ribeiro, Kevin Duarte, Miles Everett, Georgios Leontidis, Mubarak Shah

The aim of this survey is to provide a comprehensive overview of the capsule network research landscape, which will serve as a valuable resource for the community going forward.

Graph Representation Learning Survey

EBM Life Cycle: MCMC Strategies for Synthesis, Defense, and Density Modeling

1 code implementation 24 May 2022 Mitch Hill, Jonathan Mitchell, Chu Chen, Yuan Du, Mubarak Shah, Song-Chun Zhu

This work presents strategies to learn an Energy-Based Model (EBM) according to the desired length of its MCMC sampling trajectories.

Adversarial Defense Image Generation +1

Self-Supervised Video Object Segmentation via Cutout Prediction and Tagging

no code implementations 22 Apr 2022 Jyoti Kini, Fahad Shahbaz Khan, Salman Khan, Mubarak Shah

We propose a novel self-supervised Video Object Segmentation (VOS) approach that strives to achieve better object-background discriminability for accurate object segmentation.

Object Segmentation +4

Tag-Based Attention Guided Bottom-Up Approach for Video Instance Segmentation

no code implementations 22 Apr 2022 Jyoti Kini, Mubarak Shah

Video Instance Segmentation is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence.

Instance Segmentation Semantic Segmentation +3

Video Action Detection: Analysing Limitations and Challenges

no code implementations 17 Apr 2022 Rajat Modi, Aayush Jung Rana, Akash Kumar, Praveen Tirupattur, Shruti Vyas, Yogesh Singh Rawat, Mubarak Shah

Beyond being large enough to feed data-hungry machines (e.g., transformers), what attributes measure the quality of a dataset?

Action Detection

PSTR: End-to-End One-Step Person Search With Transformers

1 code implementation CVPR 2022 Jiale Cao, Yanwei Pang, Rao Muhammad Anwer, Hisham Cholakkal, Jin Xie, Mubarak Shah, Fahad Shahbaz Khan

We propose a novel one-step transformer-based person search framework, PSTR, that jointly performs person detection and re-identification (re-id) in a single architecture.

Decoder Human Detection +1

SPAct: Self-supervised Privacy Preservation for Action Recognition

1 code implementation CVPR 2022 Ishan Rajendrakumar Dave, Chen Chen, Mubarak Shah

Existing approaches for mitigating privacy leakage in action recognition require privacy labels along with the action labels from the video dataset.

Action Classification Action Recognition +2

Mesh Convolution with Continuous Filters for 3D Surface Parsing

2 code implementations 3 Dec 2021 Huan Lei, Naveed Akhtar, Mubarak Shah, Ajmal Mian

In this paper, we propose a series of modular operations for effective geometric feature learning from 3D triangle meshes.

Scene Parsing Scene Segmentation

OW-DETR: Open-world Detection Transformer

2 code implementations CVPR 2022 Akshita Gupta, Sanath Narayan, K J Joseph, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah

In the case of incremental object detection, OW-DETR outperforms the state-of-the-art for all settings on PASCAL VOC.

Inductive Bias Object +3

Routing with Self-Attention for Multimodal Capsule Networks

no code implementations 1 Dec 2021 Kevin Duarte, Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Samuel Thomas, Alexander Liu, David Harwath, James Glass, Hilde Kuehne, Mubarak Shah

We present a new multimodal capsule network that allows us to leverage the strength of capsules in the context of a multimodal learning framework on large amounts of video data.

"Knights": First Place Submission for VIPriors21 Action Recognition Challenge at ICCV 2021

no code implementations 14 Oct 2021 Ishan Dave, Naman Biyani, Brandon Clark, Rohit Gupta, Yogesh Rawat, Mubarak Shah

This technical report presents our approach "Knights" to solve the action recognition task on a small subset of Kinetics-400, i.e., Kinetics400ViPriors, without using any extra data.

Action Recognition Optical Flow Estimation

Discriminative Region-based Multi-Label Zero-Shot Learning

1 code implementation ICCV 2021 Sanath Narayan, Akshita Gupta, Salman Khan, Fahad Shahbaz Khan, Ling Shao, Mubarak Shah

We note that the best existing multi-label ZSL method takes a shared approach towards attending to region features with a common set of attention maps for all the classes.

Image Retrieval Multi-label zero-shot learning

Advances in adversarial attacks and defenses in computer vision: A survey

no code implementations 1 Aug 2021 Naveed Akhtar, Ajmal Mian, Navid Kardan, Mubarak Shah

In [2], we reviewed the contributions made by the computer vision community in adversarial attacks on deep learning (and their defenses) until the advent of year 2018.

Video Generation from Text Employing Latent Path Construction for Temporal Modeling

no code implementations 29 Jul 2021 Amir Mazaheri, Mubarak Shah

To the best of our knowledge, this is the very first work on the text (free-form sentences) to video generation on more realistic video datasets like Actor and Action Dataset (A2D) or UCF101.

Text-to-Video Generation Video Generation

TinyAction Challenge: Recognizing Real-world Low-resolution Activities in Videos

1 code implementation 24 Jul 2021 Praveen Tirupattur, Aayush J Rana, Tushar Sangam, Shruti Vyas, Yogesh S Rawat, Mubarak Shah

While various approaches have been shown to be effective for the recognition task in recent works, they often do not deal with videos of lower resolution where the action is happening in a tiny region.

Action Recognition

Controlled Caption Generation for Images Through Adversarial Attacks

no code implementations 7 Jul 2021 Nayyer Aafaq, Naveed Akhtar, Wei Liu, Mubarak Shah, Ajmal Mian

In contrast, we propose a GAN-based algorithm for crafting adversarial examples for neural image captioning that mimics the internal representation of the CNN such that the resulting deep features of the input image enable a controlled incorrect caption generation through the recurrent network.

Caption Generation Image Captioning +1

Florida Wildlife Camera Trap Dataset

no code implementations 23 Jun 2021 Crystal Gagne, Jyoti Kini, Daniel Smith, Mubarak Shah

Trail camera imagery has increasingly gained popularity amongst biologists for conservation and ecological research.

Image Classification

Out-of-Distribution Detection Using Union of 1-Dimensional Subspaces

2 code implementations CVPR 2021 Alireza Zaeemzadeh, Niccolo Bisagno, Zeno Sambugaro, Nicola Conci, Nazanin Rahnavard, Mubarak Shah

In this paper, we argue that OOD samples can be detected more easily if the training data is embedded into a low-dimensional space, such that the embedded training samples lie on a union of 1-dimensional subspaces.

Bayesian Inference Out-of-Distribution Detection +2

Novel View Video Prediction Using a Dual Representation

no code implementations 7 Jun 2021 Sarah Shiraz, Krishna Regmi, Shruti Vyas, Yogesh S. Rawat, Mubarak Shah

We address the problem of novel view video prediction; given a set of input video clips from a single/multiple views, our network is able to predict the video from a novel view.

SSIM Video Prediction

PLM: Partial Label Masking for Imbalanced Multi-label Classification

no code implementations 22 May 2021 Kevin Duarte, Yogesh S. Rawat, Mubarak Shah

By stochastically masking labels during loss computation, the method balances this ratio for each class, leading to improved recall on minority classes and improved precision on frequent classes.

Classification Image Classification +1

MutualNet: Adaptive ConvNet via Mutual Learning from Different Model Configurations

1 code implementation 14 May 2021 Taojiannan Yang, Sijie Zhu, Matias Mendieta, Pu Wang, Ravikumar Balakrishnan, Minwoo Lee, Tao Han, Mubarak Shah, Chen Chen

MutualNet is a general training methodology that can be applied to various network structures (e.g., 2D networks: MobileNets, ResNet; 3D networks: SlowFast, X3D) and various tasks (e.g., image classification, object detection, segmentation, and action recognition), and is demonstrated to achieve consistent improvements on a variety of datasets.

Action Recognition Image Classification +2

Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules

1 code implementation CVPR 2021 Aisha Urooj Khan, Hilde Kuehne, Kevin Duarte, Chuang Gan, Niels Lobo, Mubarak Shah

In this paper, we focus on a more relaxed setting: the grounding of relevant visual entities in a weakly supervised manner by training on the VQA task alone.

Question Answering Visual Question Answering

Handwriting Transformers

1 code implementation ICCV 2021 Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Mubarak Shah

We propose a novel transformer-based styled handwritten text image generation approach, HWT, that strives to learn both style-content entanglement as well as global and local writing style patterns.

Decoder Image Generation +1

Dogfight: Detecting Drones from Drones Videos

2 code implementations CVPR 2021 Muhammad Waseem Ashraf, Waqas Sultani, Mubarak Shah

The erratic movement of the source and target drones, small size, arbitrary shape, large intensity variations, and occlusion make this problem quite challenging.

Region Proposal

LSDAT: Low-Rank and Sparse Decomposition for Decision-based Adversarial Attack

no code implementations 19 Mar 2021 Ashkan Esmaeili, Marzieh Edraki, Nazanin Rahnavard, Mubarak Shah, Ajmal Mian

It is set forth that the proposed sparse perturbation is the most aligned sparse perturbation with the shortest path from the input sample to the decision boundary for some initial adversarial sample (the best sparse approximation of shortest path, likely to fool the model).

Adversarial Attack Computational Efficiency +1

Modeling Multi-Label Action Dependencies for Temporal Action Localization

1 code implementation CVPR 2021 Praveen Tirupattur, Kevin Duarte, Yogesh Rawat, Mubarak Shah

We propose to improve action localization performance by modeling these action dependencies in a novel attention-based Multi-Label Action Dependency (MLAD) layer.

Action Detection Multi-Label Classification +1

TCLR: Temporal Contrastive Learning for Video Representation

1 code implementation 20 Jan 2021 Ishan Dave, Rohit Gupta, Mamshad Nayeem Rizve, Mubarak Shah

However, prior work on contrastive learning for video data has not explored the effect of explicitly encouraging the features to be distinct across the temporal dimension.

Action Classification Contrastive Learning +7

Transformers in Vision: A Survey

no code implementations 4 Jan 2021 Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, Mubarak Shah

Astounding results from Transformer models on natural language tasks have intrigued the vision community to study their application to computer vision problems.

Action Recognition Colorization +11

Asymptotic Optimality of Self-Representative Low-Rank Approximation and Its Applications

no code implementations 1 Jan 2021 Saeed Vahidian, Mohsen Joneidi, Ashkan Esmaeili, Siavash Khodadadeh, Sharare Zehtabian, Ladislau Boloni, Nazanin Rahnavard, Bill Lin, Mubarak Shah

The approach is based on the concept of self-rank, defined as the minimum number of samples needed to reconstruct all samples with an accuracy proportional to the rank-$K$ approximation.

Face Image Retrieval With Attribute Manipulation

no code implementations ICCV 2021 Alireza Zaeemzadeh, Shabnam Ghadar, Baldo Faieta, Zhe Lin, Nazanin Rahnavard, Mubarak Shah, Ratheesh Kalarot

For example, a user can ask for retrieving images similar to a query image, but with a different hair color, and no preference for absence/presence of eyeglasses in the results.

Attribute Face Image Retrieval +1

Video Geo-Localization Employing Geo-Temporal Feature Learning and GPS Trajectory Smoothing

1 code implementation ICCV 2021 Krishna Regmi, Mubarak Shah

In this paper, we address the problem of video geo-localization by proposing a Geo-Temporal Feature Learning (GTFL) Network to simultaneously learn the discriminative features between the query videos and gallery images for estimating the geo-spatial trajectory of a query video.

geo-localization Triplet

Correct block-design experiments mitigate temporal correlation bias in EEG classification

1 code implementation 25 Nov 2020 Simone Palazzo, Concetto Spampinato, Joseph Schmidt, Isaak Kavasidis, Daniela Giordano, Mubarak Shah

We argue that the reason why Li et al. [1] observe such high correlation in EEG data is their unconventional experimental design and settings that violate the basic cognitive neuroscience design recommendations, first and foremost the one of limiting the experiments' duration, as instead done in [2].

Classification EEG +2

Anomaly Detection in Video via Self-Supervised and Multi-Task Learning

1 code implementation CVPR 2021 Mariana-Iuliana Georgescu, Antonio Barbalau, Radu Tudor Ionescu, Fahad Shahbaz Khan, Marius Popescu, Mubarak Shah

To the best of our knowledge, we are the first to approach anomalous event detection in video as a multi-task learning problem, integrating multiple self-supervised and knowledge distillation proxy tasks in a single architecture.

Abnormal Event Detection In Video Anomaly Detection In Surveillance Videos +4

MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering

1 code implementation Findings of the Association for Computational Linguistics 2020 Aisha Urooj Khan, Amir Mazaheri, Niels da Vitoria Lobo, Mubarak Shah

We present MMFT-BERT (MultiModal Fusion Transformer with BERT encodings) to solve Visual Question Answering (VQA), ensuring individual and combined processing of multiple input modalities.

Question Answering Visual Question Answering

Meta-learning the Learning Trends Shared Across Tasks

no code implementations 19 Oct 2020 Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah

This demonstrates their ability to acquire transferable knowledge, a capability that is central to human learning.

Meta-Learning

Uncertainty Estimation and Sample Selection for Crowd Counting

1 code implementation 30 Sep 2020 Viresh Ranjan, Boyu Wang, Mubarak Shah, Minh Hoai

We present sample selection strategies which make use of the density and uncertainty of predictions from the networks trained on one domain to select the informative images from a target domain of interest to acquire human annotation.

Crowd Counting

Simultaneous Detection and Tracking with Motion Modelling for Multiple Object Tracking

3 code implementations ECCV 2020 Shi-Jie Sun, Naveed Akhtar, Xiang-Yu Song, HuanSheng Song, Ajmal Mian, Mubarak Shah

Deep learning-based Multiple Object Tracking (MOT) currently relies on off-the-shelf detectors for tracking-by-detection. This results in deep models that are detector biased and evaluations that are detector influenced.

Multiple Object Tracking Object

Deep Photo Cropper and Enhancer

no code implementations 3 Aug 2020 Aaron Ott, Amir Mazaheri, Niels D. Lobo, Mubarak Shah

In the photo enhancer, we employ super-resolution to increase the number of pixels in the embedded image and reduce the effect of stretching and distortion of pixels.

Image Enhancement Super-Resolution

Odyssey: Creation, Analysis and Detection of Trojan Models

1 code implementation 16 Jul 2020 Marzieh Edraki, Nazmul Karim, Nazanin Rahnavard, Ajmal Mian, Mubarak Shah

We propose a detector based on the analysis of intrinsic DNN properties that are affected by the Trojaning process.

Data Poisoning

TinyVIRAT: Low-resolution Video Action Recognition

1 code implementation 14 Jul 2020 Ugur Demir, Yogesh S Rawat, Mubarak Shah

In real-world surveillance environments, the actions in videos are captured at a wide range of resolutions.

Action Recognition Temporal Action Localization

Self-supervised Knowledge Distillation for Few-shot Learning

2 code implementations 17 Jun 2020 Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah

Our experiments show that, even in the first stage, self-supervision can outperform current state-of-the-art methods, with further gains achieved by our second stage distillation process.

Few-Shot Image Classification Few-Shot Learning +2

Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos

no code implementations 23 Apr 2020 Mamshad Nayeem Rizve, Ugur Demir, Praveen Tirupattur, Aayush Jung Rana, Kevin Duarte, Ishan Dave, Yogesh Singh Rawat, Mubarak Shah

For tubelet extraction, we propose a localization network which takes a video clip as input and spatio-temporally detects potential foreground regions at multiple scales to generate action tubelets.

Action Detection Activity Detection

RescueNet: Joint Building Segmentation and Damage Assessment from Satellite Imagery

no code implementations 15 Apr 2020 Rohit Gupta, Mubarak Shah

Accurate and fine-grained information about the extent of damage to buildings is essential for directing Humanitarian Aid and Disaster Response (HADR) operations in the immediate aftermath of any natural calamity.

Classification Disaster Response +4

Adversarial Learning for Personalized Tag Recommendation

1 code implementation 1 Apr 2020 Erik Quintanilla, Yogesh Rawat, Andrey Sakryukin, Mubarak Shah, Mohan Kankanhalli

We demonstrate the effectiveness of the proposed model on two different large-scale and publicly available datasets, YFCC100M and NUS-WIDE.

General Classification Image Classification +2

iTAML: An Incremental Task-Agnostic Meta-learning Approach

1 code implementation CVPR 2020 Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah

In this paper, we hypothesize that this problem can be avoided by learning a set of generalized parameters that are specific to neither old nor new tasks.

Incremental Learning Meta-Learning

Subspace Capsule Network

1 code implementation 7 Feb 2020 Marzieh Edraki, Nazanin Rahnavard, Mubarak Shah

In this paper, we propose the SubSpace Capsule Network (SCN) that exploits the idea of capsule networks to model possible variations in the appearance or implicitly defined properties of an entity through a group of capsule subspaces instead of simply grouping neurons to create capsules.

General Classification Generative Adversarial Network +2

Human Action Recognition in Drone Videos using a Few Aerial Training Examples

no code implementations 22 Oct 2019 Waqas Sultani, Mubarak Shah

However, using deep neural networks for automatic aerial action recognition is difficult due to the need for a large number of training aerial human action videos.

Action Classification Action Recognition +1

Deep Constrained Dominant Sets for Person Re-identification

1 code implementation ICCV 2019 Leulseged Tesfaye Alemu, Marcello Pelillo, Mubarak Shah

By optimizing the constrained clustering in an end-to-end manner, we naturally leverage the contextual knowledge of a set of images corresponding to the given person-images.

Ranked #2 on Person Re-Identification on CUHK03 (Rank-5 metric)

Constrained Clustering Image Retrieval +2

Bridging the Domain Gap for Ground-to-Aerial Image Matching

1 code implementation ICCV 2019 Krishna Regmi, Mubarak Shah

Our Feature Fusion method combines the complementary features from a synthesized aerial image with the corresponding ground features to obtain a robust query representation.

Retrieval

Crowd Transformer Network

no code implementations 4 Apr 2019 Viresh Ranjan, Mubarak Shah, Minh Hoai Nguyen

Most of the existing crowd counting approaches rely on local features for estimating the crowd density map.

Crowd Counting Density Estimation

Iterative Projection and Matching: Finding Structure-preserving Representatives and Its Application to Computer Vision

2 code implementations CVPR 2019 Mohsen Joneidi, Alireza Zaeemzadeh, Nazanin Rahnavard, Mubarak Shah

In our algorithm, at each iteration, the maximum information from the structure of the data is captured by one selected sample, and the captured information is neglected in the next iterations by projection on the null-space of previously selected samples.

Action Recognition Active Learning +5