Search Results for author: Mubarak Shah

Found 190 papers, 78 papers with code

Multi-view Action Recognition using Cross-view Video Prediction

1 code implementation ECCV 2020 Shruti Vyas, Yogesh S Rawat, Mubarak Shah

We evaluate the effectiveness of the learned representation for multi-view video action recognition in a supervised approach.

Action Recognition Representation Learning +2

Count- and Similarity-aware R-CNN for Pedestrian Detection

no code implementations ECCV 2020 Jin Xie, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao, Mubarak Shah

We further introduce a count-and-similarity branch within the two-stage detection framework, which predicts pedestrian count as well as proposal similarity.

Human Instance Segmentation Pedestrian Detection +1

Composed Video Retrieval via Enriched Context and Discriminative Embeddings

1 code implementation 25 Mar 2024 Omkar Thawakar, Muzammal Naseer, Rao Muhammad Anwer, Salman Khan, Michael Felsberg, Mubarak Shah, Fahad Shahbaz Khan

Composed video retrieval (CoVR) is a challenging problem in computer vision which has recently highlighted the integration of modification text with visual queries for more sophisticated video search in large databases.

Composed Video Retrieval (CoVR) Retrieval

AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation

1 code implementation 21 Mar 2024 Yuning Cui, Syed Waqas Zamir, Salman Khan, Alois Knoll, Mubarak Shah, Fahad Shahbaz Khan

Our approach is motivated by the observation that different degradation types impact the image content on different frequency subbands, thereby requiring different treatments for each restoration task.

Deblurring Denoising +3

VidLA: Video-Language Alignment at Scale

no code implementations 21 Mar 2024 Mamshad Nayeem Rizve, Fan Fei, Jayakrishnan Unnikrishnan, Son Tran, Benjamin Z. Yao, Belinda Zeng, Mubarak Shah, Trishul Chilimbi

To effectively address this limitation, we instead keep the network architecture simple and use a set of data tokens that operate at different temporal resolutions in a hierarchical manner, accounting for the temporally hierarchical nature of videos.

Language Modelling Visual Grounding

FSViewFusion: Few-Shots View Generation of Novel Objects

no code implementations 11 Mar 2024 Rukhshanda Hussain, Hui Xian Grace Lim, BorChun Chen, Mubarak Shah, Ser Nam Lim

Second, we establish that the concept of a view can be disentangled and transferred to a novel object irrespective of the identity of the original object from which the views are learnt.

Novel View Synthesis

CodaMal: Contrastive Domain Adaptation for Malaria Detection in Low-Cost Microscopes

1 code implementation 16 Feb 2024 Ishan Rajendrakumar Dave, Tristan de Blegiers, Chen Chen, Mubarak Shah

Annotating images from LCM significantly increases the burden on medical experts compared to annotating images from high-cost microscopes (HCM).

Domain Adaptation object-detection +1

No More Shortcuts: Realizing the Potential of Temporal Self-Supervision

no code implementations 20 Dec 2023 Ishan Rajendrakumar Dave, Simon Jenni, Mubarak Shah

To address these issues, we propose 1) a more challenging reformulation of temporal self-supervision as frame-level (rather than clip-level) recognition tasks and 2) an effective augmentation strategy to mitigate shortcuts.

Action Classification Attribute +7

DVANet: Disentangling View and Action Features for Multi-View Action Recognition

no code implementations 10 Dec 2023 Nyle Siddiqui, Praveen Tirupattur, Mubarak Shah

In this work, we present a novel approach to multi-view action recognition where we guide learned action representations to be separated from view-relevant information in a video.

Action Recognition In Videos

Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve Aerial Visual Perception?

no code implementations 7 Dec 2023 Aritra Dutta, Srijan Das, Jacob Nielsen, Rajatsubhra Chakraborty, Mubarak Shah

Despite the commercial abundance of UAVs, aerial data acquisition remains challenging, and the existing Asia and North America-centric open-source UAV datasets are small-scale or low-resolution and lack diversity in scene contextuality.

Benchmarking object-detection +2

PG-Video-LLaVA: Pixel Grounding Large Video-Language Models

1 code implementation 22 Nov 2023 Shehan Munasinghe, Rusiru Thushara, Muhammad Maaz, Hanoona Abdul Rasheed, Salman Khan, Mubarak Shah, Fahad Khan

Extending image-based Large Multimodal Models (LMMs) to videos is challenging due to the inherent complexity of video data.

Benchmarking Phrase Grounding +4

Egocentric RGB+Depth Action Recognition in Industry-Like Settings

1 code implementation 25 Sep 2023 Jyoti Kini, Sarah Fleischer, Ishan Dave, Mubarak Shah

Our work focuses on recognizing actions from egocentric RGB and Depth modalities in an industry-like environment.

Action Recognition

Dual Student Networks for Data-Free Model Stealing

no code implementations 18 Sep 2023 James Beetham, Navid Kardan, Ajmal Mian, Mubarak Shah

To this end, the two main challenges are estimating gradients of the target model without access to its parameters, and generating a diverse set of training samples that thoroughly explores the input space.

CDFSL-V: Cross-Domain Few-Shot Learning for Videos

1 code implementation ICCV 2023 Sarinda Samarasinghe, Mamshad Nayeem Rizve, Navid Kardan, Mubarak Shah

To address this issue, in this work, we propose a novel cross-domain few-shot video action recognition method that leverages self-supervised learning and curriculum learning to balance the information from the source and target domains.

cross-domain few-shot learning Few-Shot action recognition +3

EventTransAct: A video transformer-based framework for Event-camera based action recognition

no code implementations 25 Aug 2023 Tristan de Blegiers, Ishan Rajendrakumar Dave, Adeel Yousaf, Mubarak Shah

Recognizing and comprehending human actions and gestures is a crucial perception requirement for robots to interact with humans and carry out tasks in diverse domains, including service robotics, healthcare, and manufacturing.

Action Recognition

Preserving Modality Structure Improves Multi-Modal Learning

1 code implementation ICCV 2023 Swetha Sirnam, Mamshad Nayeem Rizve, Nina Shvetsova, Hilde Kuehne, Mubarak Shah

Self-supervised learning on large-scale multi-modal datasets allows learning semantically meaningful embeddings in a joint multi-modal representation space without relying on human annotations.

Retrieval Self-Supervised Learning

TeD-SPAD: Temporal Distinctiveness for Self-supervised Privacy-preservation for video Anomaly Detection

no code implementations ICCV 2023 Joseph Fioresi, Ishan Rajendrakumar Dave, Mubarak Shah

In this paper, we propose TeD-SPAD, a privacy-aware video anomaly detection framework that destroys visual private information in a self-supervised manner.

Anomaly Detection Attribute +3

Ensemble Modeling for Multimodal Visual Action Recognition

1 code implementation 10 Aug 2023 Jyoti Kini, Sarah Fleischer, Ishan Dave, Mubarak Shah

In this work, we propose an ensemble modeling approach for multimodal action recognition.

Action Recognition

Reverse Stable Diffusion: What prompt was used to generate this image?

no code implementations 2 Aug 2023 Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Mubarak Shah

Our novel learning framework produces excellent results on the aforementioned task, yielding the highest gains when applied on the white-box model.

Text-to-Image Generation

Foundational Models Defining a New Era in Vision: A Survey and Outlook

1 code implementation 25 Jul 2023 Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Fahad Shahbaz Khan

Vision systems to see and reason about the compositional nature of visual scenes are fundamental to understanding our world.

Benchmarking

Frequency Domain Adversarial Training for Robust Volumetric Medical Segmentation

1 code implementation 14 Jul 2023 Asif Hanif, Muzammal Naseer, Salman Khan, Mubarak Shah, Fahad Shahbaz Khan

While recent advances in deep learning have improved the performance of volumetric medical image segmentation models, these models cannot be deployed for real-world applications immediately due to their vulnerability to adversarial attacks.

Adversarial Attack Image Segmentation +3

Exploiting the Brain's Network Structure for Automatic Identification of ADHD Subjects

no code implementations 15 Jun 2023 Soumyabrata Dey, Ravishankar Rao, Mubarak Shah

The concatenation of the network features of all the voxels in a brain serves as the feature vector.

Learning Situation Hyper-Graphs for Video Question Answering

1 code implementation CVPR 2023 Aisha Urooj Khan, Hilde Kuehne, Bo Wu, Kim Chheu, Walid Bousselham, Chuang Gan, Niels Lobo, Mubarak Shah

The proposed method is trained in an end-to-end manner and optimized by a VQA loss with the cross-entropy function and a Hungarian matching loss for the situation graph prediction.

Ranked #6 on Video Question Answering on AGQA 2.0 balanced (Average Accuracy metric)

Question Answering Video Question Answering +1

Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting

1 code implementation CVPR 2023 Syed Talal Wasim, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah

Through this prompting scheme, we can achieve state-of-the-art zero-shot performance on Kinetics-600, HMDB51 and UCF101 while remaining competitive in the supervised setting.

Action Recognition Video Classification +2

$R^{2}$Former: Unified $R$etrieval and $R$eranking Transformer for Place Recognition

no code implementations 6 Apr 2023 Sijie Zhu, Linjie Yang, Chen Chen, Mubarak Shah, Xiaohui Shen, Heng Wang

Visual Place Recognition (VPR) estimates the location of query images by matching them with images in a reference database.

Feature Correlation Retrieval +1

Video Instance Segmentation in an Open-World

1 code implementation 3 Apr 2023 Omkar Thawakar, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan

The open-world formulation relaxes the closed-world static-learning assumption as follows: (a) first, it distinguishes a set of known categories and labels any unknown object as 'unknown', and then (b) it incrementally learns the class of an unknown object as and when the corresponding semantic labels become available.

Instance Segmentation Semantic Segmentation +1

3D Mitochondria Instance Segmentation with Spatio-Temporal Transformers

1 code implementation 21 Mar 2023 Omkar Thawakar, Rao Muhammad Anwer, Jorma Laaksonen, Orly Reiner, Mubarak Shah, Fahad Shahbaz Khan

Accurate 3D mitochondria instance segmentation in electron microscopy (EM) is a challenging problem and serves as a prerequisite to empirically analyze their distributions and morphology.

Instance Segmentation Semantic Segmentation

Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes

no code implementations CVPR 2023 Brandon Clark, Alec Kerrigan, Parth Parag Kulkarni, Vicente Vivanco Cepeda, Mubarak Shah

To this end, we introduce an end-to-end transformer-based architecture that exploits the relationship between different geographic levels (which we refer to as hierarchies) and the corresponding visual scene information in an image through hierarchical cross-attention.

Image-Based Localization Memorization +1

PivoTAL: Prior-Driven Supervision for Weakly-Supervised Temporal Action Localization

no code implementations CVPR 2023 Mamshad Nayeem Rizve, Gaurav Mittal, Ye Yu, Matthew Hall, Sandra Sajeev, Mubarak Shah, Mei Chen

To address this, we present PivoTAL, Prior-driven Supervision for Weakly-supervised Temporal Action Localization, to approach WTAL from a localization-by-localization perspective by learning to localize the action snippets directly.

Weakly Supervised Action Localization Weakly Supervised Temporal Action Localization

When Do Curricula Work in Federated Learning?

no code implementations ICCV 2023 Saeed Vahidian, Sreevatsank Kadaveru, Woonjoon Baek, Weijia Wang, Vyacheslav Kungurtsev, Chen Chen, Mubarak Shah, Bill Lin

Specifically, we aim to investigate how ordered learning principles can contribute to alleviating the heterogeneity effects in FL.

Federated Learning

Lightning Fast Video Anomaly Detection via Adversarial Knowledge Distillation

no code implementations 28 Nov 2022 Nicolae-Catalin Ristea, Florinel-Alin Croitoru, Dana Dascalescu, Radu Tudor Ionescu, Fahad Shahbaz Khan, Mubarak Shah

We propose a very fast frame-level model for anomaly detection in video, which learns to detect anomalies by distilling knowledge from multiple highly accurate object-level teacher models.

Anomaly Detection Knowledge Distillation +1

Query Efficient Cross-Dataset Transferable Black-Box Attack on Action Recognition

no code implementations 23 Nov 2022 Rohit Gupta, Naveed Akhtar, Gaurav Kumar Nayak, Ajmal Mian, Mubarak Shah

By using a nearly disjoint dataset to train the substitute model, our method removes the requirement that the substitute model be trained using the same dataset as the target model, and leverages queries to the target model to retain the fooling rate benefits provided by query-based methods.

Action Recognition

Person Image Synthesis via Denoising Diffusion Model

1 code implementation CVPR 2023 Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan

In this work, we show how denoising diffusion models can be applied for high-fidelity person image synthesis with strong sample diversity and enhanced mode coverage of the learnt data distribution.

Denoising Image Generation

3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D Point Clouds

no code implementations 1 Nov 2022 Jyoti Kini, Ajmal Mian, Mubarak Shah

We propose a method for joint detection and tracking of multiple objects in 3D point clouds, a task conventionally treated as a two-step process comprising object detection followed by data association.

object-detection Object Detection +1

Adversarial Pretraining of Self-Supervised Deep Networks: Past, Present and Future

no code implementations 23 Oct 2022 Guo-Jun Qi, Mubarak Shah

In this paper, we review adversarial pretraining of self-supervised deep networks including both convolutional neural networks and vision transformers.

Contrastive Learning Miscellaneous

TransVisDrone: Spatio-Temporal Transformer for Vision-based Drone-to-Drone Detection in Aerial Videos

3 code implementations 16 Oct 2022 Tushar Sangam, Ishan Rajendrakumar Dave, Waqas Sultani, Mubarak Shah

Drone-to-drone detection using visual feed has crucial applications, such as detecting drone collisions, detecting drone attacks, or coordinating flight with other drones.

Computational Efficiency Edge-computing

Rethinking Data Heterogeneity in Federated Learning: Introducing a New Notion and Standard Benchmarks

1 code implementation 30 Sep 2022 Mahdi Morafah, Saeed Vahidian, Chen Chen, Mubarak Shah, Bill Lin

Though successful, federated learning presents new challenges for machine learning, especially when the issue of data heterogeneity, also known as Non-IID data, arises.

Federated Learning

Self-Supervised Masked Convolutional Transformer Block for Anomaly Detection

1 code implementation 25 Sep 2022 Neelu Madan, Nicolae-Catalin Ristea, Radu Tudor Ionescu, Kamal Nasrollahi, Fahad Shahbaz Khan, Thomas B. Moeslund, Mubarak Shah

In this work, we extend our previous self-supervised predictive convolutional attentive block (SSPCAB) with a 3D masked convolutional layer, a transformer for channel-wise attention, as well as a novel self-supervised objective based on Huber loss.

Event Detection Fault Detection +1

Efficient Distribution Similarity Identification in Clustered Federated Learning via Principal Angles Between Client Data Subspaces

1 code implementation 21 Sep 2022 Saeed Vahidian, Mahdi Morafah, Weijia Wang, Vyacheslav Kungurtsev, Chen Chen, Mubarak Shah, Bill Lin

This small set of principal vectors is provided to the server so that the server can directly identify distribution similarities among the clients to form clusters.

Federated Learning

Diffusion Models in Vision: A Survey

1 code implementation 10 Sep 2022 Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Mubarak Shah

Denoising diffusion models represent a recent emerging topic in computer vision, demonstrating remarkable results in the area of generative modeling.

Denoising

Contrastive Self-Supervised Learning Leads to Higher Adversarial Susceptibility

no code implementations 22 Jul 2022 Rohit Gupta, Naveed Akhtar, Ajmal Mian, Mubarak Shah

We establish that this is a result of the presence of false negative pairs in the training process, which increases model sensitivity to input perturbations.

Adversarial Robustness Self-Supervised Learning +1

GAMa: Cross-view Video Geo-localization

1 code implementation 6 Jul 2022 Shruti Vyas, Chen Chen, Mubarak Shah

There are no existing datasets for this problem; therefore, we propose the GAMa dataset, a large-scale dataset with ground videos and corresponding aerial images.

Weakly Supervised Grounding for VQA in Vision-Language Transformers

1 code implementation 5 Jul 2022 Aisha Urooj Khan, Hilde Kuehne, Chuang Gan, Niels da Vitoria Lobo, Mubarak Shah

Transformers for visual-language representation learning have attracted a lot of interest and shown tremendous performance on visual question answering (VQA) and grounding.

Question Answering Representation Learning +1

Towards Realistic Semi-Supervised Learning

1 code implementation 5 Jul 2022 Mamshad Nayeem Rizve, Navid Kardan, Mubarak Shah

We also highlight the flexibility of our approach in solving the novel class discovery task, demonstrate its stability in dealing with imbalanced data, and complement our approach with a technique to estimate the number of novel classes.

Novel Class Discovery Open-World Semi-Supervised Learning +1

OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning

1 code implementation 5 Jul 2022 Mamshad Nayeem Rizve, Navid Kardan, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah

In the open-world SSL problem, the objective is to recognize samples of known classes, and simultaneously detect and cluster samples belonging to novel classes present in unlabeled data.

Open-World Semi-Supervised Learning

Self-Supervised Learning for Videos: A Survey

1 code implementation 18 Jun 2022 Madeline C. Schiappa, Yogesh S. Rawat, Mubarak Shah

In this survey, we provide a review of existing approaches on self-supervised learning focusing on the video domain.

Contrastive Learning Domain Generalization +2

Learning with Capsules: A Survey

no code implementations 6 Jun 2022 Fabio De Sousa Ribeiro, Kevin Duarte, Miles Everett, Georgios Leontidis, Mubarak Shah

The aim of this survey is to provide a comprehensive overview of the capsule network research landscape, which will serve as a valuable resource for the community going forward.

Graph Representation Learning

EBM Life Cycle: MCMC Strategies for Synthesis, Defense, and Density Modeling

1 code implementation 24 May 2022 Mitch Hill, Jonathan Mitchell, Chu Chen, Yuan Du, Mubarak Shah, Song-Chun Zhu

This work presents strategies to learn an Energy-Based Model (EBM) according to the desired length of its MCMC sampling trajectories.

Adversarial Defense Image Generation +1

Tag-Based Attention Guided Bottom-Up Approach for Video Instance Segmentation

no code implementations 22 Apr 2022 Jyoti Kini, Mubarak Shah

Video Instance Segmentation is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence.

Instance Segmentation Semantic Segmentation +3

Self-Supervised Video Object Segmentation via Cutout Prediction and Tagging

no code implementations 22 Apr 2022 Jyoti Kini, Fahad Shahbaz Khan, Salman Khan, Mubarak Shah

We propose a novel self-supervised Video Object Segmentation (VOS) approach that strives to achieve better object-background discriminability for accurate object segmentation.

Object Segmentation +4

Video Action Detection: Analysing Limitations and Challenges

no code implementations 17 Apr 2022 Rajat Modi, Aayush Jung Rana, Akash Kumar, Praveen Tirupattur, Shruti Vyas, Yogesh Singh Rawat, Mubarak Shah

Beyond being large enough to feed data-hungry machines (e.g., transformers), what attributes measure the quality of a dataset?

Action Detection

PSTR: End-to-End One-Step Person Search With Transformers

1 code implementation CVPR 2022 Jiale Cao, Yanwei Pang, Rao Muhammad Anwer, Hisham Cholakkal, Jin Xie, Mubarak Shah, Fahad Shahbaz Khan

We propose a novel one-step transformer-based person search framework, PSTR, that jointly performs person detection and re-identification (re-id) in a single architecture.

Human Detection Person Search

SPAct: Self-supervised Privacy Preservation for Action Recognition

1 code implementation CVPR 2022 Ishan Rajendrakumar Dave, Chen Chen, Mubarak Shah

Existing approaches for mitigating privacy leakage in action recognition require privacy labels along with the action labels from the video dataset.

Action Classification Action Recognition +2

Mesh Convolution with Continuous Filters for 3D Surface Parsing

2 code implementations 3 Dec 2021 Huan Lei, Naveed Akhtar, Mubarak Shah, Ajmal Mian

In this paper, we propose a series of modular operations for effective geometric feature learning from 3D triangle meshes.

Scene Parsing Scene Segmentation

OW-DETR: Open-world Detection Transformer

2 code implementations CVPR 2022 Akshita Gupta, Sanath Narayan, K J Joseph, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah

In the case of incremental object detection, OW-DETR outperforms the state-of-the-art for all settings on PASCAL VOC.

Inductive Bias Object +3

Routing with Self-Attention for Multimodal Capsule Networks

no code implementations 1 Dec 2021 Kevin Duarte, Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Samuel Thomas, Alexander Liu, David Harwath, James Glass, Hilde Kuehne, Mubarak Shah

We present a new multimodal capsule network that allows us to leverage the strength of capsules in the context of a multimodal learning framework on large amounts of video data.

"Knights": First Place Submission for VIPriors21 Action Recognition Challenge at ICCV 2021

no code implementations 14 Oct 2021 Ishan Dave, Naman Biyani, Brandon Clark, Rohit Gupta, Yogesh Rawat, Mubarak Shah

This technical report presents our approach "Knights" to solve the action recognition task on a small subset of Kinetics-400, i.e., Kinetics400ViPriors, without using any extra data.

Action Recognition Optical Flow Estimation

Discriminative Region-based Multi-Label Zero-Shot Learning

1 code implementation ICCV 2021 Sanath Narayan, Akshita Gupta, Salman Khan, Fahad Shahbaz Khan, Ling Shao, Mubarak Shah

We note that the best existing multi-label ZSL method takes a shared approach towards attending to region features with a common set of attention maps for all the classes.

Image Retrieval Multi-label zero-shot learning

Advances in adversarial attacks and defenses in computer vision: A survey

no code implementations 1 Aug 2021 Naveed Akhtar, Ajmal Mian, Navid Kardan, Mubarak Shah

In [2], we reviewed the contributions made by the computer vision community in adversarial attacks on deep learning (and their defenses) until the advent of year 2018.

Video Generation from Text Employing Latent Path Construction for Temporal Modeling

no code implementations 29 Jul 2021 Amir Mazaheri, Mubarak Shah

To the best of our knowledge, this is the very first work on the text (free-form sentences) to video generation on more realistic video datasets like Actor and Action Dataset (A2D) or UCF101.

Text-to-Video Generation Video Generation

TinyAction Challenge: Recognizing Real-world Low-resolution Activities in Videos

1 code implementation 24 Jul 2021 Praveen Tirupattur, Aayush J Rana, Tushar Sangam, Shruti Vyas, Yogesh S Rawat, Mubarak Shah

While various approaches have been shown to be effective for the recognition task in recent works, they often do not deal with videos of lower resolution where the action is happening in a tiny region.

Action Recognition

Controlled Caption Generation for Images Through Adversarial Attacks

no code implementations 7 Jul 2021 Nayyer Aafaq, Naveed Akhtar, Wei Liu, Mubarak Shah, Ajmal Mian

In contrast, we propose a GAN-based algorithm for crafting adversarial examples for neural image captioning that mimics the internal representation of the CNN such that the resulting deep features of the input image enable a controlled incorrect caption generation through the recurrent network.

Image Captioning Language Modelling

Florida Wildlife Camera Trap Dataset

no code implementations 23 Jun 2021 Crystal Gagne, Jyoti Kini, Daniel Smith, Mubarak Shah

Trail camera imagery has increasingly gained popularity amongst biologists for conservation and ecological research.

Image Classification

Out-of-Distribution Detection Using Union of 1-Dimensional Subspaces

2 code implementations CVPR 2021 Alireza Zaeemzadeh, Niccolo Bisagno, Zeno Sambugaro, Nicola Conci, Nazanin Rahnavard, Mubarak Shah

In this paper, we argue that OOD samples can be detected more easily if the training data is embedded into a low-dimensional space, such that the embedded training samples lie on a union of 1-dimensional subspaces.

Bayesian Inference Out-of-Distribution Detection +2

Novel View Video Prediction Using a Dual Representation

no code implementations 7 Jun 2021 Sarah Shiraz, Krishna Regmi, Shruti Vyas, Yogesh S. Rawat, Mubarak Shah

We address the problem of novel view video prediction; given a set of input video clips from a single/multiple views, our network is able to predict the video from a novel view.

SSIM Video Prediction

PLM: Partial Label Masking for Imbalanced Multi-label Classification

no code implementations 22 May 2021 Kevin Duarte, Yogesh S. Rawat, Mubarak Shah

By stochastically masking labels during loss computation, the method balances this ratio for each class, leading to improved recall on minority classes and improved precision on frequent classes.

Classification Image Classification +1

MutualNet: Adaptive ConvNet via Mutual Learning from Different Model Configurations

1 code implementation 14 May 2021 Taojiannan Yang, Sijie Zhu, Matias Mendieta, Pu Wang, Ravikumar Balakrishnan, Minwoo Lee, Tao Han, Mubarak Shah, Chen Chen

MutualNet is a general training methodology that can be applied to various network structures (e.g., 2D networks: MobileNets, ResNet; 3D networks: SlowFast, X3D) and various tasks (e.g., image classification, object detection, segmentation, and action recognition), and is demonstrated to achieve consistent improvements on a variety of datasets.

Action Recognition Image Classification +2

Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules

1 code implementation CVPR 2021 Aisha Urooj Khan, Hilde Kuehne, Kevin Duarte, Chuang Gan, Niels Lobo, Mubarak Shah

In this paper, we focus on a more relaxed setting: the grounding of relevant visual entities in a weakly supervised manner by training on the VQA task alone.

Question Answering Visual Question Answering

Handwriting Transformers

1 code implementation ICCV 2021 Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Mubarak Shah

We propose a novel transformer-based styled handwritten text image generation approach, HWT, that strives to learn both style-content entanglement as well as global and local writing style patterns.

Image Generation Text Generation

Dogfight: Detecting Drones from Drones Videos

2 code implementations CVPR 2021 Muhammad Waseem Ashraf, Waqas Sultani, Mubarak Shah

The erratic movement of the source and target drones, small size, arbitrary shape, large intensity variations, and occlusion make this problem quite challenging.

Region Proposal

LSDAT: Low-Rank and Sparse Decomposition for Decision-based Adversarial Attack

no code implementations 19 Mar 2021 Ashkan Esmaeili, Marzieh Edraki, Nazanin Rahnavard, Mubarak Shah, Ajmal Mian

It is set forth that the proposed sparse perturbation is the most aligned sparse perturbation with the shortest path from the input sample to the decision boundary for some initial adversarial sample (the best sparse approximation of shortest path, likely to fool the model).

Adversarial Attack Computational Efficiency +1

Modeling Multi-Label Action Dependencies for Temporal Action Localization

1 code implementation CVPR 2021 Praveen Tirupattur, Kevin Duarte, Yogesh Rawat, Mubarak Shah

We propose to improve action localization performance by modeling these action dependencies in a novel attention-based Multi-Label Action Dependency (MLAD) layer.

Action Detection Multi-Label Classification +1

TCLR: Temporal Contrastive Learning for Video Representation

1 code implementation 20 Jan 2021 Ishan Dave, Rohit Gupta, Mamshad Nayeem Rizve, Mubarak Shah

However, prior work on contrastive learning for video data has not explored the effect of explicitly encouraging the features to be distinct across the temporal dimension.

Action Classification Contrastive Learning +7

Transformers in Vision: A Survey

no code implementations 4 Jan 2021 Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, Mubarak Shah

Astounding results from Transformer models on natural language tasks have intrigued the vision community to study their application to computer vision problems.

Action Recognition Colorization +10

Face Image Retrieval With Attribute Manipulation

no code implementations ICCV 2021 Alireza Zaeemzadeh, Shabnam Ghadar, Baldo Faieta, Zhe Lin, Nazanin Rahnavard, Mubarak Shah, Ratheesh Kalarot

For example, a user can ask for retrieving images similar to a query image, but with a different hair color, and no preference for absence/presence of eyeglasses in the results.

Attribute Face Image Retrieval +1

Asymptotic Optimality of Self-Representative Low-Rank Approximation and Its Applications

no code implementations 1 Jan 2021 Saeed Vahidian, Mohsen Joneidi, Ashkan Esmaeili, Siavash Khodadadeh, Sharare Zehtabian, Ladislau Boloni, Nazanin Rahnavard, Bill Lin, Mubarak Shah

The approach is based on the concept of self-rank, defined as the minimum number of samples needed to reconstruct all samples with an accuracy proportional to the rank-$K$ approximation.

Video Geo-Localization Employing Geo-Temporal Feature Learning and GPS Trajectory Smoothing

1 code implementation ICCV 2021 Krishna Regmi, Mubarak Shah

In this paper, we address the problem of video geo-localization by proposing a Geo-Temporal Feature Learning (GTFL) Network to simultaneously learn the discriminative features between the query videos and gallery images for estimating the geo-spatial trajectory of a query video.

Correct block-design experiments mitigate temporal correlation bias in EEG classification

1 code implementation 25 Nov 2020 Simone Palazzo, Concetto Spampinato, Joseph Schmidt, Isaak Kavasidis, Daniela Giordano, Mubarak Shah

We argue that the reason why Li et al. [1] observe such high correlation in EEG data is their unconventional experimental design and settings that violate the basic cognitive neuroscience design recommendations, first and foremost the one of limiting the experiments' duration, as instead done in [2].

Classification EEG +3

Anomaly Detection in Video via Self-Supervised and Multi-Task Learning

1 code implementation CVPR 2021 Mariana-Iuliana Georgescu, Antonio Barbalau, Radu Tudor Ionescu, Fahad Shahbaz Khan, Marius Popescu, Mubarak Shah

To the best of our knowledge, we are the first to approach anomalous event detection in video as a multi-task learning problem, integrating multiple self-supervised and knowledge distillation proxy tasks in a single architecture.

Abnormal Event Detection In Video Anomaly Detection In Surveillance Videos +4

MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering

1 code implementation Findings of the Association for Computational Linguistics 2020 Aisha Urooj Khan, Amir Mazaheri, Niels da Vitoria Lobo, Mubarak Shah

We present MMFT-BERT (MultiModal Fusion Transformer with BERT encodings) to solve Visual Question Answering (VQA), ensuring individual and combined processing of multiple input modalities.

Question Answering Visual Question Answering

Meta-learning the Learning Trends Shared Across Tasks

no code implementations19 Oct 2020 Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah

This demonstrates their ability to acquire transferable knowledge, a capability that is central to human learning.

Meta-Learning

Uncertainty Estimation and Sample Selection for Crowd Counting

1 code implementation30 Sep 2020 Viresh Ranjan, Boyu Wang, Mubarak Shah, Minh Hoai

We present sample selection strategies which make use of the density and uncertainty of predictions from the networks trained on one domain to select the informative images from a target domain of interest to acquire human annotation.

Crowd Counting

Simultaneous Detection and Tracking with Motion Modelling for Multiple Object Tracking

3 code implementations ECCV 2020 Shi-Jie Sun, Naveed Akhtar, Xiang-Yu Song, HuanSheng Song, Ajmal Mian, Mubarak Shah

Deep learning-based Multiple Object Tracking (MOT) currently relies on off-the-shelf detectors for tracking-by-detection. This results in deep models that are detector biased and evaluations that are detector influenced.

Multiple Object Tracking Object

Deep Photo Cropper and Enhancer

no code implementations3 Aug 2020 Aaron Ott, Amir Mazaheri, Niels D. Lobo, Mubarak Shah

In the photo enhancer, we employ super-resolution to increase the number of pixels in the embedded image and reduce the effect of stretching and distortion of pixels.

Image Enhancement Super-Resolution

Odyssey: Creation, Analysis and Detection of Trojan Models

1 code implementation16 Jul 2020 Marzieh Edraki, Nazmul Karim, Nazanin Rahnavard, Ajmal Mian, Mubarak Shah

We propose a detector that is based on the analysis of the intrinsic DNN properties that are affected by the Trojaning process.

Data Poisoning

TinyVIRAT: Low-resolution Video Action Recognition

1 code implementation14 Jul 2020 Ugur Demir, Yogesh S Rawat, Mubarak Shah

In real-world surveillance environments, the actions in videos are captured at a wide range of resolutions.

Action Recognition Temporal Action Localization

Self-supervised Knowledge Distillation for Few-shot Learning

1 code implementation17 Jun 2020 Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah

Our experiments show that, even in the first stage, self-supervision can outperform current state-of-the-art methods, with further gains achieved by our second stage distillation process.

Few-Shot Image Classification Few-Shot Learning +2

Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos

no code implementations23 Apr 2020 Mamshad Nayeem Rizve, Ugur Demir, Praveen Tirupattur, Aayush Jung Rana, Kevin Duarte, Ishan Dave, Yogesh Singh Rawat, Mubarak Shah

For tubelet extraction, we propose a localization network which takes a video clip as input and spatio-temporally detects potential foreground regions at multiple scales to generate action tubelets.

Action Detection Activity Detection

RescueNet: Joint Building Segmentation and Damage Assessment from Satellite Imagery

no code implementations15 Apr 2020 Rohit Gupta, Mubarak Shah

Accurate and fine-grained information about the extent of damage to buildings is essential for directing Humanitarian Aid and Disaster Response (HADR) operations in the immediate aftermath of any natural calamity.

Classification Disaster Response +4

Adversarial Learning for Personalized Tag Recommendation

1 code implementation1 Apr 2020 Erik Quintanilla, Yogesh Rawat, Andrey Sakryukin, Mubarak Shah, Mohan Kankanhalli

We demonstrate the effectiveness of the proposed model on two different large-scale and publicly available datasets, YFCC100M and NUS-WIDE.

General Classification Image Classification +2

iTAML: An Incremental Task-Agnostic Meta-learning Approach

1 code implementation CVPR 2020 Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah

In this paper, we hypothesize that this problem can be avoided by learning a set of generalized parameters that are specific to neither old nor new tasks.

Incremental Learning Meta-Learning

Subspace Capsule Network

1 code implementation7 Feb 2020 Marzieh Edraki, Nazanin Rahnavard, Mubarak Shah

In this paper, we propose the SubSpace Capsule Network (SCN) that exploits the idea of capsule networks to model possible variations in the appearance or implicitly defined properties of an entity through a group of capsule subspaces instead of simply grouping neurons to create capsules.

General Classification Generative Adversarial Network +2

Human Action Recognition in Drone Videos using a Few Aerial Training Examples

no code implementations22 Oct 2019 Waqas Sultani, Mubarak Shah

However, using deep neural networks for automatic aerial action recognition is difficult due to the need for a large number of training aerial human action videos.

Action Classification Action Recognition +1

Deep Constrained Dominant Sets for Person Re-identification

1 code implementation ICCV 2019 Leulseged Tesfaye Alemu, Marcello Pelillo, Mubarak Shah

By optimizing the constrained clustering in an end-to-end manner, we naturally leverage the contextual knowledge of a set of images corresponding to the given person-images.

Ranked #2 on Person Re-Identification on CUHK03 (Rank-5 metric)

Constrained Clustering Image Retrieval +2

Bridging the Domain Gap for Ground-to-Aerial Image Matching

1 code implementation ICCV 2019 Krishna Regmi, Mubarak Shah

Our Feature Fusion method combines the complementary features from a synthesized aerial image with the corresponding ground features to obtain a robust query representation.

Retrieval

Crowd Transformer Network

no code implementations4 Apr 2019 Viresh Ranjan, Mubarak Shah, Minh Hoai Nguyen

Most of the existing crowd counting approaches rely on local features for estimating the crowd density map.

Crowd Counting Density Estimation

Iterative Projection and Matching: Finding Structure-preserving Representatives and Its Application to Computer Vision

2 code implementations CVPR 2019 Mohsen Joneidi, Alireza Zaeemzadeh, Nazanin Rahnavard, Mubarak Shah

In our algorithm, at each iteration, the maximum information from the structure of the data is captured by one selected sample, and the captured information is neglected in the next iterations by projection on the null-space of previously selected samples.

Action Recognition Active Learning +5

Time-Aware and View-Aware Video Rendering for Unsupervised Representation Learning

no code implementations26 Nov 2018 Shruti Vyas, Yogesh S Rawat, Mubarak Shah

We demonstrate the effectiveness of the proposed method in rendering view-aware as well as time-aware video clips on two different real-world datasets including UCF-101 and NTU-RGB+D.

Representation Learning

Deep Affinity Network for Multiple Object Tracking

1 code implementation28 Oct 2018 Shi-Jie Sun, Naveed Akhtar, HuanSheng Song, Ajmal Mian, Mubarak Shah

In this paper, we harness the power of deep learning for data association in tracking by jointly modelling object appearances and their affinities between different frames in an end-to-end fashion.

Benchmarking Multiple Object Tracking +3

Decoding Brain Representations by Multimodal Learning of Neural Activity and Visual Features

no code implementations25 Oct 2018 Simone Palazzo, Concetto Spampinato, Isaak Kavasidis, Daniela Giordano, Joseph Schmidt, Mubarak Shah

After verifying that visual information can be extracted from EEG data, we introduce a multimodal approach that uses deep image and EEG encoders, trained in a siamese configuration, for learning a joint manifold that maximizes a compatibility measure between visual features and brain representations.

Classification EEG +4

Pay attention! - Robustifying a Deep Visuomotor Policy through Task-Focused Attention

no code implementations26 Sep 2018 Pooya Abolghasemi, Amir Mazaheri, Mubarak Shah, Ladislau Bölöni

In this paper, we propose an approach for augmenting a deep visuomotor policy trained through demonstrations with Task Focused visual Attention (TFA).

Composition Loss for Counting, Density Map Estimation and Localization in Dense Crowds

no code implementations ECCV 2018 Haroon Idrees, Muhmmad Tayyab, Kishan Athrey, Dong Zhang, Somaya Al-Maadeed, Nasir Rajpoot, Mubarak Shah

With multiple crowd gatherings of millions of people every year in events ranging from pilgrimages to protests, concerts to marathons, and festivals to funerals, visual crowd analysis is emerging as a new frontier in computer vision.

Crowd Counting Management +1

Training Faster by Separating Modes of Variation in Batch-normalized Models

no code implementations7 Jun 2018 Mahdi M. Kalayeh, Mubarak Shah

We show that, assuming samples within a mini-batch are from the same probability density function, BN is identical to the Fisher vector of a Gaussian distribution.

Image Classification

Video Description: A Survey of Methods, Datasets and Evaluation Metrics

no code implementations1 Jun 2018 Nayyer Aafaq, Ajmal Mian, Wei Liu, Syed Zulqarnain Gilani, Mubarak Shah

Video description is the automatic generation of natural language sentences that describe the contents of a given video.

Language Modelling Video Description

Task-Agnostic Meta-Learning for Few-shot Learning

no code implementations20 May 2018 Muhammad Abdullah Jamal, Guo-Jun Qi, Mubarak Shah

Meta-learning approaches have been proposed to tackle the few-shot learning problem. Typically, a meta-learner is trained on a variety of tasks in the hopes of being generalizable to new tasks.

Classification Few-Shot Learning +1

Norm-Preservation: Why Residual Networks Can Become Extremely Deep?

1 code implementation18 May 2018 Alireza Zaeemzadeh, Nazanin Rahnavard, Mubarak Shah

We prove that the skip connections in the residual blocks facilitate preserving the norm of the gradient and lead to stable back-propagation, which is desirable from an optimization perspective.

Human Semantic Parsing for Person Re-identification

no code implementations CVPR 2018 Mahdi M. Kalayeh, Emrah Basaran, Muhittin Gokmen, Mustafa E. Kamasak, Mubarak Shah

In this paper, we propose to adopt human semantic parsing which, due to its pixel-level accuracy and capability of modeling arbitrary contours, is naturally a better alternative.

Person Re-Identification Representation Learning +1

Real-world Anomaly Detection in Surveillance Videos

8 code implementations CVPR 2018 Waqas Sultani, Chen Chen, Mubarak Shah

To avoid annotating the anomalous segments or clips in training videos, which is very time-consuming, we propose to learn anomaly through the deep multiple instance ranking framework by leveraging weakly labeled training videos, i.e., the training labels (anomalous or normal) are at video-level instead of clip-level.

Activity Recognition Anomaly Detection In Surveillance Videos +2

Visual Text Correction

1 code implementation ECCV 2018 Amir Mazaheri, Mubarak Shah

A semantic inconsistency between the sentence and the video or between the words of a sentence can result in an inaccurate description.

Grammatical Error Correction Sentence +1

An End-to-end 3D Convolutional Neural Network for Action Detection and Segmentation in Videos

no code implementations30 Nov 2017 Rui Hou, Chen Chen, Mubarak Shah

A video is first divided into equal-length clips, and then, for each clip, a set of tube proposals is generated based on 3D CNN features.

Action Detection Action Segmentation +4

Generative Adversarial Networks Conditioned by Brain Signals

no code implementations ICCV 2017 Simone Palazzo, Concetto Spampinato, Isaak Kavasidis, Daniela Giordano, Mubarak Shah

In this work, we build on the latter class of approaches and investigate the possibility of driving and conditioning the image generation process by means of brain signals recorded, through an electroencephalograph (EEG), while users look at images from a set of 40 ImageNet object categories with the objective of generating the seen images.

EEG Electroencephalogram (EEG) +1

Unsupervised Action Discovery and Localization in Videos

no code implementations ICCV 2017 Khurram Soomro, Mubarak Shah

Once classes are discovered, training videos within each cluster are selected to perform automatic spatio-temporal annotations, by first oversegmenting videos in each discovered class into supervoxels and constructing a directed graph to apply a variant of the knapsack problem with temporal constraints.

Action Localization Clustering

Multi-Target Tracking in Multiple Non-Overlapping Cameras using Constrained Dominant Sets

no code implementations19 Jun 2017 Yonatan Tariku Tesfaye, Eyasu Zemene, Andrea Prati, Marcello Pelillo, Mubarak Shah

In this paper, a unified three-layer hierarchical approach for solving tracking problems in multiple non-overlapping cameras is proposed.

Clustering

Video Fill In the Blank using LR/RL LSTMs with Spatial-Temporal Attentions

1 code implementation ICCV 2017 Amir Mazaheri, Dong Zhang, Mubarak Shah

Since the source sentence is broken into two fragments: the sentence's left fragment (before the blank) and the sentence's right fragment (after the blank), traditional Recurrent Neural Networks cannot encode this structure accurately because of many possible variations of the missing word in terms of the location and type of the word in the source sentence.

Sentence

ClusterNet: Detecting Small Objects in Large Scenes by Exploiting Spatio-Temporal Information

no code implementations CVPR 2018 Rodney LaLonde, Dong Zhang, Mubarak Shah

To reduce the large search space, the first stage (ClusterNet) takes in a set of extremely large video frames, combines the motion and appearance information within the convolutional architecture, and proposes regions of objects of interest (ROOBI).

Object object-detection +1

Unsupervised Action Proposal Ranking through Proposal Recombination

no code implementations3 Apr 2017 Waqas Sultani, Dong Zhang, Mubarak Shah

Given the action proposals in a video, the goal of the proposed work is to generate a few better action proposals that are ranked properly.

Action Detection Action Recognition +1

Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos

1 code implementation ICCV 2017 Rui Hou, Chen Chen, Mubarak Shah

A video is first divided into equal-length clips, and then, for each clip, a set of tube proposals is generated based on 3D Convolutional Network (ConvNet) features.

Action Detection Image Classification +2

Cross-View Image Matching for Geo-localization in Urban Environments

1 code implementation CVPR 2017 Yicong Tian, Chen Chen, Mubarak Shah

Next, for each building in the query image, we retrieve the $k$ nearest neighbors from the reference buildings using a Siamese network trained on both positive matching image pairs and negative pairs.

Cross-View Image-to-Image Translation Image Classification +2

Re-identification of Humans in Crowds using Personal, Social and Environmental Constraints

no code implementations7 Dec 2016 Shayan Modiri Assari, Haroon Idrees, Mubarak Shah

This paper addresses the problem of human re-identification across non-overlapping cameras in crowds. Re-identification in crowded scenes is a challenging problem due to the large number of people and frequent occlusions, coupled with changes in their appearance due to different properties and exposure of cameras.

Collision Avoidance Person Re-Identification

Online Localization and Prediction of Actions and Interactions

no code implementations4 Dec 2016 Khurram Soomro, Haroon Idrees, Mubarak Shah

For online prediction of action (interaction) confidences, we propose an approach based on Structural SVM that operates on short video segments, and is trained with the objective that confidence of an action or interaction increases as time progresses.

Pose Estimation Superpixels

On Duality Of Multiple Target Tracking and Segmentation

no code implementations14 Oct 2016 Yicong Tian, Mubarak Shah

For segmentation, a multi-label Conditional Random Field (CRF) is applied to a superpixel-based spatio-temporal graph in a segment of video to assign background or target labels to every superpixel.

Object Object Tracking +3

Video Fill in the Blank with Merging LSTMs

no code implementations13 Oct 2016 Amir Mazaheri, Dong Zhang, Mubarak Shah

In the experiments, we have demonstrated the superior performance of the proposed method on the challenging "Movie Fill-in-the-Blank" dataset.

Deep Learning Human Mind for Automated Visual Classification

2 code implementations CVPR 2017 Concetto Spampinato, Simone Palazzo, Isaak Kavasidis, Daniela Giordano, Mubarak Shah, Nasim Souly

In particular, we employ EEG data evoked by visual object stimuli combined with Recurrent Neural Networks (RNN) to learn a discriminative brain activity manifold of visual categories.

Classification EEG +4

Scene Labeling Through Knowledge-Based Rules Employing Constrained Integer Linear Programing

no code implementations17 Aug 2016 Nasim Souly, Mubarak Shah

In this paper, we propose to use high-level knowledge regarding rules in the inference to incorporate dependencies among regions in the image to improve scores of classification.

Scene Labeling

Query-Focused Extractive Video Summarization

no code implementations18 Jul 2016 Aidean Sharghi, Boqing Gong, Mubarak Shah

The decision to include a shot in the summary depends on the shot's relevance to the user query and importance in the context of the video, jointly.

Video Summarization

Covariance of Motion and Appearance Features for Spatio Temporal Recognition Tasks

no code implementations16 Jun 2016 Subhabrata Bhattacharya, Nasim Souly, Mubarak Shah

Using an over-complete dictionary of the covariance based descriptors built from labeled training samples, we formulate low-level event recognition as a sparse linear approximation problem.

Gesture Recognition One-Shot Learning +1

Scene Labeling Using Sparse Precision Matrix

no code implementations CVPR 2016 Nasim Souly, Mubarak Shah

To do this, we formulate the problem as an energy minimization over a graph, whose structure is captured by applying sparse constraint on the elements of the precision matrix.

Scene Labeling

Predicting the Where and What of Actors and Actions Through Online Action Localization

no code implementations CVPR 2016 Khurram Soomro, Haroon Idrees, Mubarak Shah

This paper proposes a novel approach to tackle the challenging problem of 'online action localization' which entails predicting actions and their locations as they happen in a video.

Action Localization Superpixels

What If We Do Not Have Multiple Videos of the Same Action? -- Video Action Localization Using Web Images

no code implementations CVPR 2016 Waqas Sultani, Mubarak Shah

We reconstruct video action proposals from image action proposals while enforcing consistency across coefficient vectors of multiple frames by consensus regularization.

Optical Flow Estimation Spatio-Temporal Action Localization +1

Fast Zero-Shot Image Tagging

no code implementations CVPR 2016 Yang Zhang, Boqing Gong, Mubarak Shah

The well-known word analogy experiments show that the recent word vectors capture fine-grained linguistic regularities in words by linear vector offsets, but it is unclear how well the simple vector offsets can encode visual regularities over words.

Multi-label zero-shot learning

Automatic Action Annotation in Weakly Labeled Videos

no code implementations26 May 2016 Waqas Sultani, Mubarak Shah

The output of our method is the most action-representative proposals from each video.

Optical Flow Estimation

A Framework for Human Pose Estimation in Videos

no code implementations26 Apr 2016 Dong Zhang, Mubarak Shah

A sequence of the best poses is inferred from the abstract body part tracklets through the tree-based optimization.

Pose Estimation

The THUMOS Challenge on Action Recognition for Videos "in the Wild"

no code implementations21 Apr 2016 Haroon Idrees, Amir R. Zamir, Yu-Gang Jiang, Alex Gorban, Ivan Laptev, Rahul Sukthankar, Mubarak Shah

Additionally, we include a comprehensive empirical study evaluating the differences in action recognition between trimmed and untrimmed videos, and how well methods trained on trimmed videos generalize to untrimmed videos.

Action Classification Action Recognition +3

Binary Quadratic Programing for Online Tracking of Hundreds of People in Extremely Crowded Scenes

no code implementations30 Mar 2016 Afshin Dehghan, Mubarak Shah

In this paper, we propose a tracker that addresses the aforementioned problems and is capable of tracking hundreds of people efficiently.

Multi-Object Tracking

Autonomous navigation for low-altitude UAVs in urban areas

no code implementations25 Feb 2016 Thomas Castelli, Aidean Sharghi, Don Harper, Alain Tremeau, Mubarak Shah

In recent years, consumer Unmanned Aerial Vehicles have become very popular; anyone can buy and fly a drone without previous experience, which raises concerns regarding regulations and public safety.

Autonomous Navigation

Learning a Deep Model for Human Action Recognition from Novel Viewpoints

no code implementations2 Feb 2016 Hossein Rahmani, Ajmal Mian, Mubarak Shah

The strength of our technique is that we learn a single R-NKTM for all actions and all viewpoints for knowledge transfer of any real human action video without the need for re-training or fine-tuning the model.

Action Recognition Temporal Action Localization +1

Human Pose Estimation in Videos

no code implementations ICCV 2015 Dong Zhang, Mubarak Shah

Using the idea of 'Association', the optimal tracklets are generated for each abstract body part, in order to enforce the spatiotemporal constraints between body parts in adjacent frames.

Pose Estimation

Action Localization in Videos Through Context Walk

no code implementations ICCV 2015 Khurram Soomro, Haroon Idrees, Mubarak Shah

Context relations are learned during training which capture displacements from all the supervoxels in a video to those belonging to foreground actions.

Action Localization

Geo-Semantic Segmentation

1 code implementation CVPR 2015 Shervin Ardeshir, Kofi Malcolm Collins-Sibley, Mubarak Shah

In this paper, we propose a method which leverages information acquired from GIS databases to perform semantic segmentation of the image alongside geo-referencing each semantic segment with its address and geo-location.

Segmentation Semantic Segmentation

Target Identity-Aware Network Flow for Online Multiple Target Tracking

no code implementations CVPR 2015 Afshin Dehghan, Yicong Tian, Philip H. S. Torr, Mubarak Shah

In this paper we show that multiple object tracking (MOT) can be formulated in a framework, where the detection and data-association are performed simultaneously.

Multiple Object Tracking object-detection +1

Understanding Trajectory Behavior: A Motion Pattern Approach

no code implementations4 Jan 2015 Mahdi M. Kalayeh, Stephen Mussmann, Alla Petrakova, Niels da Vitoria Lobo, Mubarak Shah

In the second phase, via a Kmeans clustering approach, we create motion components by clustering the flow vectors with respect to their location and velocity.

Clustering Trajectory Clustering

Improving Semantic Concept Detection through the Dictionary of Visually-distinct Elements

no code implementations CVPR 2014 Afshin Dehghan, Haroon Idrees, Mubarak Shah

A video captures a sequence and interactions of concepts that can be static, for instance, objects or scenes, or dynamic, such as actions.

GPS-Tag Refinement using Random Walks with an Adaptive Damping Factor

no code implementations CVPR 2014 Amir Roshan Zamir, Shervin Ardeshir, Mubarak Shah

We develop a robust method for identification and refinement of this subset using the rest of the images in the dataset.

TAG

Recognition of Complex Events: Exploiting Temporal Dynamics between Underlying Concepts

no code implementations CVPR 2014 Subhabrata Bhattacharya, Mahdi M. Kalayeh, Rahul Sukthankar, Mubarak Shah

While approaches based on bags of features excel at low-level action classification, they are ill-suited for recognizing complex events in video, where concept-based temporal representations currently dominate.

Action Classification Event Detection +3

Face Verification Using Boosted Cross-Image Features

no code implementations28 Sep 2013 Dong Zhang, Omar Oreifej, Mubarak Shah

In contrast, we propose to extract cross-image features, i.e., features across the pair of images, which, as we demonstrate, are more discriminative of the similarity and dissimilarity of faces.

Face Detection Face Recognition +1

Spatiotemporal Deformable Part Models for Action Detection

no code implementations CVPR 2013 Yicong Tian, Rahul Sukthankar, Mubarak Shah

Deformable part models have achieved impressive performance for object detection, even on difficult image datasets.

Action Detection object-detection +1

Improving an Object Detector and Extracting Regions Using Superpixels

no code implementations CVPR 2013 Guang Shu, Afshin Dehghan, Mubarak Shah

In general, our method takes detection bounding boxes of a generic detector as input and generates the detection output with higher average precision and precise object regions.

Object Superpixels

Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video

no code implementations CVPR 2013 Yang Yang, Guang Shu, Mubarak Shah

In order to learn discriminative and compact features, we propose a new feature learning method using a deep neural network based on auto encoders.

object-detection Object Detection

Face Recognition in Movie Trailers via Mean Sequence Sparse Representation-Based Classification

no code implementations CVPR 2013 Enrique G. Ortiz, Alan Wright, Mubarak Shah

A straightforward application of the popular $\ell_1$-minimization for face recognition on a frame-by-frame basis is prohibitively expensive, so we propose a novel algorithm, Mean Sequence SRC (MSSRC), that performs video face recognition using a joint optimization leveraging all of the available video data and the knowledge that the face track frames belong to the same individual.

Face Recognition General Classification +1

Video Object Segmentation through Spatially Accurate and Temporally Dense Extraction of Primary Object Regions

no code implementations CVPR 2013 Dong Zhang, Omar Javed, Mubarak Shah

The proposed approach has several contributions: First, a novel layered Directed Acyclic Graph (DAG) based framework is presented for detection and segmentation of the primary object in video.

Object Optical Flow Estimation +4

Multi-source Multi-scale Counting in Extremely Dense Crowd Images

no code implementations CVPR 2013 Haroon Idrees, Imran Saleemi, Cody Seibert, Mubarak Shah

Instead, our approach relies on multiple sources such as low confidence head detections, repetition of texture elements (using SIFT), and frequency-domain analysis to estimate counts, along with confidence associated with observing individuals, in an image region.

Crowd Counting Human Detection

UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild

7 code implementations3 Dec 2012 Khurram Soomro, Amir Roshan Zamir, Mubarak Shah

To the best of our knowledge, UCF101 is currently the most challenging dataset of actions due to its large number of classes, large number of clips, and the unconstrained nature of such clips.

Action Recognition In Videos Skeleton Based Action Recognition +1
