Search Results for author: Mubarak Shah

Found 128 papers, 40 papers with code

Multi-view Action Recognition using Cross-view Video Prediction

1 code implementation ECCV 2020 Shruti Vyas, Yogesh S Rawat, Mubarak Shah

We evaluate the effectiveness of the learned representation for multi-view video action recognition in a supervised approach.

Action Recognition Representation Learning

Count- and Similarity-aware R-CNN for Pedestrian Detection

no code implementations ECCV 2020 Jin Xie, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao, Mubarak Shah

We further introduce a count-and-similarity branch within the two-stage detection framework, which predicts pedestrian count as well as proposal similarity.

Human Instance Segmentation Pedestrian Detection

Geometric Feature Learning for 3D Meshes

2 code implementations 3 Dec 2021 Huan Lei, Naveed Akhtar, Mubarak Shah, Ajmal Mian

These operations include mesh convolutions, (un)pooling and efficient mesh decimation.

Scene Parsing

OW-DETR: Open-world Detection Transformer

no code implementations 2 Dec 2021 Akshita Gupta, Sanath Narayan, K J Joseph, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah

Distinct from standard object detection, the open-world object detection (OWOD) setting poses significant challenges: generating quality candidate proposals for potentially unknown objects, separating unknown objects from the background, and detecting diverse unknown objects.

Open World Object Detection Transfer Learning

Routing with Self-Attention for Multimodal Capsule Networks

no code implementations 1 Dec 2021 Kevin Duarte, Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Samuel Thomas, Alexander Liu, David Harwath, James Glass, Hilde Kuehne, Mubarak Shah

We present a new multimodal capsule network that allows us to leverage the strength of capsules in the context of a multimodal learning framework on large amounts of video data.

"Knights": First Place Submission for VIPriors21 Action Recognition Challenge at ICCV 2021

no code implementations 14 Oct 2021 Ishan Dave, Naman Biyani, Brandon Clark, Rohit Gupta, Yogesh Rawat, Mubarak Shah

This technical report presents our approach "Knights" to solve the action recognition task on a small subset of Kinetics-400, i.e., Kinetics400ViPriors, without using any extra data.

Action Recognition Optical Flow Estimation

Discriminative Region-based Multi-Label Zero-Shot Learning

1 code implementation ICCV 2021 Sanath Narayan, Akshita Gupta, Salman Khan, Fahad Shahbaz Khan, Ling Shao, Mubarak Shah

We note that the best existing multi-label ZSL method takes a shared approach towards attending to region features with a common set of attention maps for all the classes.

Image Retrieval Multi-label zero-shot learning

Advances in adversarial attacks and defenses in computer vision: A survey

no code implementations 1 Aug 2021 Naveed Akhtar, Ajmal Mian, Navid Kardan, Mubarak Shah

In [2], we reviewed the contributions made by the computer vision community in adversarial attacks on deep learning (and their defenses) up to the start of 2018.

Video Generation from Text Employing Latent Path Construction for Temporal Modeling

no code implementations 29 Jul 2021 Amir Mazaheri, Mubarak Shah

To the best of our knowledge, this is the first work on text-to-video generation from free-form sentences on more realistic video datasets such as the Actor and Action Dataset (A2D) or UCF101.

Text-to-Video Generation Video Generation

TinyAction Challenge: Recognizing Real-world Low-resolution Activities in Videos

1 code implementation 24 Jul 2021 Praveen Tirupattur, Aayush J Rana, Tushar Sangam, Shruti Vyas, Yogesh S Rawat, Mubarak Shah

While various approaches have been shown to be effective for the recognition task in recent works, they often do not deal with lower-resolution videos in which the action happens in a tiny region.

Action Recognition

Controlled Caption Generation for Images Through Adversarial Attacks

no code implementations 7 Jul 2021 Nayyer Aafaq, Naveed Akhtar, Wei Liu, Mubarak Shah, Ajmal Mian

In contrast, we propose a GAN-based algorithm for crafting adversarial examples for neural image captioning that mimics the internal representation of the CNN such that the resulting deep features of the input image enable a controlled incorrect caption generation through the recurrent network.

Image Captioning Language Modelling

Florida Wildlife Camera Trap Dataset

no code implementations 23 Jun 2021 Crystal Gagne, Jyoti Kini, Daniel Smith, Mubarak Shah

Trail camera imagery has increasingly gained popularity amongst biologists for conservation and ecological research.

Image Classification

Out-of-Distribution Detection Using Union of 1-Dimensional Subspaces

2 code implementations CVPR 2021 Alireza Zaeemzadeh, Niccolo Bisagno, Zeno Sambugaro, Nicola Conci, Nazanin Rahnavard, Mubarak Shah

In this paper, we argue that OOD samples can be detected more easily if the training data is embedded into a low-dimensional space, such that the embedded training samples lie on a union of 1-dimensional subspaces.

Bayesian Inference Out-of-Distribution Detection
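
The union-of-subspaces idea can be illustrated with a minimal sketch, assuming the per-class training embeddings are already given (the paper learns the embedding; that training is not shown). Each class is summarized by its dominant 1-D direction, found by power iteration on X^T X, and a test embedding is scored by its largest absolute cosine to any class line; a low score suggests an OOD sample. The names `dominant_direction`, `class_directions`, and `ood_score` and the toy data are hypothetical, and the authors' actual scoring rule may differ.

```python
import math
import random


def _normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dominant_direction(samples, iters=100, seed=0):
    """Top eigenvector of X^T X (the class's 1-D subspace), via power iteration."""
    rng = random.Random(seed)
    dim = len(samples[0])
    v = _normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
    for _ in range(iters):
        proj = [sum(a * b for a, b in zip(x, v)) for x in samples]  # X v
        w = [sum(p * x[j] for p, x in zip(proj, samples)) for j in range(dim)]  # X^T X v
        v = _normalize(w)
    return v


def class_directions(embeddings_by_class):
    return {c: dominant_direction(xs) for c, xs in embeddings_by_class.items()}


def ood_score(z, directions):
    """Return (closest class, |cosine| to its line); low values suggest OOD."""
    z = _normalize(z)
    best_class, best = None, -1.0
    for c, d in directions.items():
        s = abs(sum(a * b for a, b in zip(z, d)))
        if s > best:
            best_class, best = c, s
    return best_class, best


# Toy embeddings (hypothetical): each class lies roughly along one axis.
train = {
    "cat": [[1.0, 0.1, 0.0], [2.0, -0.1, 0.0], [-1.5, 0.05, 0.0]],
    "dog": [[0.0, 1.0, 0.1], [0.1, -2.0, 0.0], [0.0, 1.5, -0.1]],
}
dirs = class_directions(train)
print(ood_score([3.0, 0.0, 0.0], dirs))  # near the "cat" line, score close to 1
print(ood_score([0.0, 0.0, 1.0], dirs))  # far from both lines, score close to 0
```

Using the absolute cosine makes the score independent of which way along the line a sample points, which matches the notion of a 1-dimensional subspace rather than a single prototype vector.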

Novel View Video Prediction Using a Dual Representation

no code implementations 7 Jun 2021 Sarah Shiraz, Krishna Regmi, Shruti Vyas, Yogesh S. Rawat, Mubarak Shah

We address the problem of novel view video prediction: given a set of input video clips from a single view or multiple views, our network predicts the video from a novel view.

SSIM Video Prediction

PLM: Partial Label Masking for Imbalanced Multi-label Classification

no code implementations 22 May 2021 Kevin Duarte, Yogesh S. Rawat, Mubarak Shah

By stochastically masking labels during loss computation, the method balances the ratio of positive to negative labels for each class, leading to improved recall on minority classes and improved precision on frequent classes.

Image Classification Multi-Label Classification
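
A minimal sketch of stochastic label masking, assuming binary multi-label targets and a single target positive-to-negative ratio shared by all classes. The keep probabilities below are one simple choice that makes the expected ratio of unmasked positives to negatives match the target; they are an assumption, not necessarily the paper's rule, and `plm_mask`, `masked_bce`, and `target_ratio` are hypothetical names.

```python
import math
import random


def plm_mask(labels, target_ratio=1.0, seed=0):
    """Return a 0/1 mask (same shape as `labels`); 0 drops that label from the loss."""
    rng = random.Random(seed)
    n, k = len(labels), len(labels[0])
    mask = [[1] * k for _ in range(n)]
    for j in range(k):
        pos = sum(row[j] for row in labels)
        neg = n - pos
        if pos == 0 or neg == 0:
            continue  # ratio undefined; leave this class unmasked
        ratio = pos / neg
        if ratio > target_ratio:
            # Frequent class: keep each positive with probability target/ratio.
            keep, value_to_mask = target_ratio / ratio, 1
        else:
            # Rare class: keep each negative with probability ratio/target.
            keep, value_to_mask = ratio / target_ratio, 0
        for i in range(n):
            if labels[i][j] == value_to_mask and rng.random() >= keep:
                mask[i][j] = 0
    return mask


def masked_bce(probs, labels, mask, eps=1e-7):
    """Binary cross-entropy averaged over unmasked entries only."""
    total, count = 0.0, 0
    for p_row, y_row, m_row in zip(probs, labels, mask):
        for p, y, m in zip(p_row, y_row, m_row):
            if m:
                p = min(max(p, eps), 1.0 - eps)
                total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
                count += 1
    return total / max(count, 1)


labels = [[1, 0], [0, 0], [0, 0], [0, 1], [0, 1], [0, 1]]  # class 0 rare, class 1 balanced
mask = plm_mask(labels, target_ratio=1.0)
print(mask)
print(masked_bce([[0.8, 0.3]] * len(labels), labels, mask))
```

Because only the over-represented side of each class is masked, the rare positives of class 0 always reach the loss, while some of its many negatives are dropped in each pass.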

MutualNet: Adaptive ConvNet via Mutual Learning from Different Model Configurations

1 code implementation 14 May 2021 Taojiannan Yang, Sijie Zhu, Matias Mendieta, Pu Wang, Ravikumar Balakrishnan, Minwoo Lee, Tao Han, Mubarak Shah, Chen Chen

MutualNet is a general training methodology that can be applied to various network structures (e.g., 2D networks: MobileNets, ResNet; 3D networks: SlowFast, X3D) and various tasks (e.g., image classification, object detection, segmentation, and action recognition), and it is shown to achieve consistent improvements on a variety of datasets.

Action Recognition Image Classification

Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules

1 code implementation CVPR 2021 Aisha Urooj Khan, Hilde Kuehne, Kevin Duarte, Chuang Gan, Niels Lobo, Mubarak Shah

In this paper, we focus on a more relaxed setting: the grounding of relevant visual entities in a weakly supervised manner by training on the VQA task alone.

Question Answering Visual Question Answering

Handwriting Transformers

1 code implementation ICCV 2021 Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Mubarak Shah

We propose a novel transformer-based styled handwritten text image generation approach, HWT, that strives to learn both style-content entanglement and global and local writing style patterns.

Image Generation Text Generation

Dogfight: Detecting Drones from Drones Videos

1 code implementation CVPR 2021 Muhammad Waseem Ashraf, Waqas Sultani, Mubarak Shah

The erratic movement of the source and target drones, small size, arbitrary shape, large intensity variations, and occlusion make this problem quite challenging.

Region Proposal

LSDAT: Low-Rank and Sparse Decomposition for Decision-based Adversarial Attack

no code implementations 19 Mar 2021 Ashkan Esmaeili, Marzieh Edraki, Nazanin Rahnavard, Mubarak Shah, Ajmal Mian

It is argued that the proposed sparse perturbation is the sparse perturbation most aligned with the shortest path from the input sample to the decision boundary for some initial adversarial sample, i.e., the best sparse approximation of that shortest path, which is likely to fool the model.

Adversarial Attack Dimensionality Reduction

Modeling Multi-Label Action Dependencies for Temporal Action Localization

1 code implementation CVPR 2021 Praveen Tirupattur, Kevin Duarte, Yogesh Rawat, Mubarak Shah

We propose to improve action localization performance by modeling these action dependencies in a novel attention-based Multi-Label Action Dependency (MLAD) layer.

Multi-Label Classification Temporal Action Localization

TCLR: Temporal Contrastive Learning for Video Representation

no code implementations 20 Jan 2021 Ishan Dave, Rohit Gupta, Mamshad Nayeem Rizve, Mubarak Shah

Our proposed temporal contrastive learning framework achieves significant improvement over the state-of-the-art results in various downstream video understanding tasks such as action recognition, limited-label action classification, and nearest-neighbor video retrieval on multiple video datasets and backbones.

Action Classification Contrastive Learning

Transformers in Vision: A Survey

no code implementations 4 Jan 2021 Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, Mubarak Shah

Astounding results from Transformer models on natural language tasks have intrigued the vision community to study their application to computer vision problems.

Action Recognition Colorization

Face Image Retrieval With Attribute Manipulation

no code implementations ICCV 2021 Alireza Zaeemzadeh, Shabnam Ghadar, Baldo Faieta, Zhe Lin, Nazanin Rahnavard, Mubarak Shah, Ratheesh Kalarot

For example, a user can ask to retrieve images similar to a query image but with a different hair color, and with no preference for the absence or presence of eyeglasses in the results.

Face Image Retrieval

Video Geo-Localization Employing Geo-Temporal Feature Learning and GPS Trajectory Smoothing

1 code implementation ICCV 2021 Krishna Regmi, Mubarak Shah

In this paper, we address the problem of video geo-localization by proposing a Geo-Temporal Feature Learning (GTFL) Network to simultaneously learn the discriminative features between the query videos and gallery images for estimating the geo-spatial trajectory of a query video.

Asymptotic Optimality of Self-Representative Low-Rank Approximation and Its Applications

no code implementations 1 Jan 2021 Saeed Vahidian, Mohsen Joneidi, Ashkan Esmaeili, Siavash Khodadadeh, Sharare Zehtabian, Ladislau Boloni, Nazanin Rahnavard, Bill Lin, Mubarak Shah

The approach is based on the concept of self-rank, defined as the minimum number of samples needed to reconstruct all samples with an accuracy proportional to the rank-K approximation.

Correct block-design experiments mitigate temporal correlation bias in EEG classification

1 code implementation 25 Nov 2020 Simone Palazzo, Concetto Spampinato, Joseph Schmidt, Isaak Kavasidis, Daniela Giordano, Mubarak Shah

We argue that Li et al. [1] observe such high correlation in EEG data because of their unconventional experimental design and settings, which violate basic cognitive neuroscience design recommendations, first and foremost the recommendation to limit the duration of experiments, as was done in [2].

EEG Experimental Design

Anomaly Detection in Video via Self-Supervised and Multi-Task Learning

1 code implementation CVPR 2021 Mariana-Iuliana Georgescu, Antonio Barbalau, Radu Tudor Ionescu, Fahad Shahbaz Khan, Marius Popescu, Mubarak Shah

To the best of our knowledge, we are the first to approach anomalous event detection in video as a multi-task learning problem, integrating multiple self-supervised and knowledge distillation proxy tasks in a single architecture.

Anomaly Detection Event Detection

MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering

1 code implementation Findings of the Association for Computational Linguistics 2020 Aisha Urooj Khan, Amir Mazaheri, Niels da Vitoria Lobo, Mubarak Shah

We present MMFT-BERT (MultiModal Fusion Transformer with BERT encodings) to solve Visual Question Answering (VQA), ensuring individual and combined processing of multiple input modalities.

Question Answering Visual Question Answering

Meta-learning the Learning Trends Shared Across Tasks

no code implementations 19 Oct 2020 Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah

This demonstrates their ability to acquire transferable knowledge, a capability that is central to human learning.

Meta-Learning

Uncertainty Estimation and Sample Selection for Crowd Counting

1 code implementation 30 Sep 2020 Viresh Ranjan, Boyu Wang, Mubarak Shah, Minh Hoai

We present sample selection strategies which make use of the density and uncertainty of predictions from the networks trained on one domain to select the informative images from a target domain of interest to acquire human annotation.

Crowd Counting

A Background-Agnostic Framework with Adversarial Training for Abnormal Event Detection in Video

1 code implementation 27 Aug 2020 Mariana-Iuliana Georgescu, Radu Tudor Ionescu, Fahad Shahbaz Khan, Marius Popescu, Mubarak Shah

Following the standard formulation of abnormal event detection as outlier detection, we propose a background-agnostic framework that learns from training videos containing only normal events.

Abnormal Event Detection In Video Event Detection

Simultaneous Detection and Tracking with Motion Modelling for Multiple Object Tracking

3 code implementations ECCV 2020 Shi-Jie Sun, Naveed Akhtar, Xiang-Yu Song, HuanSheng Song, Ajmal Mian, Mubarak Shah

Deep learning-based Multiple Object Tracking (MOT) currently relies on off-the-shelf detectors for tracking-by-detection. This results in deep models that are detector biased and evaluations that are detector influenced.

Multiple Object Tracking

Deep Photo Cropper and Enhancer

no code implementations 3 Aug 2020 Aaron Ott, Amir Mazaheri, Niels D. Lobo, Mubarak Shah

In the photo enhancer, we employ super-resolution to increase the number of pixels in the embedded image and reduce the effect of stretching and distortion of pixels.

Image Enhancement Super-Resolution

Odyssey: Creation, Analysis and Detection of Trojan Models

1 code implementation 16 Jul 2020 Marzieh Edraki, Nazmul Karim, Nazanin Rahnavard, Ajmal Mian, Mubarak Shah

We propose a detector based on the analysis of intrinsic DNN properties that are affected by the Trojaning process.

Data Poisoning

TinyVIRAT: Low-resolution Video Action Recognition

1 code implementation 14 Jul 2020 Ugur Demir, Yogesh S Rawat, Mubarak Shah

In real-world surveillance environments, the actions in videos are captured at a wide range of resolutions.

Action Recognition

Self-supervised Knowledge Distillation for Few-shot Learning

1 code implementation 17 Jun 2020 Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah

Our experiments show that, even in the first stage, self-supervision can outperform current state-of-the-art methods, with further gains achieved by our second stage distillation process.

Few-Shot Image Classification Knowledge Distillation

Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos

no code implementations 23 Apr 2020 Mamshad Nayeem Rizve, Ugur Demir, Praveen Tirupattur, Aayush Jung Rana, Kevin Duarte, Ishan Dave, Yogesh Singh Rawat, Mubarak Shah

For tubelet extraction, we propose a localization network which takes a video clip as input and spatio-temporally detects potential foreground regions at multiple scales to generate action tubelets.

Action Detection Activity Detection

RescueNet: Joint Building Segmentation and Damage Assessment from Satellite Imagery

no code implementations 15 Apr 2020 Rohit Gupta, Mubarak Shah

Accurate and fine-grained information about the extent of damage to buildings is essential for directing Humanitarian Aid and Disaster Response (HADR) operations in the immediate aftermath of any natural calamity.

Disaster Response General Classification

Adversarial Learning for Personalized Tag Recommendation

1 code implementation 1 Apr 2020 Erik Quintanilla, Yogesh Rawat, Andrey Sakryukin, Mubarak Shah, Mohan Kankanhalli

We demonstrate the effectiveness of the proposed model on two different large-scale and publicly available datasets, YFCC100M and NUS-WIDE.

General Classification Image Classification

iTAML: An Incremental Task-Agnostic Meta-learning Approach

1 code implementation CVPR 2020 Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah

In this paper, we hypothesize this problem can be avoided by learning a set of generalized parameters, that are neither specific to old nor new tasks.

Incremental Learning Meta-Learning

Subspace Capsule Network

1 code implementation 7 Feb 2020 Marzieh Edraki, Nazanin Rahnavard, Mubarak Shah

In this paper, we propose the SubSpace Capsule Network (SCN) that exploits the idea of capsule networks to model possible variations in the appearance or implicitly defined properties of an entity through a group of capsule subspaces instead of simply grouping neurons to create capsules.

General Classification Image Generation

Human Action Recognition in Drone Videos using a Few Aerial Training Examples

no code implementations 22 Oct 2019 Waqas Sultani, Mubarak Shah

However, using deep neural networks for automatic aerial action recognition is difficult due to the need for a large number of training aerial human action videos.

Action Classification Action Recognition

Deep Constrained Dominant Sets for Person Re-identification

1 code implementation ICCV 2019 Leulseged Tesfaye Alemu, Marcello Pelillo, Mubarak Shah

By optimizing the constrained clustering in an end-to-end manner, we naturally leverage the contextual knowledge of a set of images corresponding to the given person images.

Ranked #2 on Person Re-Identification on CUHK03 (Rank-5 metric)

Image Retrieval Person Re-Identification

Bridging the Domain Gap for Ground-to-Aerial Image Matching

1 code implementation ICCV 2019 Krishna Regmi, Mubarak Shah

Our Feature Fusion method combines the complementary features from a synthesized aerial image with the corresponding ground features to obtain a robust query representation.

Crowd Transformer Network

no code implementations 4 Apr 2019 Viresh Ranjan, Mubarak Shah, Minh Hoai Nguyen

Most of the existing crowd counting approaches rely on local features for estimating the crowd density map.

Crowd Counting Density Estimation

Iterative Projection and Matching: Finding Structure-preserving Representatives and Its Application to Computer Vision

2 code implementations CVPR 2019 Mohsen Joneidi, Alireza Zaeemzadeh, Nazanin Rahnavard, Mubarak Shah

In our algorithm, at each iteration, the maximum information from the structure of the data is captured by one selected sample, and the captured information is discarded in subsequent iterations by projecting onto the null space of the previously selected samples.

Action Recognition Active Learning
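
A rough sketch of the iterate, select, and project loop, assuming each data row is a plain feature vector. Each iteration finds the dominant direction of the current residual data (projection), selects the unselected sample best aligned with it (matching), then projects all residuals onto the null space of the selected sample. This is a reading of the snippet rather than the authors' implementation; `select_representatives` and the toy data are hypothetical.

```python
import math
import random


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _top_direction(rows, iters=200, seed=0):
    """Dominant right-singular direction of the row matrix, by power iteration."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(rows[0]))]
    for _ in range(iters):
        proj = [_dot(r, v) for r in rows]
        v = [sum(p * r[j] for p, r in zip(proj, rows)) for j in range(len(v))]
        n = math.sqrt(_dot(v, v))
        if n == 0.0:
            break
        v = [x / n for x in v]
    return v


def select_representatives(data, k, tol=1e-12):
    residual = [list(map(float, x)) for x in data]
    chosen = []
    for _ in range(k):
        u = _top_direction(residual)
        # Matching: |cosine| between each unselected residual and u.
        scores = []
        for i, r in enumerate(residual):
            norm = math.sqrt(_dot(r, r))
            scores.append(abs(_dot(r, u)) / norm if norm > tol and i not in chosen else -1.0)
        idx = max(range(len(scores)), key=scores.__getitem__)
        if scores[idx] < 0.0:
            break  # no information left (k exceeds the data rank)
        chosen.append(idx)
        s = residual[idx]
        s2 = _dot(s, s)
        # Project onto the null space of the selected sample to remove what it captured.
        residual = [[a - (_dot(r, s) / s2) * b for a, b in zip(r, s)] for r in residual]
    return chosen


data = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.1, 0.95, 0.0]]
print(select_representatives(data, 3))
```

Since every later residual is already orthogonal to the earlier picks, projecting out one selected residual per step is equivalent to projecting onto the null space of all previously selected samples (a Gram-Schmidt view).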

Time-Aware and View-Aware Video Rendering for Unsupervised Representation Learning

no code implementations 26 Nov 2018 Shruti Vyas, Yogesh S Rawat, Mubarak Shah

We demonstrate the effectiveness of the proposed method in rendering view-aware as well as time-aware video clips on two different real-world datasets including UCF-101 and NTU-RGB+D.

Representation Learning

Deep Affinity Network for Multiple Object Tracking

2 code implementations 28 Oct 2018 Shi-Jie Sun, Naveed Akhtar, HuanSheng Song, Ajmal Mian, Mubarak Shah

In this paper, we harness the power of deep learning for data association in tracking by jointly modelling object appearances and their affinities between different frames in an end-to-end fashion.

Multiple Object Tracking Object Detection

Decoding Brain Representations by Multimodal Learning of Neural Activity and Visual Features

no code implementations 25 Oct 2018 Simone Palazzo, Concetto Spampinato, Isaak Kavasidis, Daniela Giordano, Joseph Schmidt, Mubarak Shah

After verifying that visual information can be extracted from EEG data, we introduce a multimodal approach that uses deep image and EEG encoders, trained in a siamese configuration, for learning a joint manifold that maximizes a compatibility measure between visual features and brain representations.

EEG General Classification

Pay attention! - Robustifying a Deep Visuomotor Policy through Task-Focused Attention

no code implementations 26 Sep 2018 Pooya Abolghasemi, Amir Mazaheri, Mubarak Shah, Ladislau Bölöni

In this paper, we propose an approach for augmenting a deep visuomotor policy trained through demonstrations with Task Focused visual Attention (TFA).

Composition Loss for Counting, Density Map Estimation and Localization in Dense Crowds

no code implementations ECCV 2018 Haroon Idrees, Muhmmad Tayyab, Kishan Athrey, Dong Zhang, Somaya Al-Maadeed, Nasir Rajpoot, Mubarak Shah

With multiple crowd gatherings of millions of people every year in events ranging from pilgrimages to protests, concerts to marathons, and festivals to funerals, visual crowd analysis is emerging as a new frontier in computer vision.

Crowd Counting Visual Crowd Analysis

Training Faster by Separating Modes of Variation in Batch-normalized Models

no code implementations 7 Jun 2018 Mahdi M. Kalayeh, Mubarak Shah

We show that, assuming samples within a mini-batch come from the same probability density function, BN is identical to the Fisher vector of a Gaussian distribution.

Image Classification
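
As a worked check of the stated equivalence, consider a single scalar feature under the one-Gaussian assumption, ignoring BN's learned scale and shift and letting the stabilizing constant epsilon go to zero. The Fisher score with respect to the mean, normalized by the Fisher information, reduces to the standardized activation that BN computes. This is an illustration of the claim for the mean component only, not the paper's proof.

```latex
\begin{aligned}
\log \mathcal{N}(x;\mu,\sigma^2) &= -\tfrac{1}{2}\log(2\pi\sigma^2) - \frac{(x-\mu)^2}{2\sigma^2},\\
g_\mu(x) = \frac{\partial}{\partial\mu}\log \mathcal{N}(x;\mu,\sigma^2) &= \frac{x-\mu}{\sigma^2},
\qquad F_\mu = \mathbb{E}\big[g_\mu(x)^2\big] = \frac{1}{\sigma^2},\\
F_\mu^{-1/2}\, g_\mu(x) &= \sigma\cdot\frac{x-\mu}{\sigma^2} = \frac{x-\mu}{\sigma}
= \lim_{\epsilon\to 0}\frac{x-\mu_B}{\sqrt{\sigma_B^2+\epsilon}}
\quad\text{when } \mu=\mu_B,\ \sigma^2=\sigma_B^2 .
\end{aligned}
```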

Task-Agnostic Meta-Learning for Few-shot Learning

no code implementations 20 May 2018 Muhammad Abdullah Jamal, Guo-Jun Qi, Mubarak Shah

Meta-learning approaches have been proposed to tackle the few-shot learning problem. Typically, a meta-learner is trained on a variety of tasks in the hope that it generalizes to new tasks.

Few-Shot Learning General Classification

Norm-Preservation: Why Residual Networks Can Become Extremely Deep?

1 code implementation 18 May 2018 Alireza Zaeemzadeh, Nazanin Rahnavard, Mubarak Shah

We prove that the skip connections in the residual blocks facilitate preserving the norm of the gradient and lead to stable back-propagation, which is desirable from an optimization perspective.
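
An elementary bound, not the paper's proof, shows why an identity shortcut keeps the gradient norm close to its incoming value. Assume a residual block y = x + F(x) with Jacobian J_F = ∂F/∂x, gradients written as column vectors, and ‖·‖₂ the spectral norm.

```latex
\begin{aligned}
\frac{\partial L}{\partial x} &= (I + J_F)^{\top}\frac{\partial L}{\partial y}
= \frac{\partial L}{\partial y} + J_F^{\top}\frac{\partial L}{\partial y},\\
\left\lVert \frac{\partial L}{\partial x} - \frac{\partial L}{\partial y} \right\rVert
&\le \lVert J_F \rVert_2 \left\lVert \frac{\partial L}{\partial y} \right\rVert
\;\Longrightarrow\;
(1 - \lVert J_F \rVert_2)\left\lVert \frac{\partial L}{\partial y} \right\rVert
\le \left\lVert \frac{\partial L}{\partial x} \right\rVert
\le (1 + \lVert J_F \rVert_2)\left\lVert \frac{\partial L}{\partial y} \right\rVert .
\end{aligned}
```

So when the residual branch has a small Jacobian, the block nearly preserves the gradient norm; without the shortcut, the backward signal would be J_F^T ∂L/∂y alone, whose norm can shrink or grow freely from layer to layer.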

Human Semantic Parsing for Person Re-identification

no code implementations CVPR 2018 Mahdi M. Kalayeh, Emrah Basaran, Muhittin Gokmen, Mustafa E. Kamasak, Mubarak Shah

In this paper, we propose to adopt human semantic parsing which, due to its pixel-level accuracy and capability of modeling arbitrary contours, is naturally a better alternative.

Person Re-Identification Representation Learning

Real-world Anomaly Detection in Surveillance Videos

6 code implementations CVPR 2018 Waqas Sultani, Chen Chen, Mubarak Shah

To avoid annotating the anomalous segments or clips in training videos, which is very time-consuming, we propose to learn anomaly through the deep multiple instance ranking framework by leveraging weakly labeled training videos, i.e., the training labels (anomalous or normal) are at the video level instead of the clip level.

Activity Recognition Anomaly Detection In Surveillance Videos
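
A minimal sketch of a multiple-instance ranking objective, assuming each video is a bag of per-segment anomaly scores and only the video-level label is known. The hinge term asks the highest-scoring segment of an anomalous video to outscore the highest-scoring segment of a normal video by a margin. The function names and the margin value are hypothetical, and this shows only the ranking idea, not the authors' full training objective.

```python
def mil_ranking_loss(anomalous_scores, normal_scores, margin=1.0):
    """Hinge loss between the top segment of an anomalous bag and of a normal bag."""
    return max(0.0, margin - max(anomalous_scores) + max(normal_scores))


def batch_mil_loss(anomalous_bags, normal_bags, margin=1.0):
    """Average the pairwise hinge loss over all (anomalous, normal) video pairs."""
    losses = [mil_ranking_loss(a, n, margin) for a in anomalous_bags for n in normal_bags]
    return sum(losses) / len(losses)


anomalous = [[0.1, 0.9, 0.2], [0.05, 0.1, 0.95]]  # one segment per video looks anomalous
normal = [[0.2, 0.1, 0.3], [0.0, 0.05, 0.1]]
print(batch_mil_loss(anomalous, normal))
```

Taking the maximum over segments is what removes the need for clip-level labels: only the most anomalous-looking segment of each video enters the comparison.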

Visual Text Correction

1 code implementation ECCV 2018 Amir Mazaheri, Mubarak Shah

A semantic inconsistency between the sentence and the video or between the words of a sentence can result in an inaccurate description.

Grammatical Error Correction Visual Text Correction

An End-to-end 3D Convolutional Neural Network for Action Detection and Segmentation in Videos

no code implementations 30 Nov 2017 Rui Hou, Chen Chen, Mubarak Shah

A video is first divided into equal-length clips, and then a set of tube proposals is generated for each clip based on 3D CNN features.

Action Detection Action Segmentation

Generative Adversarial Networks Conditioned by Brain Signals

no code implementations ICCV 2017 Simone Palazzo, Concetto Spampinato, Isaak Kavasidis, Daniela Giordano, Mubarak Shah

In this work, we build on the latter class of approaches and investigate the possibility of driving and conditioning the image generation process by means of brain signals recorded, through an electroencephalograph (EEG), while users look at images from a set of 40 ImageNet object categories with the objective of generating the seen images.

EEG Image Generation

Unsupervised Action Discovery and Localization in Videos

no code implementations ICCV 2017 Khurram Soomro, Mubarak Shah

Once classes are discovered, training videos within each cluster are selected to perform automatic spatio-temporal annotations, by first oversegmenting videos in each discovered class into supervoxels and constructing a directed graph to apply a variant of knapsack problem with temporal constraints.

Action Localization

Multi-Target Tracking in Multiple Non-Overlapping Cameras using Constrained Dominant Sets

no code implementations 19 Jun 2017 Yonatan Tariku Tesfaye, Eyasu Zemene, Andrea Prati, Marcello Pelillo, Mubarak Shah

In this paper, a unified three-layer hierarchical approach for solving tracking problems in multiple non-overlapping cameras is proposed.

Video Fill In the Blank using LR/RL LSTMs with Spatial-Temporal Attentions

1 code implementation ICCV 2017 Amir Mazaheri, Dong Zhang, Mubarak Shah

Since the source sentence is broken into two fragments: the sentence's left fragment (before the blank) and the sentence's right fragment (after the blank), traditional Recurrent Neural Networks cannot encode this structure accurately because of many possible variations of the missing word in terms of the location and type of the word in the source sentence.
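
The left/right decomposition can be illustrated with a small sketch, assuming the blank is marked by a placeholder token (here the hypothetical `"_____"`). The left fragment is read left to right and the right fragment right to left, so both sequences end next to the missing word; the LSTM encoders and attention themselves are not shown, and `split_at_blank` is a hypothetical helper.

```python
def split_at_blank(tokens, blank="_____"):
    """Split a tokenized sentence around its single blank.

    Returns (left, right_reversed): the left fragment in reading order for an
    LR encoder, and the right fragment reversed for an RL encoder, so that the
    last token of each sequence is adjacent to the blank.
    """
    i = tokens.index(blank)
    return tokens[:i], tokens[i + 1:][::-1]


left, right = split_at_blank("a man _____ the guitar on stage".split())
print(left)   # ['a', 'man']
print(right)  # ['stage', 'on', 'guitar', 'the']
```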

ClusterNet: Detecting Small Objects in Large Scenes by Exploiting Spatio-Temporal Information

no code implementations CVPR 2018 Rodney LaLonde, Dong Zhang, Mubarak Shah

To reduce the large search space, the first stage (ClusterNet) takes in a set of extremely large video frames, combines the motion and appearance information within the convolutional architecture, and proposes regions of objects of interest (ROOBI).

Object Detection

Unsupervised Action Proposal Ranking through Proposal Recombination

no code implementations 3 Apr 2017 Waqas Sultani, Dong Zhang, Mubarak Shah

Given the action proposals in a video, the goal of the proposed work is to generate a few better action proposals that are ranked properly.

Action Detection Action Recognition

Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos

1 code implementation ICCV 2017 Rui Hou, Chen Chen, Mubarak Shah

A video is first divided into equal-length clips, and then a set of tube proposals is generated for each clip based on 3D Convolutional Network (ConvNet) features.

Action Detection Image Classification

Cross-View Image Matching for Geo-localization in Urban Environments

1 code implementation CVPR 2017 Yicong Tian, Chen Chen, Mubarak Shah

Next, for each building in the query image, we retrieve the k nearest neighbors from the reference buildings using a Siamese network trained on both positive matching image pairs and negative pairs.

Cross-View Image-to-Image Translation Image Classification
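
The retrieval step can be sketched as a top-k cosine-similarity search, assuming the Siamese network has already mapped the query building and every reference building to embedding vectors. The embeddings, building ids, and the choice of cosine similarity are assumptions for illustration; `k_nearest` is a hypothetical helper, not the authors' code.

```python
import heapq
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def k_nearest(query, references, k):
    """Return the ids of the k reference embeddings most similar to `query`."""
    return heapq.nlargest(k, references, key=lambda rid: cosine(query, references[rid]))


refs = {"b1": [1.0, 0.0], "b2": [0.7, 0.7], "b3": [0.0, 1.0]}  # toy reference embeddings
print(k_nearest([1.0, 0.1], refs, 2))  # ['b1', 'b2']
```

`heapq.nlargest` avoids sorting the whole reference set, which matters when the gallery of reference buildings is large and k is small.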

Re-identification of Humans in Crowds using Personal, Social and Environmental Constraints

no code implementations 7 Dec 2016 Shayan Modiri Assari, Haroon Idrees, Mubarak Shah

This paper addresses the problem of human re-identification across non-overlapping cameras in crowds. Re-identification in crowded scenes is challenging due to the large number of people and frequent occlusions, coupled with changes in their appearance caused by the different properties and exposure of the cameras.

Person Re-Identification

Online Localization and Prediction of Actions and Interactions

no code implementations 4 Dec 2016 Khurram Soomro, Haroon Idrees, Mubarak Shah

For online prediction of action (interaction) confidences, we propose an approach based on Structural SVM that operates on short video segments, and is trained with the objective that confidence of an action or interaction increases as time progresses.

Pose Estimation Superpixels

On Duality Of Multiple Target Tracking and Segmentation

no code implementations 14 Oct 2016 Yicong Tian, Mubarak Shah

For segmentation, a multi-label Conditional Random Field (CRF) is applied to a superpixel-based spatio-temporal graph in a segment of video to assign a background or target label to every superpixel.

Object Tracking Occlusion Handling

Video Fill in the Blank with Merging LSTMs

no code implementations 13 Oct 2016 Amir Mazaheri, Dong Zhang, Mubarak Shah

In the experiments, we have demonstrated the superior performance of the proposed method on the challenging "Movie Fill-in-the-Blank" dataset.

Deep Learning Human Mind for Automated Visual Classification

2 code implementations CVPR 2017 Concetto Spampinato, Simone Palazzo, Isaak Kavasidis, Daniela Giordano, Mubarak Shah, Nasim Souly

In particular, we employ EEG data evoked by visual object stimuli combined with Recurrent Neural Networks (RNN) to learn a discriminative brain activity manifold of visual categories.

EEG General Classification

Scene Labeling Through Knowledge-Based Rules Employing Constrained Integer Linear Programing

no code implementations 17 Aug 2016 Nasim Souly, Mubarak Shah

In this paper, we propose to use high-level, rule-based knowledge during inference to incorporate dependencies among regions in the image and thereby improve classification scores.

Scene Labeling

Query-Focused Extractive Video Summarization

no code implementations 18 Jul 2016 Aidean Sharghi, Boqing Gong, Mubarak Shah

The decision to include a shot in the summary depends on the shot's relevance to the user query and importance in the context of the video, jointly.

Video Summarization

Covariance of Motion and Appearance Features for Spatio Temporal Recognition Tasks

no code implementations 16 Jun 2016 Subhabrata Bhattacharya, Nasim Souly, Mubarak Shah

Using an over-complete dictionary of the covariance based descriptors built from labeled training samples, we formulate low-level event recognition as a sparse linear approximation problem.

Gesture Recognition One-Shot Learning

Predicting the Where and What of Actors and Actions Through Online Action Localization

no code implementations CVPR 2016 Khurram Soomro, Haroon Idrees, Mubarak Shah

This paper proposes a novel approach to tackle the challenging problem of 'online action localization' which entails predicting actions and their locations as they happen in a video.

Action Localization Superpixels

What If We Do Not Have Multiple Videos of the Same Action? -- Video Action Localization Using Web Images

no code implementations CVPR 2016 Waqas Sultani, Mubarak Shah

We reconstruct video action proposals from image action proposals while enforcing consistency across coefficient vectors of multiple frames by consensus regularization.

Optical Flow Estimation Spatio-Temporal Action Localization

Scene Labeling Using Sparse Precision Matrix

no code implementations CVPR 2016 Nasim Souly, Mubarak Shah

To do this, we formulate the problem as an energy minimization over a graph, whose structure is captured by applying sparse constraint on the elements of the precision matrix.

Scene Labeling

Fast Zero-Shot Image Tagging

no code implementations CVPR 2016 Yang Zhang, Boqing Gong, Mubarak Shah

The well-known word analogy experiments show that the recent word vectors capture fine-grained linguistic regularities in words by linear vector offsets, but it is unclear how well the simple vector offsets can encode visual regularities over words.

Multi-label zero-shot learning

Automatic Action Annotation in Weakly Labeled Videos

no code implementations 26 May 2016 Waqas Sultani, Mubarak Shah

The output of our method is the most action representative proposals from each video.

Optical Flow Estimation

A Framework for Human Pose Estimation in Videos

no code implementations 26 Apr 2016 Dong Zhang, Mubarak Shah

A sequence of the best poses is inferred from the abstract body part tracklets through the tree-based optimization.

Pose Estimation

The THUMOS Challenge on Action Recognition for Videos "in the Wild"

no code implementations 21 Apr 2016 Haroon Idrees, Amir R. Zamir, Yu-Gang Jiang, Alex Gorban, Ivan Laptev, Rahul Sukthankar, Mubarak Shah

Additionally, we include a comprehensive empirical study evaluating the differences in action recognition between trimmed and untrimmed videos, and how well methods trained on trimmed videos generalize to untrimmed videos.

Action Classification Action Recognition +2

Binary Quadratic Programing for Online Tracking of Hundreds of People in Extremely Crowded Scenes

no code implementations 30 Mar 2016 Afshin Dehghan, Mubarak Shah

In this paper, we propose a tracker that addresses the aforementioned problems and is capable of tracking hundreds of people efficiently.

Multi-Object Tracking

Autonomous navigation for low-altitude UAVs in urban areas

no code implementations 25 Feb 2016 Thomas Castelli, Aidean Sharghi, Don Harper, Alain Tremeau, Mubarak Shah

In recent years, consumer Unmanned Aerial Vehicles have become very popular: anyone can buy and fly a drone without previous experience, which raises concerns regarding regulations and public safety.

Autonomous Navigation

Learning a Deep Model for Human Action Recognition from Novel Viewpoints

no code implementations 2 Feb 2016 Hossein Rahmani, Ajmal Mian, Mubarak Shah

The strength of our technique is that we learn a single R-NKTM for all actions and all viewpoints for knowledge transfer of any real human action video without the need for re-training or fine-tuning the model.

Action Recognition Transfer Learning

Human Pose Estimation in Videos

no code implementations ICCV 2015 Dong Zhang, Mubarak Shah

Using the idea of 'Association', the optimal tracklets are generated for each abstract body part, in order to enforce the spatiotemporal constraints between body parts in adjacent frames.

Pose Estimation

Action Localization in Videos Through Context Walk

no code implementations ICCV 2015 Khurram Soomro, Haroon Idrees, Mubarak Shah

Context relations are learned during training which capture displacements from all the supervoxels in a video to those belonging to foreground actions.

Action Localization

Target Identity-Aware Network Flow for Online Multiple Target Tracking

no code implementations CVPR 2015 Afshin Dehghan, Yicong Tian, Philip H. S. Torr, Mubarak Shah

In this paper, we show that multiple object tracking (MOT) can be formulated in a framework where detection and data association are performed simultaneously.

Multiple Object Tracking Object Detection

Geo-Semantic Segmentation

no code implementations CVPR 2015 Shervin Ardeshir, Kofi Malcolm Collins-Sibley, Mubarak Shah

In this paper, we propose a method which leverages information acquired from GIS databases to perform semantic segmentation of the image alongside geo-referencing each semantic segment with its address and geo-location.

Semantic Segmentation

Understanding Trajectory Behavior: A Motion Pattern Approach

no code implementations 4 Jan 2015 Mahdi M. Kalayeh, Stephen Mussmann, Alla Petrakova, Niels da Vitoria Lobo, Mubarak Shah

In the second phase, via a K-means clustering approach, we create motion components by clustering the flow vectors with respect to their location and velocity.

Improving Semantic Concept Detection through the Dictionary of Visually-distinct Elements

no code implementations CVPR 2014 Afshin Dehghan, Haroon Idrees, Mubarak Shah

A video captures a sequence of interacting concepts that can be static (for instance, objects or scenes) or dynamic (such as actions).

GPS-Tag Refinement using Random Walks with an Adaptive Damping Factor

no code implementations CVPR 2014 Amir Roshan Zamir, Shervin Ardeshir, Mubarak Shah

We develop a robust method for identification and refinement of this subset using the rest of the images in the dataset.

TAG

Who Do I Look Like? Determining Parent-Offspring Resemblance via Gated Autoencoders

no code implementations CVPR 2014 Afshin Dehghan, Enrique G. Ortiz, Ruben Villegas, Mubarak Shah

Recent years have seen a major push for face recognition technology due to the large expansion of image sharing on social networks.

Face Recognition

Recognition of Complex Events: Exploiting Temporal Dynamics between Underlying Concepts

no code implementations CVPR 2014 Subhabrata Bhattacharya, Mahdi M. Kalayeh, Rahul Sukthankar, Mubarak Shah

While approaches based on bags of features excel at low-level action classification, they are ill-suited for recognizing complex events in video, where concept-based temporal representations currently dominate.

Action Classification Event Detection +2

Face Verification Using Boosted Cross-Image Features

no code implementations 28 Sep 2013 Dong Zhang, Omar Oreifej, Mubarak Shah

In contrast, we propose to extract cross-image features, i.e., features across the pair of images, which, as we demonstrate, are more discriminative of the similarity and dissimilarity of faces.

Face Detection Face Recognition +1

Improving an Object Detector and Extracting Regions Using Superpixels

no code implementations CVPR 2013 Guang Shu, Afshin Dehghan, Mubarak Shah

In general, our method takes detection bounding boxes of a generic detector as input and generates the detection output with higher average precision and precise object regions.

Superpixels

Spatiotemporal Deformable Part Models for Action Detection

no code implementations CVPR 2013 Yicong Tian, Rahul Sukthankar, Mubarak Shah

Deformable part models have achieved impressive performance for object detection, even on difficult image datasets.

Action Detection Object Detection

Face Recognition in Movie Trailers via Mean Sequence Sparse Representation-Based Classification

no code implementations CVPR 2013 Enrique G. Ortiz, Alan Wright, Mubarak Shah

A straightforward application of the popular ℓ1-minimization for face recognition on a frame-by-frame basis is prohibitively expensive, so we propose a novel algorithm, Mean Sequence SRC (MSSRC), that performs video face recognition using a joint optimization leveraging all of the available video data and the knowledge that the face track frames belong to the same individual.

Face Recognition General Classification +1

Multi-source Multi-scale Counting in Extremely Dense Crowd Images

no code implementations CVPR 2013 Haroon Idrees, Imran Saleemi, Cody Seibert, Mubarak Shah

Instead, our approach relies on multiple sources, such as low-confidence head detections, repetition of texture elements (using SIFT), and frequency-domain analysis, to estimate counts, along with the confidence associated with observing individuals, in an image region.

Crowd Counting Human Detection

Video Object Segmentation through Spatially Accurate and Temporally Dense Extraction of Primary Object Regions

no code implementations CVPR 2013 Dong Zhang, Omar Javed, Mubarak Shah

The proposed approach has several contributions: First, a novel layered Directed Acyclic Graph (DAG) based framework is presented for detection and segmentation of the primary object in video.

Optical Flow Estimation Semantic Segmentation +3

Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video

no code implementations CVPR 2013 Yang Yang, Guang Shu, Mubarak Shah

In order to learn discriminative and compact features, we propose a new feature learning method using a deep neural network based on autoencoders.

Object Detection

UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild

7 code implementations 3 Dec 2012 Khurram Soomro, Amir Roshan Zamir, Mubarak Shah

To the best of our knowledge, UCF101 is currently the most challenging dataset of actions due to its large number of classes, large number of clips and also unconstrained nature of such clips.

Action Recognition Action Recognition In Videos +1
