no code implementations • ECCV 2020 • Jin Xie, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao, Mubarak Shah
We further introduce a count-and-similarity branch within the two-stage detection framework, which predicts pedestrian count as well as proposal similarity.
1 code implementation • ECCV 2020 • Shruti Vyas, Yogesh S Rawat, Mubarak Shah
We evaluate the effectiveness of the learned representation for multi-view video action recognition in a supervised approach.
no code implementations • 29 Oct 2024 • Chen Chen, Daochang Liu, Mubarak Shah, Chang Xu
Furthermore, driven by our observation that local memorization significantly underperforms in existing tasks of measuring, detecting, and mitigating memorization in diffusion models compared to global memorization, we propose a simple yet effective method to integrate BE and the results of the new localization task into these existing frameworks.
no code implementations • 29 Oct 2024 • Chen Chen, Enhuai Liu, Daochang Liu, Mubarak Shah, Chang Xu
Diffusion models, widely used for image and video generation, face a significant limitation: the risk of memorizing and reproducing training data during inference, potentially generating unauthorized copyrighted content.
1 code implementation • 30 Sep 2024 • Weitai Kang, Haifeng Huang, Yuzhang Shang, Mubarak Shah, Yan Yan
RIG generates two key instruction data: 1) the Adversarial Instruction-following data, which features mixed negative and positive samples to enhance the model's discriminative understanding.
no code implementations • 2 Sep 2024 • Ishan Rajendrakumar Dave, Fabian Caba Heilbron, Mubarak Shah, Simon Jenni
Temporal video alignment aims to synchronize the key events like object interactions or action phase transitions in two videos.
no code implementations • 2 Sep 2024 • Ishan Rajendrakumar Dave, Mamshad Nayeem Rizve, Mubarak Shah
Since fine-grained actions are more challenging due to the absence of scene bias, classifying these actions requires an understanding of action-phases.
1 code implementation • 5 Aug 2024 • Manu S Pillai, Mamshad Nayeem Rizve, Mubarak Shah
We introduce GeoAdapter, a transformer-adapter module designed to efficiently aggregate image-level representations and adapt them for video inputs.
no code implementations • 18 Jul 2024 • Sirnam Swetha, Jinyu Yang, Tal Neiman, Mamshad Nayeem Rizve, Son Tran, Benjamin Yao, Trishul Chilimbi, Mubarak Shah
In this work, we focus on enhancing the visual representations for MLLMs by combining high-frequency and detailed visual representations, obtained through masked image modeling (MIM), with semantically-enriched low-frequency representations captured by CL.
no code implementations • 12 Jul 2024 • Rohit Gupta, Mamshad Nayeem Rizve, Jayakrishnan Unnikrishnan, Ashish Tawari, Son Tran, Mubarak Shah, Benjamin Yao, Trishul Chilimbi
Pre-trained vision-language models (VLMs) have enabled significant progress in open vocabulary computer vision tasks such as image classification, object detection and image segmentation.
1 code implementation • 5 Jul 2024 • Peiyu Yang, Naveed Akhtar, Mubarak Shah, Ajmal Mian
Trustworthy machine learning necessitates meticulous regulation of model reliance on non-robust features.
1 code implementation • 3 Jul 2024 • Weitai Kang, Gaowen Liu, Mubarak Shah, Yan Yan
Specifically, we propose the Multi-layer Multi-task Encoder-Decoder as the target grounding stage, where we learn a regression query and multiple segmentation queries to ground the target by regression and segmentation of the box in each decoding layer, respectively.
no code implementations • 2 Jul 2024 • Furqan Shaukat, Syed Muhammad Anwar, Abhijeet Parida, Van Khanh Lam, Marius George Linguraru, Mubarak Shah
Computer-aided diagnosis can help with early lung nodule detection and facilitate subsequent nodule characterization.
no code implementations • 19 Jun 2024 • Daochang Liu, Axel Hu, Mubarak Shah, Chang Xu
In this paper, we propose DiffTriplet, a new generative framework for surgical triplet recognition employing the diffusion model, which predicts surgical triplets via iterative denoising.
Ranked #1 on Action Triplet Recognition on CholecT45 (cross-val)
1 code implementation • 14 Jun 2024 • Anshuman Gaharwar, Parth Parag Kulkarni, Joshua Dickey, Mubarak Shah
To the best of our knowledge, this is the first transformer-based deep learning model for seismic waveform reconstruction.
no code implementations • 28 May 2024 • Weitai Kang, Mengxue Qu, Jyoti Kini, Yunchao Wei, Mubarak Shah, Yan Yan
To achieve detection based on human intention, it relies on humans to observe the scene, reason out the target that aligns with their intention ("pillow" in this case), and finally provide a reference to the AI system, such as "A pillow on the couch".
1 code implementation • 25 May 2024 • Junyi Wu, Haoxuan Wang, Yuzhang Shang, Mubarak Shah, Yan Yan
SSC extends this approach by dynamically adjusting the balanced salience to capture the temporal variations in activation.
no code implementations • 24 May 2024 • Zichen Geng, Caren Han, Zeeshan Hayder, Jian Liu, Mubarak Shah, Ajmal Mian
Text-driven human motion generation is an emerging task in animation and humanoid robot design.
no code implementations • 22 May 2024 • Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Nicu Sebe, Mubarak Shah
Our approach, Curriculum DPO, is compared against state-of-the-art fine-tuning approaches on three benchmarks, outperforming the competing methods in terms of text alignment, aesthetics and human preference.
1 code implementation • 12 May 2024 • Sushant Gautam, Mehdi Houshmand Sarkhoosh, Jan Held, Cise Midoglu, Anthony Cioppa, Silvio Giancola, Vajira Thambawita, Michael A. Riegler, Pål Halvorsen, Mubarak Shah
The application of Automatic Speech Recognition (ASR) technology in soccer offers numerous opportunities for sports analytics.
no code implementations • 10 Apr 2024 • Aakash Kumar, Chen Chen, Ajmal Mian, Niels Lobo, Mubarak Shah
Our method requires only a small number of 3D points, which can be obtained from a low-cost, low-resolution sensor.
1 code implementation • IEEE Transactions on Image Processing 2024 • Anas Zafar, Danyal Aftab, Rizwan Qureshi, Xinqi Fan, Pingjun Chen, Jia Wu, Hazrat Ali, Shah Nawaz, Sheheryar Khan, Mubarak Shah
In this paper, we propose a novel and computationally efficient architecture Single Stage Adaptive Multi-Attention Network (SSAMAN) for image restoration tasks, particularly for image denoising and image deblurring.
Ranked #2 on Image Denoising on DND
no code implementations • 3 Apr 2024 • Matteo Pennisi, Giovanni Bellitto, Simone Palazzo, Mubarak Shah, Concetto Spampinato
We present DiffExplainer, a novel framework that, leveraging language-vision models, enables multimodal global explainability.
1 code implementation • 28 Mar 2024 • Bo Miao, Mohammed Bennamoun, Yongsheng Gao, Mubarak Shah, Ajmal Mian
Referring Video Object Segmentation (R-VOS) methods face challenges in maintaining consistent object segmentation due to temporal context variability and the presence of other visually similar objects.
Ranked #2 on Referring Video Object Segmentation on MeViS
1 code implementation • CVPR 2024 • Omkar Thawakar, Muzammal Naseer, Rao Muhammad Anwer, Salman Khan, Michael Felsberg, Mubarak Shah, Fahad Shahbaz Khan
Composed video retrieval (CoVR) is a challenging problem in computer vision which has recently highlighted the integration of modification text with visual queries for more sophisticated video search in large databases.
no code implementations • CVPR 2024 • Mamshad Nayeem Rizve, Fan Fei, Jayakrishnan Unnikrishnan, Son Tran, Benjamin Z. Yao, Belinda Zeng, Mubarak Shah, Trishul Chilimbi
To effectively address this limitation, we instead keep the network architecture simple and use a set of data tokens that operate at different temporal resolutions in a hierarchical manner, accounting for the temporally hierarchical nature of videos.
1 code implementation • 21 Mar 2024 • Yuning Cui, Syed Waqas Zamir, Salman Khan, Alois Knoll, Mubarak Shah, Fahad Shahbaz Khan
Our approach is motivated by the observation that different degradation types impact the image content on different frequency subbands, thereby requiring different treatments for each restoration task.
no code implementations • 11 Mar 2024 • Rukhshanda Hussain, Hui Xian Grace Lim, BorChun Chen, Mubarak Shah, Ser Nam Lim
Second, we establish that the concept of a view can be disentangled and transferred to a novel object irrespective of the original object's identity from which the views are learnt.
no code implementations • 7 Mar 2024 • Prakash Chandra Chhipa, Meenakshi Subhash Chippa, Kanjar De, Rajkumar Saini, Marcus Liwicki, Mubarak Shah
In this work, we propose mitigating perspective distortion (MPD) by employing a fine-grained parameter control on a specific family of Möbius transform to model real-world distortion without estimating camera intrinsic and extrinsic parameters and without the need for actual distorted data.
1 code implementation • 16 Feb 2024 • Ishan Rajendrakumar Dave, Tristan de Blegiers, Chen Chen, Mubarak Shah
On the publicly available large-scale M5-dataset, our proposed method shows a significant improvement of 16% over the state-of-the-art methods in terms of the mean average precision metric (mAP), provides 21x speed improvement during inference and requires only half of the learnable parameters used in prior methods.
no code implementations • 20 Dec 2023 • Ishan Rajendrakumar Dave, Simon Jenni, Mubarak Shah
To address these issues, we propose 1) a more challenging reformulation of temporal self-supervision as frame-level (rather than clip-level) recognition tasks and 2) an effective augmentation strategy to mitigate shortcuts.
1 code implementation • 10 Dec 2023 • Nyle Siddiqui, Praveen Tirupattur, Mubarak Shah
In this work, we present a novel approach to multi-view action recognition where we guide learned action representations to be separated from view-relevant information in a video.
Ranked #1 on Action Recognition on N-UCLA
no code implementations • CVPR 2024 • Aritra Dutta, Srijan Das, Jacob Nielsen, Rajatsubhra Chakraborty, Mubarak Shah
Despite the commercial abundance of UAVs, aerial data acquisition remains challenging, and the existing Asia and North America-centric open-source UAV datasets are small-scale or low-resolution and lack diversity in scene contextuality.
1 code implementation • 22 Nov 2023 • Shehan Munasinghe, Rusiru Thushara, Muhammad Maaz, Hanoona Abdul Rasheed, Salman Khan, Mubarak Shah, Fahad Khan
Extending image-based Large Multimodal Models (LMMs) to videos is challenging due to the inherent complexity of video data.
no code implementations • 23 Oct 2023 • Adeel Yousaf, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah
Consistent improvements across multiple benchmarks and with various VLMs demonstrate the effectiveness of our proposed framework.
Ranked #4 on Video-Text Retrieval on Test-of-Time
1 code implementation • 25 Sep 2023 • Jyoti Kini, Sarah Fleischer, Ishan Dave, Mubarak Shah
Our work focuses on recognizing actions from egocentric RGB and Depth modalities in an industry-like environment.
no code implementations • 18 Sep 2023 • James Beetham, Navid Kardan, Ajmal Mian, Mubarak Shah
To this end, the two main challenges are estimating gradients of the target model without access to its parameters, and generating a diverse set of training samples that thoroughly explores the input space.
1 code implementation • ICCV 2023 • Sarinda Samarasinghe, Mamshad Nayeem Rizve, Navid Kardan, Mubarak Shah
To address this issue, in this work, we propose a novel cross-domain few-shot video action recognition method that leverages self-supervised learning and curriculum learning to balance the information from the source and target domains.
no code implementations • 25 Aug 2023 • Tristan de Blegiers, Ishan Rajendrakumar Dave, Adeel Yousaf, Mubarak Shah
Recognizing and comprehending human actions and gestures is a crucial perception requirement for robots to interact with humans and carry out tasks in diverse domains, including service robotics, healthcare, and manufacturing.
1 code implementation • ICCV 2023 • Swetha Sirnam, Mamshad Nayeem Rizve, Nina Shvetsova, Hilde Kuehne, Mubarak Shah
Self-supervised learning on large-scale multi-modal datasets allows learning semantically meaningful embeddings in a joint multi-modal representation space without relying on human annotations.
1 code implementation • ICCV 2023 • Joseph Fioresi, Ishan Rajendrakumar Dave, Mubarak Shah
In this paper, we propose TeD-SPAD, a privacy-aware video anomaly detection framework that destroys visual private information in a self-supervised manner.
1 code implementation • 10 Aug 2023 • Jyoti Kini, Sarah Fleischer, Ishan Dave, Mubarak Shah
In this work, we propose an ensemble modeling approach for multimodal action recognition.
1 code implementation • 2 Aug 2023 • Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Mubarak Shah
To this end, we study the task of predicting the prompt embedding given an image generated by a generative diffusion model.
1 code implementation • 25 Jul 2023 • Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Fahad Shahbaz Khan
Vision systems to see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
2 code implementations • 14 Jul 2023 • Asif Hanif, Muzammal Naseer, Salman Khan, Mubarak Shah, Fahad Shahbaz Khan
While recent advances in deep learning have improved the performance of volumetric medical image segmentation models, these models cannot be deployed for real-world applications immediately due to their vulnerability to adversarial attacks.
2 code implementations • ICCV 2023 • Syed Talal Wasim, Muhammad Uzair Khattak, Muzammal Naseer, Salman Khan, Mubarak Shah, Fahad Shahbaz Khan
Video transformer designs are based on self-attention that can model global context at a high computational cost.
Ranked #1 on Action Recognition on Diving-48
1 code implementation • CVPR 2024 • Nicolae-Catalin Ristea, Florinel-Alin Croitoru, Radu Tudor Ionescu, Marius Popescu, Fahad Shahbaz Khan, Mubarak Shah
We propose an efficient abnormal event detection model based on a lightweight masked auto-encoder (AE) applied at the video frame level.
Ranked #12 on Anomaly Detection on UCSD Ped2
no code implementations • 15 Jun 2023 • Soumyabrata Dey, Ravishankar Rao, Mubarak Shah
The concatenation of the network features of all the voxels in a brain serves as the feature vector.
1 code implementation • CVPR 2023 • Aisha Urooj Khan, Hilde Kuehne, Bo Wu, Kim Chheu, Walid Bousselham, Chuang Gan, Niels Lobo, Mubarak Shah
The proposed method is trained in an end-to-end manner and optimized by a VQA loss with the cross-entropy function and a Hungarian matching loss for the situation graph prediction.
Ranked #6 on Video Question Answering on AGQA 2.0 balanced
no code implementations • 6 Apr 2023 • Sijie Zhu, Linjie Yang, Chen Chen, Mubarak Shah, Xiaohui Shen, Heng Wang
Visual Place Recognition (VPR) estimates the location of query images by matching them with images in a reference database.
1 code implementation • CVPR 2023 • Syed Talal Wasim, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah
Through this prompting scheme, we can achieve state-of-the-art zero-shot performance on Kinetics-600, HMDB51 and UCF101 while remaining competitive in the supervised setting.
1 code implementation • 3 Apr 2023 • Omkar Thawakar, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan
Open-world formulation relaxes the closed-world static-learning assumption as follows: (a) first, it distinguishes a set of known categories as well as labels an unknown object as 'unknown' and then (b) it incrementally learns the class of an unknown as and when the corresponding semantic labels become available.
1 code implementation • ICCV 2023 • Daochang Liu, Qiyue Li, AnhDung Dinh, Tingting Jiang, Mubarak Shah, Chang Xu
Temporal action segmentation is crucial for understanding long-form videos.
Ranked #2 on Action Segmentation on GTEA
1 code implementation • CVPR 2023 • Ishan Rajendrakumar Dave, Mamshad Nayeem Rizve, Chen Chen, Mubarak Shah
We observe that these representations complement each other depending on the nature of the action.
1 code implementation • 21 Mar 2023 • Omkar Thawakar, Rao Muhammad Anwer, Jorma Laaksonen, Orly Reiner, Mubarak Shah, Fahad Shahbaz Khan
Accurate 3D mitochondria instance segmentation in electron microscopy (EM) is a challenging problem and serves as a prerequisite to empirically analyze their distributions and morphology.
no code implementations • CVPR 2023 • Brandon Clark, Alec Kerrigan, Parth Parag Kulkarni, Vicente Vivanco Cepeda, Mubarak Shah
To this end, we introduce an end-to-end transformer-based architecture that exploits the relationship between different geographic levels (which we refer to as hierarchies) and the corresponding visual scene information in an image through hierarchical cross-attention.
Ranked #1 on Photo geolocation estimation on GWS15k
1 code implementation • CVPR 2023 • Sijie Zhu, Linjie Yang, Chen Chen, Mubarak Shah, Xiaohui Shen, Heng Wang
Visual Place Recognition (VPR) estimates the location of query images by matching them with images in a reference database.
no code implementations • CVPR 2023 • Mamshad Nayeem Rizve, Gaurav Mittal, Ye Yu, Matthew Hall, Sandra Sajeev, Mubarak Shah, Mei Chen
To address this, we present PivoTAL, Prior-driven Supervision for Weakly-supervised Temporal Action Localization, to approach WTAL from a localization-by-localization perspective by learning to localize the action snippets directly.
no code implementations • CVPR 2023 • Rohit Gupta, Anirban Roy, Claire Christensen, Sujeong Kim, Sarah Gerard, Madeline Cincebeaux, Ajay Divakaran, Todd Grindal, Mubarak Shah
We learn a class prototype for each class and a loss function is employed to minimize the distances between a class prototype and the samples from the class.
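The prototype objective described in this entry can be illustrated with a minimal sketch. It is not the authors' implementation: here each prototype is simply the class mean (an assumption standing in for a learned prototype), the distance is squared Euclidean, and the names `class_prototypes` and `prototype_loss` are hypothetical.

```python
def class_prototypes(features, labels):
    """Per-class mean feature vector (assumed stand-in for a learned prototype)."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def prototype_loss(features, labels, prototypes):
    """Mean squared Euclidean distance between each sample and its class prototype."""
    total = 0.0
    for x, y in zip(features, labels):
        p = prototypes[y]
        total += sum((a - b) ** 2 for a, b in zip(x, p))
    return total / len(features)


# Toy data: two classes with two 2-D samples each.
features = [[0.0, 0.0], [2.0, 0.0], [1.0, 1.0], [1.0, 3.0]]
labels = [0, 0, 1, 1]
prototypes = class_prototypes(features, labels)
loss = prototype_loss(features, labels, prototypes)
```

In a trained model the prototypes would typically be parameters updated by gradient descent together with the feature extractor; the class mean is used here only to keep the sketch self-contained.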
no code implementations • ICCV 2023 • Saeed Vahidian, Sreevatsank Kadaveru, Woonjoon Baek, Weijia Wang, Vyacheslav Kungurtsev, Chen Chen, Mubarak Shah, Bill Lin
Specifically, we aim to investigate how ordered learning principles can contribute to alleviating the heterogeneity effects in FL.
1 code implementation • 28 Nov 2022 • Florinel-Alin Croitoru, Nicolae-Catalin Ristea, Dana Dascalescu, Radu Tudor Ionescu, Fahad Shahbaz Khan, Mubarak Shah
We propose a very fast frame-level model for anomaly detection in video, which learns to detect anomalies by distilling knowledge from multiple highly accurate object-level teacher models.
Ranked #20 on Anomaly Detection on CUHK Avenue
no code implementations • 23 Nov 2022 • Rohit Gupta, Naveed Akhtar, Gaurav Kumar Nayak, Ajmal Mian, Mubarak Shah
By using a nearly disjoint dataset to train the substitute model, our method removes the requirement that the substitute model be trained using the same dataset as the target model, and leverages queries to the target model to retain the fooling rate benefits provided by query-based methods.
1 code implementation • CVPR 2023 • Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan
In this work, we show how denoising diffusion models can be applied for high-fidelity person image synthesis with strong sample diversity and enhanced mode coverage of the learnt data distribution.
no code implementations • 1 Nov 2022 • Jyoti Kini, Ajmal Mian, Mubarak Shah
We propose a method for joint detection and tracking of multiple objects in 3D point clouds, a task conventionally treated as a two-step process comprising object detection followed by data association.
no code implementations • 23 Oct 2022 • Guo-Jun Qi, Mubarak Shah
In this paper, we review adversarial pretraining of self-supervised deep networks including both convolutional neural networks and vision transformers.
3 code implementations • 16 Oct 2022 • Tushar Sangam, Ishan Rajendrakumar Dave, Waqas Sultani, Mubarak Shah
Drone-to-drone detection using visual feed has crucial applications, such as detecting drone collisions, detecting drone attacks, or coordinating flight with other drones.
1 code implementation • 30 Sep 2022 • Mahdi Morafah, Saeed Vahidian, Chen Chen, Mubarak Shah, Bill Lin
Though successful, federated learning presents new challenges for machine learning, especially when the issue of data heterogeneity, also known as Non-IID data, arises.
1 code implementation • 25 Sep 2022 • Neelu Madan, Nicolae-Catalin Ristea, Radu Tudor Ionescu, Kamal Nasrollahi, Fahad Shahbaz Khan, Thomas B. Moeslund, Mubarak Shah
In this work, we extend our previous self-supervised predictive convolutional attentive block (SSPCAB) with a 3D masked convolutional layer, a transformer for channel-wise attention, as well as a novel self-supervised objective based on Huber loss.
Ranked #5 on Anomaly Detection on CUHK Avenue
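For readers unfamiliar with the Huber loss mentioned in this entry, the standard definition can be sketched as follows. This shows only the generic loss, not the paper's self-supervised objective; the function name `huber` and the default `delta=1.0` are assumptions for illustration.

```python
def huber(residual, delta=1.0):
    """Standard Huber loss: quadratic near zero, linear beyond |residual| > delta."""
    r = abs(residual)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)


def mean_huber(predictions, targets, delta=1.0):
    """Average Huber loss over paired predictions and targets."""
    return sum(huber(p - t, delta) for p, t in zip(predictions, targets)) / len(targets)


small = huber(0.5)   # quadratic regime
large = huber(3.0)   # linear regime
avg = mean_huber([0.5, 3.0], [0.0, 0.0])
```

Compared with a squared error, the linear tail limits the influence of large residuals, which is the usual motivation for choosing it.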
1 code implementation • 21 Sep 2022 • Saeed Vahidian, Mahdi Morafah, Weijia Wang, Vyacheslav Kungurtsev, Chen Chen, Mubarak Shah, Bill Lin
This small set of principal vectors is provided to the server so that the server can directly identify distribution similarities among the clients to form clusters.
1 code implementation • 10 Sep 2022 • Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Mubarak Shah
Denoising diffusion models represent a recent emerging topic in computer vision, demonstrating remarkable results in the area of generative modeling.
no code implementations • 22 Jul 2022 • Rohit Gupta, Naveed Akhtar, Ajmal Mian, Mubarak Shah
We establish that this is a result of the presence of false negative pairs in the training process, which increases model sensitivity to input perturbations.
no code implementations • 16 Jul 2022 • Antonio Barbalau, Radu Tudor Ionescu, Mariana-Iuliana Georgescu, Jacob Dueholm, Bharathkumar Ramachandra, Kamal Nasrollahi, Fahad Shahbaz Khan, Thomas B. Moeslund, Mubarak Shah
A self-supervised multi-task learning (SSMTL) framework for video anomaly detection was recently introduced in literature.
Ranked #3 on Anomaly Detection on CUHK Avenue
1 code implementation • 6 Jul 2022 • Shruti Vyas, Chen Chen, Mubarak Shah
There are no existing datasets for this problem, therefore we propose GAMa dataset, a large-scale dataset with ground videos and corresponding aerial images.
1 code implementation • 5 Jul 2022 • Mamshad Nayeem Rizve, Navid Kardan, Mubarak Shah
We also highlight the flexibility of our approach in solving the novel class discovery task, demonstrate its stability in dealing with imbalanced data, and complement our approach with a technique to estimate the number of novel classes.
Ranked #1 on Open-World Semi-Supervised Learning on CIFAR-100
1 code implementation • 5 Jul 2022 • Mamshad Nayeem Rizve, Navid Kardan, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah
In the open-world SSL problem, the objective is to recognize samples of known classes, and simultaneously detect and cluster samples belonging to novel classes present in unlabeled data.
Ranked #1 on Open-World Semi-Supervised Learning on CIFAR-10
1 code implementation • 5 Jul 2022 • Aisha Urooj Khan, Hilde Kuehne, Chuang Gan, Niels da Vitoria Lobo, Mubarak Shah
Transformers for visual-language representation learning have attracted a lot of interest and shown tremendous performance on visual question answering (VQA) and grounding.
1 code implementation • 18 Jun 2022 • Madeline C. Schiappa, Yogesh S. Rawat, Mubarak Shah
In this survey, we provide a review of existing approaches on self-supervised learning focusing on the video domain.
no code implementations • 6 Jun 2022 • Fabio De Sousa Ribeiro, Kevin Duarte, Miles Everett, Georgios Leontidis, Mubarak Shah
The aim of this survey is to provide a comprehensive overview of the capsule network research landscape, which will serve as a valuable resource for the community going forward.
1 code implementation • 24 May 2022 • Mitch Hill, Jonathan Mitchell, Chu Chen, Yuan Du, Mubarak Shah, Song-Chun Zhu
This work presents strategies to learn an Energy-Based Model (EBM) according to the desired length of its MCMC sampling trajectories.
no code implementations • 22 Apr 2022 • Jyoti Kini, Fahad Shahbaz Khan, Salman Khan, Mubarak Shah
We propose a novel self-supervised Video Object Segmentation (VOS) approach that strives to achieve better object-background discriminability for accurate object segmentation.
no code implementations • 22 Apr 2022 • Jyoti Kini, Mubarak Shah
Video Instance Segmentation is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence.
no code implementations • 17 Apr 2022 • Rajat Modi, Aayush Jung Rana, Akash Kumar, Praveen Tirupattur, Shruti Vyas, Yogesh Singh Rawat, Mubarak Shah
Beyond possessing large enough size to feed data hungry machines (e.g., transformers), what attributes measure the quality of a dataset?
1 code implementation • CVPR 2022 • Jiale Cao, Yanwei Pang, Rao Muhammad Anwer, Hisham Cholakkal, Jin Xie, Mubarak Shah, Fahad Shahbaz Khan
We propose a novel one-step transformer-based person search framework, PSTR, that jointly performs person detection and re-identification (re-id) in a single architecture.
1 code implementation • CVPR 2022 • Sijie Zhu, Mubarak Shah, Chen Chen
It does not rely on polar transform and infers faster than CNN-based methods.
Ranked #3 on Image-Based Localization on VIGOR Cross Area
1 code implementation • CVPR 2022 • Ishan Rajendrakumar Dave, Chen Chen, Mubarak Shah
Existing approaches for mitigating privacy leakage in action recognition require privacy labels along with the action labels from the video dataset.
Ranked #1 on Action Classification on UCF101
1 code implementation • CVPR 2022 • Nazmul Karim, Mamshad Nayeem Rizve, Nazanin Rahnavard, Ajmal Mian, Mubarak Shah
To combat label noise, recent state-of-the-art methods employ some sort of sample selection mechanism to select a possibly clean subset of data.
2 code implementations • 3 Dec 2021 • Huan Lei, Naveed Akhtar, Mubarak Shah, Ajmal Mian
In this paper, we propose a series of modular operations for effective geometric feature learning from 3D triangle meshes.
2 code implementations • CVPR 2022 • Akshita Gupta, Sanath Narayan, K J Joseph, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah
In the case of incremental object detection, OW-DETR outperforms the state-of-the-art for all settings on PASCAL VOC.
no code implementations • 1 Dec 2021 • Kevin Duarte, Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Samuel Thomas, Alexander Liu, David Harwath, James Glass, Hilde Kuehne, Mubarak Shah
We present a new multimodal capsule network that allows us to leverage the strength of capsules in the context of a multimodal learning framework on large amounts of video data.
no code implementations • NeurIPS 2021 • Alec Kerrigan, Kevin Duarte, Yogesh Rawat, Mubarak Shah
Given a video and a set of action classes, our method predicts a set of confidence scores for each class independently.
4 code implementations • CVPR 2022 • Nicolae-Catalin Ristea, Neelu Madan, Radu Tudor Ionescu, Kamal Nasrollahi, Fahad Shahbaz Khan, Thomas B. Moeslund, Mubarak Shah
Our block is equipped with a loss that minimizes the reconstruction error with respect to the masked area in the receptive field.
Ranked #1 on Anomaly Detection on CUHK Avenue (TBDC metric)
1 code implementation • CVPR 2022 • Andra Acsintoae, Andrei Florescu, Mariana-Iuliana Georgescu, Tudor Mare, Paul Sumedrea, Radu Tudor Ionescu, Fahad Shahbaz Khan, Mubarak Shah
This is a closed-set scenario that fails to test the capability of systems at detecting new anomaly types.
Ranked #6 on Anomaly Detection on CUHK Avenue (using extra training data)
no code implementations • 14 Oct 2021 • Ishan Dave, Naman Biyani, Brandon Clark, Rohit Gupta, Yogesh Rawat, Mubarak Shah
This technical report presents our approach "Knights" to solve the action recognition task on a small subset of Kinetics-400, i.e., Kinetics400ViPriors, without using any extra data.
no code implementations • ICLR 2022 • Navid Kardan, Mubarak Shah, Mitch Hill
A supervised learning problem is often formulated using an i.i.d.
1 code implementation • ICCV 2021 • Sanath Narayan, Akshita Gupta, Salman Khan, Fahad Shahbaz Khan, Ling Shao, Mubarak Shah
We note that the best existing multi-label ZSL method takes a shared approach towards attending to region features with a common set of attention maps for all the classes.
Ranked #2 on Multi-label zero-shot learning on Open Images V4
no code implementations • 1 Aug 2021 • Naveed Akhtar, Ajmal Mian, Navid Kardan, Mubarak Shah
In [2], we reviewed the contributions made by the computer vision community in adversarial attacks on deep learning (and their defenses) until the advent of year 2018.
no code implementations • 29 Jul 2021 • Amir Mazaheri, Mubarak Shah
To the best of our knowledge, this is the very first work on the text (free-form sentences) to video generation on more realistic video datasets like Actor and Action Dataset (A2D) or UCF101.
1 code implementation • 24 Jul 2021 • Praveen Tirupattur, Aayush J Rana, Tushar Sangam, Shruti Vyas, Yogesh S Rawat, Mubarak Shah
While various approaches have been shown effective for the recognition task in recent works, they often do not deal with videos of lower resolution where the action is happening in a tiny region.
1 code implementation • 19 Jul 2021 • Dawei Du, Longyin Wen, Pengfei Zhu, Heng Fan, Qinghua Hu, Haibin Ling, Mubarak Shah, Junwen Pan, Ali Al-Ali, Amr Mohamed, Bakour Imene, Bin Dong, Binyu Zhang, Bouchali Hadia Nesma, Chenfeng Xu, Chenzhen Duan, Ciro Castiello, Corrado Mencar, Dingkang Liang, Florian Krüger, Gennaro Vessio, Giovanna Castellano, Jieru Wang, Junyu Gao, Khalid Abualsaud, Laihui Ding, Lei Zhao, Marco Cianciotta, Muhammad Saqib, Noor Almaadeed, Omar Elharrouss, Pei Lyu, Qi Wang, Shidong Liu, Shuang Qiu, Siyang Pan, Somaya Al-Maadeed, Sultan Daud Khan, Tamer Khattab, Tao Han, Thomas Golda, Wei Xu, Xiang Bai, Xiaoqing Xu, Xuelong Li, Yanyun Zhao, Ye Tian, Yingnan Lin, Yongchao Xu, Yuehan Yao, Zhenyu Xu, Zhijian Zhao, Zhipeng Luo, Zhiwei Wei, Zhiyuan Zhao
Crowd counting on the drone platform is an interesting topic in computer vision, which brings new challenges such as small object inference, background clutter and wide viewpoint.
no code implementations • 7 Jul 2021 • Nayyer Aafaq, Naveed Akhtar, Wei Liu, Mubarak Shah, Ajmal Mian
In contrast, we propose a GAN-based algorithm for crafting adversarial examples for neural image captioning that mimics the internal representation of the CNN such that the resulting deep features of the input image enable a controlled incorrect caption generation through the recurrent network.
no code implementations • 23 Jun 2021 • Crystal Gagne, Jyoti Kini, Daniel Smith, Mubarak Shah
Trail camera imagery has increasingly gained popularity amongst biologists for conservation and ecological research.
2 code implementations • CVPR 2021 • Alireza Zaeemzadeh, Niccolo Bisagno, Zeno Sambugaro, Nicola Conci, Nazanin Rahnavard, Mubarak Shah
In this paper, we argue that OOD samples can be detected more easily if the training data is embedded into a low-dimensional space, such that the embedded training samples lie on a union of 1-dimensional subspaces.
no code implementations • 7 Jun 2021 • Sarah Shiraz, Krishna Regmi, Shruti Vyas, Yogesh S. Rawat, Mubarak Shah
We address the problem of novel view video prediction; given a set of input video clips from a single/multiple views, our network is able to predict the video from a novel view.
no code implementations • 3 Jun 2021 • Aakash Kumar, Jyoti Kini, Mubarak Shah, Ajmal Mian
In recent times, the scope of LIDAR (Light Detection and Ranging) sensor-based technology has spread across numerous fields.
no code implementations • 22 May 2021 • Kevin Duarte, Yogesh S. Rawat, Mubarak Shah
By stochastically masking labels during loss computation, the method balances this ratio for each class, leading to improved recall on minority classes and improved precision on frequent classes.
1 code implementation • 14 May 2021 • Taojiannan Yang, Sijie Zhu, Matias Mendieta, Pu Wang, Ravikumar Balakrishnan, Minwoo Lee, Tao Han, Mubarak Shah, Chen Chen
MutualNet is a general training methodology that can be applied to various network structures (e.g., 2D networks: MobileNets, ResNet; 3D networks: SlowFast, X3D) and various tasks (e.g., image classification, object detection, segmentation, and action recognition), and is demonstrated to achieve consistent improvements on a variety of datasets.
1 code implementation • CVPR 2021 • Aisha Urooj Khan, Hilde Kuehne, Kevin Duarte, Chuang Gan, Niels Lobo, Mubarak Shah
In this paper, we focus on a more relaxed setting: the grounding of relevant visual entities in a weakly supervised manner by training on the VQA task alone.
no code implementations • 30 Apr 2021 • Sirnam Swetha, Hilde Kuehne, Yogesh S Rawat, Mubarak Shah
This paper proposes a novel approach for unsupervised sub-action learning in complex activities.
Ranked #31 on Action Segmentation on Breakfast
1 code implementation • ICCV 2021 • Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Mubarak Shah
We propose a novel transformer-based styled handwritten text image generation approach, HWT, that strives to learn both style-content entanglement as well as global and local writing style patterns.
2 code implementations • CVPR 2021 • Muhammad Waseem Ashraf, Waqas Sultani, Mubarak Shah
The erratic movement of the source and target drones, small size, arbitrary shape, large intensity variations, and occlusion make this problem quite challenging.
no code implementations • 19 Mar 2021 • Ashkan Esmaeili, Marzieh Edraki, Nazanin Rahnavard, Mubarak Shah, Ajmal Mian
It is set forth that the proposed sparse perturbation is the most aligned sparse perturbation with the shortest path from the input sample to the decision boundary for some initial adversarial sample (the best sparse approximation of shortest path, likely to fool the model).
1 code implementation • CVPR 2021 • Praveen Tirupattur, Kevin Duarte, Yogesh Rawat, Mubarak Shah
We propose to improve action localization performance by modeling these action dependencies in a novel attention-based Multi-Label Action Dependency (MLAD) layer.
Ranked #1 on Action Detection on Multi-THUMOS
1 code implementation • CVPR 2021 • Mamshad Nayeem Rizve, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah
Equivariance or invariance has each been employed on its own in previous works; however, to the best of our knowledge, they have not been used jointly.
1 code implementation • 20 Jan 2021 • Ishan Dave, Rohit Gupta, Mamshad Nayeem Rizve, Mubarak Shah
However, prior work on contrastive learning for video data has not explored the effect of explicitly encouraging the features to be distinct across the temporal dimension.
Ranked #9 on Self-supervised Video Retrieval on UCF101
2 code implementations • ICLR 2021 • Mamshad Nayeem Rizve, Kevin Duarte, Yogesh S Rawat, Mubarak Shah
Recent research in semi-supervised learning (SSL) is mostly dominated by consistency-regularization-based methods, which achieve strong performance.
no code implementations • 4 Jan 2021 • Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, Mubarak Shah
Astounding results from Transformer models on natural language tasks have prompted the vision community to study their application to computer vision problems.
no code implementations • 1 Jan 2021 • Saeed Vahidian, Mohsen Joneidi, Ashkan Esmaeili, Siavash Khodadadeh, Sharare Zehtabian, Ladislau Boloni, Nazanin Rahnavard, Bill Lin, Mubarak Shah
The approach is based on the concept of self-rank, defined as the minimum number of samples needed to reconstruct all samples with an accuracy proportional to the rank-$K$ approximation.
no code implementations • ICCV 2021 • Alireza Zaeemzadeh, Shabnam Ghadar, Baldo Faieta, Zhe Lin, Nazanin Rahnavard, Mubarak Shah, Ratheesh Kalarot
For example, a user can ask for retrieving images similar to a query image, but with a different hair color, and no preference for absence/presence of eyeglasses in the results.
1 code implementation • ICCV 2021 • Krishna Regmi, Mubarak Shah
In this paper, we address the problem of video geo-localization by proposing a Geo-Temporal Feature Learning (GTFL) Network to simultaneously learn the discriminative features between the query videos and gallery images for estimating the geo-spatial trajectory of a query video.
1 code implementation • 24 Dec 2020 • Ce Zheng, Wenhan Wu, Chen Chen, Taojiannan Yang, Sijie Zhu, Ju Shen, Nasser Kehtarnavaz, Mubarak Shah
Furthermore, 2D and 3D human pose estimation datasets and evaluation metrics are included.
1 code implementation • 25 Nov 2020 • Simone Palazzo, Concetto Spampinato, Joseph Schmidt, Isaak Kavasidis, Daniela Giordano, Mubarak Shah
We argue that the reason why Li et al. [1] observe such high correlation in EEG data is their unconventional experimental design and settings that violate the basic cognitive neuroscience design recommendations, first and foremost the one of limiting the experiments' duration, as instead done in [2].
1 code implementation • CVPR 2021 • Mariana-Iuliana Georgescu, Antonio Barbalau, Radu Tudor Ionescu, Fahad Shahbaz Khan, Marius Popescu, Mubarak Shah
To the best of our knowledge, we are the first to approach anomalous event detection in video as a multi-task learning problem, integrating multiple self-supervised and knowledge distillation proxy tasks in a single architecture.
Ranked #2 on Anomaly Detection on UCSD Peds2
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Aisha Urooj Khan, Amir Mazaheri, Niels da Vitoria Lobo, Mubarak Shah
We present MMFT-BERT (MultiModal Fusion Transformer with BERT encodings) to solve Visual Question Answering (VQA), ensuring individual and combined processing of multiple input modalities.
no code implementations • 19 Oct 2020 • Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah
This demonstrates their ability to acquire transferable knowledge, a capability that is central to human learning.
1 code implementation • 30 Sep 2020 • Viresh Ranjan, Boyu Wang, Mubarak Shah, Minh Hoai
We present sample selection strategies which make use of the density and uncertainty of predictions from the networks trained on one domain to select the informative images from a target domain of interest to acquire human annotation.
2 code implementations • 27 Aug 2020 • Mariana-Iuliana Georgescu, Radu Tudor Ionescu, Fahad Shahbaz Khan, Marius Popescu, Mubarak Shah
Following the standard formulation of abnormal event detection as outlier detection, we propose a background-agnostic framework that learns from training videos containing only normal events.
3 code implementations • ECCV 2020 • Shi-Jie Sun, Naveed Akhtar, Xiang-Yu Song, HuanSheng Song, Ajmal Mian, Mubarak Shah
Deep learning-based Multiple Object Tracking (MOT) currently relies on off-the-shelf detectors for tracking-by-detection. This results in deep models that are detector-biased and evaluations that are detector-influenced.
no code implementations • 3 Aug 2020 • Aaron Ott, Amir Mazaheri, Niels D. Lobo, Mubarak Shah
In the photo enhancer, we employ super-resolution to increase the number of pixels in the embedded image and reduce the effect of stretching and distortion of pixels.
no code implementations • 28 Jul 2020 • Xiao-Yu Zhang, Ajmal Mian, Rohit Gupta, Nazanin Rahnavard, Mubarak Shah
We also propose an anomaly detection method to identify the target class in a Trojaned network.
Ranked #1 on Adversarial Defense on TrojAI Round 1
1 code implementation • 16 Jul 2020 • Marzieh Edraki, Nazmul Karim, Nazanin Rahnavard, Ajmal Mian, Mubarak Shah
We propose a detector that is based on the analysis of the intrinsic DNN properties that are affected by the Trojaning process.
1 code implementation • 14 Jul 2020 • Ugur Demir, Yogesh S Rawat, Mubarak Shah
In real-world surveillance environments, the actions in videos are captured at a wide range of resolutions.
2 code implementations • 17 Jun 2020 • Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah
Our experiments show that, even in the first stage, self-supervision can outperform current state-of-the-art methods, with further gains achieved by our second stage distillation process.
Ranked #13 on Few-Shot Image Classification on FC100 5-way (5-shot)
no code implementations • 8 May 2020 • Aidean Sharghi, Niels da Vitoria Lobo, Mubarak Shah
We input a set of video shots, and the network generates a text description for each shot.
no code implementations • 23 Apr 2020 • Mamshad Nayeem Rizve, Ugur Demir, Praveen Tirupattur, Aayush Jung Rana, Kevin Duarte, Ishan Dave, Yogesh Singh Rawat, Mubarak Shah
For tubelet extraction, we propose a localization network which takes a video clip as input and spatio-temporally detects potential foreground regions at multiple scales to generate action tubelets.
no code implementations • 15 Apr 2020 • Rohit Gupta, Mubarak Shah
Accurate and fine-grained information about the extent of damage to buildings is essential for directing Humanitarian Aid and Disaster Response (HADR) operations in the immediate aftermath of any natural calamity.
1 code implementation • 1 Apr 2020 • Erik Quintanilla, Yogesh Rawat, Andrey Sakryukin, Mubarak Shah, Mohan Kankanhalli
We demonstrate the effectiveness of the proposed model on two different large-scale and publicly available datasets, YFCC100M and NUS-WIDE.
1 code implementation • CVPR 2020 • Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah
In this paper, we hypothesize that this problem can be avoided by learning a set of generalized parameters that are specific to neither old nor new tasks.
1 code implementation • 7 Feb 2020 • Marzieh Edraki, Nazanin Rahnavard, Mubarak Shah
In this paper, we propose the SubSpace Capsule Network (SCN) that exploits the idea of capsule networks to model possible variations in the appearance or implicitly defined properties of an entity through a group of capsule subspaces instead of simply grouping neurons to create capsules.
no code implementations • 23 Nov 2019 • Mahdi M. Kalayeh, Mubarak Shah
In SSG, the same idea is applied to the intermediate layers of the network.
no code implementations • 22 Oct 2019 • Waqas Sultani, Mubarak Shah
However, using deep neural networks for automatic aerial action recognition is difficult due to the need for a large number of training aerial human action videos.
1 code implementation • ICCV 2019 • Kevin Duarte, Yogesh S Rawat, Mubarak Shah
In this work we propose a capsule-based approach for semi-supervised video object segmentation.
no code implementations • 21 Jul 2019 • Rui Hou, Chen Chen, Rahul Sukthankar, Mubarak Shah
Convolutional Neural Network (CNN) based image segmentation has made great progress in recent years.
Ranked #64 on Semi-Supervised Video Object Segmentation on DAVIS 2016
1 code implementation • ICCV 2019 • Leulseged Tesfaye Alemu, Marcello Pelillo, Mubarak Shah
By optimizing the constrained clustering in an end-to-end manner, we naturally leverage the contextual knowledge of a set of images corresponding to the given person-images.
Ranked #2 on Person Re-Identification on CUHK03 (Rank-5 metric)
1 code implementation • ICCV 2019 • Krishna Regmi, Mubarak Shah
Our Feature Fusion method combines the complementary features from a synthesized aerial image with the corresponding ground features to obtain a robust query representation.
no code implementations • 4 Apr 2019 • Viresh Ranjan, Mubarak Shah, Minh Hoai Nguyen
Most of the existing crowd counting approaches rely on local features for estimating the crowd density map.
no code implementations • 22 Dec 2018 • Emrah Basaran, Yonatan Tariku Tesfaye, Mubarak Shah
In recent years, the performance of video-based person Re-Identification (ReID) methods has improved considerably.
no code implementations • 2 Dec 2018 • Bruce McIntosh, Kevin Duarte, Yogesh S Rawat, Mubarak Shah
The existing works on actor-action localization are mainly focused on localization in a single frame instead of the full video.
2 code implementations • CVPR 2019 • Mohsen Joneidi, Alireza Zaeemzadeh, Nazanin Rahnavard, Mubarak Shah
In our algorithm, at each iteration, the maximum information from the structure of the data is captured by one selected sample, and the captured information is neglected in the next iterations by projection on the null-space of previously selected samples.