Search Results for author: Hisham Cholakkal

Found 49 papers, 35 papers with code

Count- and Similarity-aware R-CNN for Pedestrian Detection

no code implementations ECCV 2020 Jin Xie, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao, Mubarak Shah

We further introduce a count-and-similarity branch within the two-stage detection framework, which predicts pedestrian count as well as proposal similarity.

Human Instance Segmentation Pedestrian Detection +1

Fixing Localization Errors to Improve Image Classification

1 code implementation ECCV 2020 Guolei Sun, Salman Khan, Wen Li, Hisham Cholakkal, Fahad Shahbaz Khan, Luc van Gool

This way, in an effort to fix localization errors, our loss provides an extra supervisory signal that helps the model to better discriminate between similar classes.

Classification General Classification +3

PARIS3D: Reasoning-based 3D Part Segmentation Using Large Multimodal Model

1 code implementation 4 Apr 2024 Amrin Kareem, Jean Lahoud, Hisham Cholakkal

We introduce a novel segmentation task known as reasoning part segmentation for 3D objects, aiming to output a segmentation mask based on complex and implicit textual queries about specific parts of a 3D object.

3D Part Segmentation Benchmarking +2

ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection

1 code implementation 26 Mar 2024 Mubashir Noman, Mustansar Fiaz, Hisham Cholakkal, Salman Khan, Fahad Shahbaz Khan

Deep learning has shown remarkable success in remote sensing change detection (CD), aiming to identify semantic change regions between co-registered satellite image pairs acquired at distinct time stamps.

Change Detection

Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery

1 code implementation 8 Mar 2024 Mubashir Noman, Muzammal Naseer, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan

Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks by pre-training on large amounts of unlabelled data.

Multi-Label Classification

Semi-supervised Open-World Object Detection

1 code implementation 25 Feb 2024 Sahal Shaji Mullappilly, Abhishek Singh Gehlot, Rao Muhammad Anwer, Fahad Shahbaz Khan, Hisham Cholakkal

We demonstrate the effectiveness of our SS-OWOD problem setting and approach for remote sensing object detection, proposing carefully curated splits and baseline performance evaluations.

Incremental Learning Object +2

BiMediX: Bilingual Medical Mixture of Experts LLM

1 code implementation 20 Feb 2024 Sara Pieri, Sahal Shaji Mullappilly, Fahad Shahbaz Khan, Rao Muhammad Anwer, Salman Khan, Timothy Baldwin, Hisham Cholakkal

In this paper, we introduce BiMediX, the first bilingual medical mixture of experts LLM designed for seamless interaction in both English and Arabic.

Multiple-choice Open-Ended Question Answering

Arabic Mini-ClimateGPT : A Climate Change and Sustainability Tailored Arabic LLM

1 code implementation 14 Dec 2023 Sahal Shaji Mullappilly, Abdelrahman Shaker, Omkar Thawakar, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan

To this end, we propose a light-weight Arabic Mini-ClimateGPT that is built on an open-source LLM and is specifically fine-tuned on Clima500-Instruct, a curated conversational-style Arabic instruction-tuning dataset with over 500k instructions about climate change and sustainability.

GLaMM: Pixel Grounding Large Multimodal Model

1 code implementation 6 Nov 2023 Hanoona Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly, Abdelrahman Shaker, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Eric Xing, Ming-Hsuan Yang, Fahad S. Khan

In this work, we present Grounding LMM (GLaMM), the first model that can generate natural language responses seamlessly intertwined with corresponding object segmentation masks.

Conversational Question Answering Image Captioning +5

DDAM-PS: Diligent Domain Adaptive Mixer for Person Search

1 code implementation 31 Oct 2023 Mohammed Khaleed Almansoori, Mustansar Fiaz, Hisham Cholakkal

The objective of the two bridge losses is to guide the moderate mixed-domain representations to maintain an appropriate distance from both the source and target domain representations.

Domain Adaptation Pedestrian Detection +3

TransRadar: Adaptive-Directional Transformer for Real-Time Multi-View Radar Semantic Segmentation

1 code implementation 3 Oct 2023 Yahia Dalbah, Jean Lahoud, Hisham Cholakkal

Scene understanding plays an essential role in enabling autonomous driving and maintaining high standards of performance and safety.

Autonomous Driving Scene Understanding +1

SA2-Net: Scale-aware Attention Network for Microscopic Image Segmentation

1 code implementation 28 Sep 2023 Mustansar Fiaz, Moein Heidari, Rao Muhammad Anwer, Hisham Cholakkal

Specifically, we propose a scale-aware attention (SA2) module designed to capture inherent variations in scales and shapes of microscopic regions, such as cells, for accurate segmentation.

Image Segmentation Semantic Segmentation

3D Indoor Instance Segmentation in an Open-World

1 code implementation NeurIPS 2023 Mohamed El Amine Boudjoghra, Salwa K. Al Khatib, Jean Lahoud, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Fahad Khan

We argue that such a closed-world assumption is restrictive and explore for the first time 3D indoor instance segmentation in an open-world setting, where the model is allowed to distinguish a set of known classes as well as identify an unknown object as unknown, and later incrementally learn the semantic category of the unknown when the corresponding category labels are available.

3D Instance Segmentation Segmentation +1

Foundational Models Defining a New Era in Vision: A Survey and Outlook

1 code implementation 25 Jul 2023 Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Fahad Shahbaz Khan

Vision systems to see and reason about the compositional nature of visual scenes are fundamental to understanding our world.

Benchmarking

XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models

1 code implementation 13 Jun 2023 Omkar Thawkar, Abdelrahman Shaker, Sahal Shaji Mullappilly, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen, Fahad Shahbaz Khan

The latest breakthroughs in large vision-language models, such as Bard and GPT-4, have showcased extraordinary abilities in performing a wide range of tasks.

Language Modelling Large Language Model

Salient Mask-Guided Vision Transformer for Fine-Grained Classification

1 code implementation 11 May 2023 Dmitry Demidov, Muhammad Hamza Sharif, Aliakbar Abdurahimov, Hisham Cholakkal, Fahad Shahbaz Khan

Fine-grained visual classification (FGVC) is a challenging computer vision problem, where the task is to automatically recognise objects from subordinate categories.

Classification Fine-Grained Image Classification

Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection

1 code implementation CVPR 2023 Long Li, Junwei Han, Ni Zhang, Nian Liu, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan

Then, we use two types of pre-defined tokens to mine co-saliency and background information via our proposed contrast-induced pixel-to-token correlation and co-saliency token-to-token correlation modules.

Computational Efficiency Co-Salient Object Detection +3

RadarFormer: Lightweight and Accurate Real-Time Radar Object Detection Model

1 code implementation 17 Apr 2023 Yahia Dalbah, Jean Lahoud, Hisham Cholakkal

This improvement was associated with the increasing use of LiDAR sensors and point cloud data to facilitate the task of object detection and recognition in autonomous driving.

Autonomous Driving object-detection +2

Remote Sensing Change Detection With Transformers Trained from Scratch

1 code implementation 13 Apr 2023 Mubashir Noman, Mustansar Fiaz, Hisham Cholakkal, Sanath Narayan, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan

Current transformer-based change detection (CD) approaches either employ a model pre-trained on the large-scale ImageNet image classification dataset or rely on first pre-training on another CD dataset and then fine-tuning on the target benchmark.

Change Detection Image Classification

Cross-modulated Few-shot Image Generation for Colorectal Tissue Classification

1 code implementation 4 Apr 2023 Amandeep Kumar, Ankan Kumar Bhunia, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Jorma Laaksonen, Fahad Shahbaz Khan

In this work, we propose a few-shot colorectal tissue image generation method for addressing the scarcity of histopathological training data for rare cancer tissues.

Data Augmentation Image Classification +1

Video Instance Segmentation in an Open-World

1 code implementation 3 Apr 2023 Omkar Thawakar, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan

Open-world formulation relaxes the closed-world static-learning assumption as follows: (a) first, it distinguishes a set of known categories as well as labels an unknown object as 'unknown' and then (b) it incrementally learns the class of an unknown as and when the corresponding semantic labels become available.

Instance Segmentation Semantic Segmentation +1

Person Image Synthesis via Denoising Diffusion Model

1 code implementation CVPR 2023 Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan

In this work, we show how denoising diffusion models can be applied for high-fidelity person image synthesis with strong sample diversity and enhanced mode coverage of the learnt data distribution.

Denoising Image Generation

CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection

no code implementations 13 Sep 2022 Dhanalaxmi Gaddam, Jean Lahoud, Fahad Shahbaz Khan, Rao Muhammad Anwer, Hisham Cholakkal

In this work, we propose the Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework, which takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene at multiple levels to predict a set of object bounding-boxes along with their corresponding semantic labels.

3D Object Detection Object +2

Transformers in Remote Sensing: A Survey

no code implementations 2 Sep 2022 Abdulaziz Amer Aleissaee, Amandeep Kumar, Rao Muhammad Anwer, Salman Khan, Hisham Cholakkal, Gui-Song Xia, Fahad Shahbaz Khan

Deep learning-based algorithms have seen massive popularity in different areas of remote sensing image analysis over the past decade.

AVisT: A Benchmark for Visual Object Tracking in Adverse Visibility

1 code implementation 14 Aug 2022 Mubashir Noman, Wafa Al Ghallabi, Daniya Najiha, Christoph Mayer, Akshay Dudhane, Martin Danelljan, Hisham Cholakkal, Salman Khan, Luc van Gool, Fahad Shahbaz Khan

While greatly benefiting tracking research, existing benchmarks do not pose the same difficulty as before, with recent trackers achieving higher performance mainly due to (i) the introduction of more sophisticated transformer-based methods and (ii) the lack of diverse scenarios with adverse visibility such as severe weather conditions, camouflage and imaging effects.

Visual Object Tracking Visual Tracking

Multi-scale Feature Aggregation for Crowd Counting

no code implementations 10 Aug 2022 Xiaoheng Jiang, Xinyi Wu, Hisham Cholakkal, Rao Muhammad Anwer, Jiale Cao, Mingliang Xu, Bing Zhou, Yanwei Pang, Fahad Shahbaz Khan

The SkipAgg module directly propagates features with small receptive fields to features with much larger receptive fields.

Crowd Counting

3D Vision with Transformers: A Survey

1 code implementation 8 Aug 2022 Jean Lahoud, Jiale Cao, Fahad Shahbaz Khan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang

The success of the transformer architecture in natural language processing has recently triggered attention in the computer vision field.

Pose Estimation

On the Robustness of 3D Object Detectors

no code implementations 20 Jul 2022 Fatima Albreiki, Sultan Abughazal, Jean Lahoud, Rao Anwer, Hisham Cholakkal, Fahad Khan

To the best of our knowledge, we are the first to investigate the robustness of point-based 3D object detectors.

3D Object Detection Object +1

EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications

7 code implementations 21 Jun 2022 Muhammad Maaz, Abdelrahman Shaker, Hisham Cholakkal, Salman Khan, Syed Waqas Zamir, Rao Muhammad Anwer, Fahad Shahbaz Khan

Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K, outperforming MobileViT with an absolute gain of 2.2% and a 28% reduction in FLOPs.

Image Classification Object Detection +1

PSTR: End-to-End One-Step Person Search With Transformers

1 code implementation CVPR 2022 Jiale Cao, Yanwei Pang, Rao Muhammad Anwer, Hisham Cholakkal, Jin Xie, Mubarak Shah, Fahad Shahbaz Khan

We propose a novel one-step transformer-based person search framework, PSTR, that jointly performs person detection and re-identification (re-id) in a single architecture.

Human Detection Person Search

Video Instance Segmentation via Multi-scale Spatio-temporal Split Attention Transformer

1 code implementation 24 Mar 2022 Omkar Thawakar, Sanath Narayan, Jiale Cao, Hisham Cholakkal, Rao Muhammad Anwer, Muhammad Haris Khan, Salman Khan, Michael Felsberg, Fahad Shahbaz Khan

When using the ResNet50 backbone, our MS-STS achieves a mask AP of 50.1%, outperforming the best reported results in the literature by 2.7% and by 4.8% at the higher overlap threshold of AP_75, while being comparable in model size and speed on YouTube-VIS 2019 val.

Instance Segmentation Semantic Segmentation +2

DoodleFormer: Creative Sketch Drawing with Transformers

no code implementations 6 Dec 2021 Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Jorma Laaksonen, Michael Felsberg

Creative sketch image generation is a challenging vision problem, where the task is to generate diverse, yet realistic creative sketches possessing the unseen composition of the visual-world objects.

Image Generation

Structured Latent Embeddings for Recognizing Unseen Classes in Unseen Domains

no code implementations 12 Jul 2021 Shivam Chandhok, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Vineeth N Balasubramanian, Fahad Shahbaz Khan, Ling Shao

The need to address the scarcity of task-specific annotated data has resulted in concerted efforts in recent years for specific settings such as zero-shot learning (ZSL) and domain generalization (DG), to separately address the issues of semantic shift and domain shift, respectively.

Domain Generalization Zero-Shot Learning +1

Handwriting Transformers

1 code implementation ICCV 2021 Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Mubarak Shah

We propose a novel transformer-based styled handwritten text image generation approach, HWT, that strives to learn both style-content entanglement as well as global and local writing style patterns.

Image Generation Text Generation

SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation

1 code implementation ECCV 2020 Jiale Cao, Rao Muhammad Anwer, Hisham Cholakkal, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao

In terms of real-time capabilities, SipMask outperforms YOLACT with an absolute gain of 3.0% (mask AP) under similar settings, while operating at comparable speed on a Titan Xp.

object-detection Object Detection +4

PSC-Net: Learning Part Spatial Co-occurrence for Occluded Pedestrian Detection

no code implementations 25 Jan 2020 Jin Xie, Yanwei Pang, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Ling Shao

On the heavily occluded (HO) set of the CityPersons test set, our PSC-Net obtains an absolute gain of 4.0% in terms of log-average miss rate over the state-of-the-art with the same backbone and input scale, and without using additional VBB supervision.

Pedestrian Detection

Fine-grained Recognition: Accounting for Subtle Differences between Similar Classes

no code implementations 14 Dec 2019 Guolei Sun, Hisham Cholakkal, Salman Khan, Fahad Shahbaz Khan, Ling Shao

The main requisite for the fine-grained recognition task is to focus on subtle discriminative details that make the subordinate classes different from each other.

Fine-Grained Image Classification

Towards Partial Supervision for Generic Object Counting in Natural Scenes

1 code implementation 13 Dec 2019 Hisham Cholakkal, Guolei Sun, Salman Khan, Fahad Shahbaz Khan, Ling Shao, Luc van Gool

Our RLC framework further reduces the annotation cost arising from large numbers of object categories in a dataset by only using lower-count supervision for a subset of categories and class-labels for the remaining ones.

Image Classification Image-level Supervised Instance Segmentation +3

3C-Net: Category Count and Center Loss for Weakly-Supervised Action Localization

1 code implementation ICCV 2019 Sanath Narayan, Hisham Cholakkal, Fahad Shahbaz Khan, Ling Shao

Our joint formulation has three terms: a classification term to ensure the separability of learned action features, an adapted multi-label center loss term to enhance the action feature discriminability and a counting loss term to delineate adjacent action sequences, leading to improved localization.

Action Classification Weakly Supervised Action Localization +2

Object Counting and Instance Segmentation with Image-level Supervision

2 code implementations CVPR 2019 Hisham Cholakkal, Guolei Sun, Fahad Shahbaz Khan, Ling Shao

Moreover, our approach improves state-of-the-art image-level supervised instance segmentation with a relative gain of 17.8% in terms of average best overlap, on the PASCAL VOC 2012 dataset.

Image-level Supervised Instance Segmentation Object +2

L1-regularized Reconstruction Error as Alpha Matte

no code implementations 9 Feb 2017 Jubin Johnson, Hisham Cholakkal, Deepu Rajan

Sampling-based alpha matting methods have traditionally followed the compositing equation to estimate the alpha value at a pixel from a pair of foreground (F) and background (B) samples.

Image Matting Video Matting

Backtracking Spatial Pyramid Pooling (SPP)-based Image Classifier for Weakly Supervised Top-down Salient Object Detection

no code implementations 16 Nov 2016 Hisham Cholakkal, Jubin Johnson, Deepu Rajan

First, the probabilistic contribution of each image region to the confidence of a CNN-based image classifier is computed through a backtracking strategy to produce top-down saliency.

object-detection RGB Salient Object Detection +1

Backtracking ScSPM Image Classifier for Weakly Supervised Top-Down Saliency

no code implementations CVPR 2016 Hisham Cholakkal, Jubin Johnson, Deepu Rajan

We propose a weakly supervised top-down saliency framework using only binary labels that indicate the presence/absence of an object in an image.

object-detection Object Detection

A Classifier-guided Approach for Top-down Salient Object Detection

no code implementations 22 Apr 2016 Hisham Cholakkal, Jubin Johnson, Deepu Rajan

Although the role of the classifier is to support salient object detection, we evaluate its performance in image classification and also illustrate the utility of thresholded saliency maps for image segmentation.

Classification General Classification +7

Sparse Coding for Alpha Matting

no code implementations 11 Apr 2016 Jubin Johnson, Ehsan Shahrian Varnousfaderani, Hisham Cholakkal, Deepu Rajan

In this paper, the matting problem is reinterpreted as a sparse coding of pixel features, wherein the sum of the codes gives the estimate of the alpha matte from a set of unpaired F and B samples.

Image Matting Video Matting
