Search Results for author: Sanath Narayan

Found 27 papers, 19 papers with code

From Unimodal to Multimodal: Scaling up Projectors to Align Modalities

no code implementations • 28 Sep 2024 • Mayug Maniparambil, Raiymbek Akshulakov, Yasser Abdelaziz Dahou Djilali, Sanath Narayan, Ankit Singh, Noel E. O'Connor

However, this practice has left powerful unimodal encoders for both vision and language underutilized in multimodal applications, which raises a key question: Is there a plausible way to connect unimodal backbones for zero-shot vision-language tasks?

Image-text Retrieval · Semantic Similarity · +3
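
The entry above poses the question of connecting frozen unimodal backbones; purely as a generic illustration of that idea (not the paper's recipe), the PyTorch sketch below trains small MLP projectors on top of two frozen encoders with a CLIP-style contrastive loss, so zero-shot tasks reduce to cosine similarity in the shared space. The encoder dimensions, projector widths, and temperature are assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Projector(nn.Module):
        """Small MLP that maps a frozen unimodal embedding into a shared space."""
        def __init__(self, in_dim, out_dim=512, hidden=1024):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.GELU(), nn.Linear(hidden, out_dim)
            )

        def forward(self, x):
            return F.normalize(self.net(x), dim=-1)

    # Illustrative dimensions: a 768-d vision encoder and a 1024-d text encoder.
    vision_proj, text_proj = Projector(768), Projector(1024)

    def clip_style_loss(img_feats, txt_feats, temperature=0.07):
        """Symmetric InfoNCE loss over a batch of paired (image, text) embeddings."""
        z_img = vision_proj(img_feats)            # (B, 512)
        z_txt = text_proj(txt_feats)              # (B, 512)
        logits = z_img @ z_txt.t() / temperature  # (B, B) cosine similarities
        targets = torch.arange(len(z_img))
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

With such projectors in place, zero-shot classification amounts to comparing a projected image embedding against projected class-name embeddings.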

Open-Vocabulary Temporal Action Localization using Multimodal Guidance

no code implementations • 21 Jun 2024 • Akshita Gupta, Aditya Arora, Sanath Narayan, Salman Khan, Fahad Shahbaz Khan, Graham W. Taylor

Open-Vocabulary Temporal Action Localization (OVTAL) enables a model to recognize any desired action category in videos without the need to explicitly curate training data for all categories.

Language Modelling · Large Language Model · +1

Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning

1 code implementation • 6 Jun 2024 • Amandeep Kumar, Muhammad Awais, Sanath Narayan, Hisham Cholakkal, Salman Khan, Rao Muhammad Anwer

The LAE harnesses a pre-trained vision-language model to find text-guided, attribute-specific editing directions in the latent space of any pre-trained 3D-aware GAN.

Attribute · Language Modelling
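
The abstract above mentions using a pre-trained vision-language model to find text-guided, attribute-specific editing directions in a 3D-aware GAN's latent space. Purely as an illustration of that general idea (not the paper's LAE module), the sketch below optimizes a single latent direction to increase CLIP-style similarity with an attribute prompt; the generator and image-encoder interfaces, batch size, and step counts are assumptions.

    import torch
    import torch.nn.functional as F

    def learn_edit_direction(generator, clip_model, prompt_emb, latent_dim=512,
                             steps=200, lr=0.05, strength=1.0):
        """Optimize one latent direction so that G(w + strength*d) matches a text prompt.

        `generator(w)` -> image batch and `clip_model.encode_image(img)` -> embedding
        are assumed interfaces of frozen pre-trained models; only the direction is trained.
        """
        direction = torch.zeros(latent_dim, requires_grad=True)
        opt = torch.optim.Adam([direction], lr=lr)
        for _ in range(steps):
            w = torch.randn(8, latent_dim)                 # random latent codes
            edited = generator(w + strength * direction)   # edited renderings
            img_emb = F.normalize(clip_model.encode_image(edited), dim=-1)
            loss = 1.0 - (img_emb @ F.normalize(prompt_emb, dim=-1)).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
        return direction.detach()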

Do VSR Models Generalize Beyond LRS3?

1 code implementation • 23 Nov 2023 • Yasser Abdelaziz Dahou Djilali, Sanath Narayan, Eustache Le Bihan, Haithem Boussaid, Ebtessam Almazrouei, Merouane Debbah

The Lip Reading Sentences-3 (LRS3) benchmark has primarily been the focus of intense research in visual speech recognition (VSR) during the last few years.

Lip Reading · speech-recognition · +1

Remote Sensing Change Detection With Transformers Trained from Scratch

1 code implementation • 13 Apr 2023 • Mubashir Noman, Mustansar Fiaz, Hisham Cholakkal, Sanath Narayan, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan

Current transformer-based change detection (CD) approaches either employ a model pre-trained on the large-scale ImageNet image classification dataset or rely on first pre-training on another CD dataset and then fine-tuning on the target benchmark.

Change Detection · Image Classification

Cross-modulated Few-shot Image Generation for Colorectal Tissue Classification

1 code implementation • 4 Apr 2023 • Amandeep Kumar, Ankan Kumar Bhunia, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Jorma Laaksonen, Fahad Shahbaz Khan

In this work, we propose a few-shot colorectal tissue image generation method for addressing the scarcity of histopathological training data for rare cancer tissues.

Data Augmentation · Image Classification · +1

Video Instance Segmentation in an Open-World

1 code implementation • 3 Apr 2023 • Omkar Thawakar, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan

The open-world formulation relaxes the closed-world static-learning assumption as follows: (a) it first distinguishes a set of known categories while labeling any unknown object as 'unknown', and then (b) it incrementally learns the class of an unknown object as and when the corresponding semantic labels become available.

Instance Segmentation · Semantic Segmentation · +1
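
As a plain PyTorch illustration of the open-world protocol summarized above (not the paper's segmentation model), the sketch below labels low-confidence predictions as 'unknown' and grows the classifier once a new semantic label becomes available; the class names and confidence threshold are assumptions.

    # Minimal sketch of the open-world protocol: (a) predict a known class or
    # 'unknown', (b) grow the classifier when a new semantic label arrives.
    import torch
    import torch.nn as nn

    known_classes = ["person", "car", "dog"]   # assumed initial label set

    def open_world_predict(class_logits, threshold=0.5):
        """Map per-instance logits over the known classes to a label or 'unknown'."""
        probs = torch.softmax(class_logits, dim=-1)
        conf, idx = probs.max(dim=-1)
        return [known_classes[i] if c >= threshold else "unknown"
                for c, i in zip(conf.tolist(), idx.tolist())]

    def register_new_class(name, classifier_head):
        """Incremental step: append an output row for the newly labelled class,
        keeping the old weights (fine-tuning on stored exemplars would follow)."""
        known_classes.append(name)
        new = nn.Linear(classifier_head.in_features, classifier_head.out_features + 1)
        with torch.no_grad():
            new.weight[:classifier_head.out_features] = classifier_head.weight
            new.bias[:classifier_head.out_features] = classifier_head.bias
        return new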

Video Instance Segmentation via Multi-scale Spatio-temporal Split Attention Transformer

1 code implementation • 24 Mar 2022 • Omkar Thawakar, Sanath Narayan, Jiale Cao, Hisham Cholakkal, Rao Muhammad Anwer, Muhammad Haris Khan, Salman Khan, Michael Felsberg, Fahad Shahbaz Khan

When using the ResNet50 backbone, our MS-STS achieves a mask AP of 50.1%, outperforming the best reported results in the literature by 2.7%, and by 4.8% at the higher overlap threshold of AP_75, while being comparable in model size and speed on YouTube-VIS 2019 val.

Instance Segmentation · Semantic Segmentation · +2

OW-DETR: Open-world Detection Transformer

2 code implementations • CVPR 2022 • Akshita Gupta, Sanath Narayan, K J Joseph, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah

In the case of incremental object detection, OW-DETR outperforms the state-of-the-art for all settings on PASCAL VOC.

Inductive Bias · Object · +3

Discriminative Region-based Multi-Label Zero-Shot Learning

1 code implementation • ICCV 2021 • Sanath Narayan, Akshita Gupta, Salman Khan, Fahad Shahbaz Khan, Ling Shao, Mubarak Shah

We note that the best existing multi-label ZSL method takes a shared approach towards attending to region features with a common set of attention maps for all the classes.

Image Retrieval · Multi-label zero-shot learning
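
To make the contrast above concrete, the generic sketch below (not the paper's architecture) scores classes from region features in two ways: a shared attention map used by every class versus a class-specific attention map driven by each class embedding. The tensor shapes are assumptions.

    import torch

    # region_feats: (R, D) features for R image regions; class_embs: (C, D) class semantics.
    def shared_attention_scores(region_feats, class_embs, attn_query):
        """One attention map shared by all classes (attn_query is a learned (D,) vector)."""
        attn = torch.softmax(region_feats @ attn_query, dim=0)      # (R,)
        pooled = attn @ region_feats                                # (D,) pooled feature
        return class_embs @ pooled                                  # (C,) class scores

    def class_specific_attention_scores(region_feats, class_embs):
        """A separate attention map per class, driven by each class embedding."""
        attn = torch.softmax(region_feats @ class_embs.t(), dim=0)  # (R, C) per-class maps
        pooled = attn.t() @ region_feats                            # (C, D) per-class features
        return (pooled * class_embs).sum(dim=-1)                    # (C,) class scores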

Structured Latent Embeddings for Recognizing Unseen Classes in Unseen Domains

no code implementations • 12 Jul 2021 • Shivam Chandhok, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Vineeth N Balasubramanian, Fahad Shahbaz Khan, Ling Shao

The need to address the scarcity of task-specific annotated data has resulted in concerted efforts in recent years for specific settings such as zero-shot learning (ZSL) and domain generalization (DG), to separately address the issues of semantic shift and domain shift, respectively.

Domain Generalization · Zero-Shot Learning · +1

Generative Multi-Label Zero-Shot Learning

1 code implementation • 27 Jan 2021 • Akshita Gupta, Sanath Narayan, Salman Khan, Fahad Shahbaz Khan, Ling Shao, Joost Van de Weijer

Nevertheless, computing reliable attention maps for unseen classes during inference in a multi-label setting is still a challenge.

Attribute · Generative Adversarial Network · +3

3C-Net: Category Count and Center Loss for Weakly-Supervised Action Localization

1 code implementation • ICCV 2019 • Sanath Narayan, Hisham Cholakkal, Fahad Shahbaz Khan, Ling Shao

Our joint formulation has three terms: a classification term to ensure the separability of learned action features, an adapted multi-label center loss term to enhance the action feature discriminability, and a counting loss term to delineate adjacent action sequences, leading to improved localization.

Action Classification · Weakly Supervised Action Localization · +2
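
The 3C-Net abstract above lists three loss terms. The sketch below is only a schematic rendering of that description with assumed shapes and loss weights, not the released implementation: a multi-label classification loss, a center-loss-style term over video-level action features, and a count regression term.

    import torch
    import torch.nn.functional as F

    def three_term_loss(class_logits, video_feats, centers, labels, count_preds,
                        count_targets, lambda_center=0.1, lambda_count=0.1):
        """Schematic three-term objective: classification + multi-label center loss + counting.

        class_logits:  (B, C) video-level action logits
        video_feats:   (B, D) pooled action features
        centers:       (C, D) learnable per-class feature centers
        labels:        (B, C) multi-hot ground-truth action labels
        count_preds:   (B, C) predicted number of instances per action
        count_targets: (B, C) ground-truth instance counts
        """
        # (1) classification term: separability of learned action features
        cls_loss = F.binary_cross_entropy_with_logits(class_logits, labels.float())

        # (2) multi-label center loss: pull features toward centers of present classes
        dists = torch.cdist(video_feats, centers) ** 2          # (B, C) squared distances
        center_loss = (dists * labels).sum() / labels.sum().clamp(min=1)

        # (3) counting term: regress the number of action instances per class
        count_loss = F.mse_loss(count_preds, count_targets.float())

        return cls_loss + lambda_center * center_loss + lambda_count * count_loss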

A Large Dataset for Improving Patch Matching

1 code implementation • 4 Jan 2018 • Rahul Mitra, Nehal Doiphode, Utkarsh Gautam, Sanath Narayan, Shuaib Ahmed, Sharat Chandran, Arjun Jain

Similarly, on the Strecha dataset, we see an improvement of 3-5% for the matching task in non-planar scenes.

Patch Matching · Retrieval

Improved Descriptors for Patch Matching and Reconstruction

no code implementations • 24 Jan 2017 • Rahul Mitra, Jiakai Zhang, Sanath Narayan, Shuaib Ahmed, Sharat Chandran, Arjun Jain

Scenes from the Oxford ACRD, MVS and Synthetic datasets are used for evaluating the patch matching performance of the learnt descriptors while the Strecha dataset is used to evaluate the 3D reconstruction task.

3D Reconstruction · Patch Matching

Hyper-Fisher Vectors for Action Recognition

no code implementations • 28 Sep 2015 • Sanath Narayan, Kalpathi R. Ramakrishnan

We also perform experiments to show that the performance of the Hyper-Fisher Vector is robust to the dictionary size of the BoW encoding.

Action Recognition · Temporal Action Localization
