no code implementations • 28 Sep 2024 • Mayug Maniparambil, Raiymbek Akshulakov, Yasser Abdelaziz Dahou Djilali, Sanath Narayan, Ankit Singh, Noel E. O'Connor
However, this practice has left powerful unimodal encoders for both vision and language underutilized in multimodal applications, raising a key question: Is there a plausible way to connect unimodal backbones for zero-shot vision-language tasks?
no code implementations • 20 Jul 2024 • Quentin Malartic, Nilabhra Roy Chowdhury, Ruxandra Cojocaru, Mugariya Farooq, Giulia Campesan, Yasser Abdelaziz Dahou Djilali, Sanath Narayan, Ankit Singh, Maksim Velikanov, Basma El Amel Boussaha, Mohammed Al-Yafeai, Hamza Alobeidli, Leen Al Qadi, Mohamed El Amine Seddik, Kirill Fedyanin, Reda Alami, Hakim Hacid
We introduce Falcon2-11B, a foundation model trained on over five trillion tokens, and its multimodal counterpart, Falcon2-11B-vlm, a vision-to-text model.
no code implementations • 21 Jun 2024 • Akshita Gupta, Aditya Arora, Sanath Narayan, Salman Khan, Fahad Shahbaz Khan, Graham W. Taylor
Open-Vocabulary Temporal Action Localization (OVTAL) enables a model to recognize any desired action category in videos without the need to explicitly curate training data for all categories.
1 code implementation • 6 Jun 2024 • Amandeep Kumar, Muhammad Awais, Sanath Narayan, Hisham Cholakkal, Salman Khan, Rao Muhammad Anwer
The LAE harnesses a pre-trained vision-language model to find text-guided, attribute-specific editing directions in the latent space of any pre-trained 3D-aware GAN.
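As a rough illustration of how a text-guided editing direction can be derived from a pre-trained vision-language model, here is a hedged sketch using OpenAI's CLIP; the prompts, the `text_direction` helper, and the raw prompt-difference direction are our illustrative assumptions, not the paper's method (the LAE learns a mapping into the GAN's latent space rather than using CLIP space directly).

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

def text_direction(neutral: str, target: str) -> torch.Tensor:
    """Unit direction in CLIP text space pointing from a neutral prompt
    toward an attribute-specific target prompt (illustrative only)."""
    with torch.no_grad():
        tokens = clip.tokenize([neutral, target]).to(device)
        emb = model.encode_text(tokens).float()
        emb = emb / emb.norm(dim=-1, keepdim=True)
    direction = emb[1] - emb[0]
    return direction / direction.norm()

# e.g. a hypothetical 'add smile' direction:
# d_smile = text_direction("a photo of a face", "a photo of a smiling face")
```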
1 code implementation • 28 May 2024 • Amandeep Kumar, Muzammal Naseer, Sanath Narayan, Rao Muhammad Anwer, Salman Khan, Hisham Cholakkal
Moreover, they often result in misaligned image generation for prompt sequences featuring multiple objects.
1 code implementation • CVPR 2024 • Mayug Maniparambil, Raiymbek Akshulakov, Yasser Abdelaziz Dahou Djilali, Sanath Narayan, Mohamed El Amine Seddik, Karttikeya Mangalam, Noel E. O'Connor
In the absence of statistical similarity in aligned encoders like CLIP, we show that a possible matching of unaligned encoders exists without any training.
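For intuition, below is a minimal NumPy sketch of linear Centered Kernel Alignment (CKA), a standard measure of representation similarity between two encoders evaluated on the same inputs; the function name and interface are ours, not the paper's released code.

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between features X (n, d1) and Y (n, d2) computed
    by two different encoders on the same n inputs."""
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # CKA(X, Y) = ||X^T Y||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(X.T @ Y, ord="fro") ** 2
    self_x = np.linalg.norm(X.T @ X, ord="fro")
    self_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return float(cross / (self_x * self_y))
```

A score near 1 indicates highly similar representation geometry even when the two encoders were never trained together.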
1 code implementation • 23 Nov 2023 • Yasser Abdelaziz Dahou Djilali, Sanath Narayan, Eustache Le Bihan, Haithem Boussaid, Ebtessam Almazrouei, Merouane Debbah
The Lip Reading Sentences-3 (LRS3) benchmark has been the primary focus of intense research in visual speech recognition (VSR) over the last few years.
no code implementations • ICCV 2023 • Yasser Abdelaziz Dahou Djilali, Sanath Narayan, Haithem Boussaid, Ebtessam Almazrouei, Merouane Debbah
Visual Speech Recognition (VSR) differs from common perception tasks in that it requires deeper reasoning over the video sequence, even for human experts.
1 code implementation • 13 Apr 2023 • Mubashir Noman, Mustansar Fiaz, Hisham Cholakkal, Sanath Narayan, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan
Current transformer-based change detection (CD) approaches either employ a model pre-trained on the large-scale ImageNet image-classification dataset, or first pre-train on another CD dataset and then fine-tune on the target benchmark.
1 code implementation • 4 Apr 2023 • Amandeep Kumar, Ankan Kumar Bhunia, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Jorma Laaksonen, Fahad Shahbaz Khan
In this work, we propose a few-shot colorectal tissue image generation method for addressing the scarcity of histopathological training data for rare cancer tissues.
1 code implementation • ICCV 2023 • Amandeep Kumar, Ankan Kumar Bhunia, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan
We present a method to efficiently generate 3D-aware high-resolution images that are view-consistent across multiple target views.
1 code implementation • 3 Apr 2023 • Omkar Thawakar, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan
The open-world formulation relaxes the closed-world static-learning assumption as follows: (a) it distinguishes a set of known categories and labels any unfamiliar object as 'unknown', and then (b) it incrementally learns the class of an unknown once the corresponding semantic labels become available.
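The two-step open-world loop above can be summarized in a small schematic sketch; the class names, the confidence threshold `tau`, and the helper functions are illustrative assumptions, not the paper's implementation.

```python
# Schematic open-world labeling: assign a known class when confident,
# otherwise mark the object as 'unknown'; grow the known set incrementally.
known_classes = {"person", "car"}  # illustrative initial known set

def label_detection(scores: dict, tau: float = 0.5) -> str:
    """scores: class -> confidence for one detected object."""
    best_cls, best_score = max(scores.items(), key=lambda kv: kv[1])
    if best_cls in known_classes and best_score >= tau:
        return best_cls
    return "unknown"

def incremental_update(newly_labeled: set) -> None:
    """Step (b): once semantic labels for unknowns arrive, learn them."""
    known_classes.update(newly_labeled)
```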
1 code implementation • 7 Oct 2022 • Mustansar Fiaz, Hisham Cholakkal, Sanath Narayan, Rao Muhammad Anwer, Fahad Shahbaz Khan
Our PS-ARM achieves state-of-the-art performance on both datasets.
1 code implementation • 24 Mar 2022 • Omkar Thawakar, Sanath Narayan, Jiale Cao, Hisham Cholakkal, Rao Muhammad Anwer, Muhammad Haris Khan, Salman Khan, Michael Felsberg, Fahad Shahbaz Khan
With a ResNet50 backbone, our MS-STS achieves a mask AP of 50.1% on the YouTube-VIS 2019 val set, outperforming the best results reported in the literature by 2.7%, and by 4.8% at the higher overlap threshold AP_75, while being comparable in model size and speed.
1 code implementation • CVPR 2022 • Anirudh Thatipelli, Sanath Narayan, Salman Khan, Rao Muhammad Anwer, Fahad Shahbaz Khan, Bernard Ghanem
Experiments are performed on four few-shot action recognition benchmarks: Kinetics, SSv2, HMDB51 and UCF101.
Ranked #1 on Few Shot Action Recognition on UCF101 (using extra training data)
2 code implementations • CVPR 2022 • Akshita Gupta, Sanath Narayan, K J Joseph, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah
In the case of incremental object detection, OW-DETR outperforms the state-of-the-art for all settings on PASCAL VOC.
1 code implementation • ICCV 2021 • Sanath Narayan, Akshita Gupta, Salman Khan, Fahad Shahbaz Khan, Ling Shao, Mubarak Shah
We note that the best existing multi-label ZSL method attends to region features with a common set of attention maps shared across all classes (a toy contrast with class-specific attention is sketched below).
Ranked #2 on Multi-label zero-shot learning on Open Images V4
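To make the shared-versus-class-specific distinction concrete, here is a hedged PyTorch sketch; the shapes, pooling choices, and scoring are our simplifications, not the paper's architecture.

```python
import torch
import torch.nn.functional as F

def classify_regions(regions: torch.Tensor, label_embs: torch.Tensor,
                     class_specific: bool = True) -> torch.Tensor:
    """regions: (R, D) region features; label_embs: (C, D) class embeddings.
    Returns per-class scores of shape (C,)."""
    if class_specific:
        # one attention map per class, driven by that class's embedding
        attn = F.softmax(regions @ label_embs.T, dim=0)   # (R, C)
        pooled = attn.T @ regions                         # (C, D)
        return (pooled * label_embs).sum(dim=-1)          # (C,)
    # shared attention: a single map reused for every class
    attn = F.softmax(regions @ regions.mean(dim=0), dim=0)  # (R,)
    pooled = attn @ regions                                 # (D,)
    return label_embs @ pooled                              # (C,)
```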
no code implementations • 12 Jul 2021 • Shivam Chandhok, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Vineeth N Balasubramanian, Fahad Shahbaz Khan, Ling Shao
The scarcity of task-specific annotated data has prompted concerted efforts in recent years on specific settings such as zero-shot learning (ZSL) and domain generalization (DG), which separately address the issues of semantic shift and domain shift, respectively.
1 code implementation • 27 Jan 2021 • Akshita Gupta, Sanath Narayan, Salman Khan, Fahad Shahbaz Khan, Ling Shao, Joost Van de Weijer
Nevertheless, computing reliable attention maps for unseen classes during inference in a multi-label setting is still a challenge.
Ranked #8 on Multi-label zero-shot learning on NUS-WIDE
1 code implementation • ICCV 2021 • Sanath Narayan, Hisham Cholakkal, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao
The proposed formulation comprises a discriminative loss term and a denoising loss term for enhancing temporal action localization (a schematic combination is sketched below).
Ranked #3 on Weakly Supervised Action Localization on THUMOS'14
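Below is a hedged sketch of how a discriminative (video-level classification) term and a denoising term might be combined; the exact loss forms and the weight `lambda_d` are our assumptions, not the paper's definitions.

```python
import torch
import torch.nn.functional as F

def localization_loss(frame_scores: torch.Tensor, attn: torch.Tensor,
                      video_labels: torch.Tensor, lambda_d: float = 0.5):
    """frame_scores: (T, C) per-frame class logits; attn: (T,) foreground
    attention in [0, 1]; video_labels: (C,) multi-hot video-level labels."""
    # discriminative term: attention-weighted temporal pooling -> video logits
    video_logits = (attn.unsqueeze(1) * frame_scores).sum(dim=0) \
        / attn.sum().clamp(min=1e-6)
    cls_loss = F.binary_cross_entropy_with_logits(video_logits, video_labels)
    # denoising term: suppress class activations on background frames
    bg_activation = ((1.0 - attn).unsqueeze(1) * frame_scores.sigmoid()).mean()
    return cls_loss + lambda_d * bg_activation
```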
1 code implementation • ECCV 2020 • Sanath Narayan, Akshita Gupta, Fahad Shahbaz Khan, Cees G. M. Snoek, Ling Shao
We propose to enforce semantic consistency at all stages of (generalized) zero-shot learning: training, feature synthesis and classification (see the sketch below).
Ranked #2 on Generalized Zero-Shot Learning on Oxford 102 Flower
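A minimal sketch of the semantic-consistency idea, assuming a decoder that maps visual features back to class attributes and whose reconstruction loss can be applied at each stage; the layer sizes and MSE objective are illustrative, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM, ATTR_DIM = 2048, 85  # illustrative dimensions

# decoder from visual features back to semantic attributes
decoder = nn.Sequential(
    nn.Linear(FEAT_DIM, 512), nn.ReLU(), nn.Linear(512, ATTR_DIM)
)

def semantic_consistency_loss(features: torch.Tensor,
                              attributes: torch.Tensor) -> torch.Tensor:
    """Cycle loss: reconstructed attributes should match the true ones.
    Apply to real features (training), to generated features (synthesis),
    and reuse the decoder to enrich the final classifier's input."""
    return F.mse_loss(decoder(features), attributes)
```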
1 code implementation • ICCV 2019 • Sanath Narayan, Hisham Cholakkal, Fahad Shahbaz Khan, Ling Shao
Our joint formulation has three terms: a classification term to ensure the separability of learned action features, an adapted multi-label center loss term to enhance the discriminability of those features, and a counting loss term to delineate adjacent action sequences, leading to improved localization (a schematic version is sketched below).
Ranked #1 on Action Classification on THUMOS'14
Action Classification · Weakly Supervised Action Localization · +2
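Here is a hedged schematic of the three-term objective; the individual loss forms and the weights `lambda_c`, `lambda_n` are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def three_term_loss(video_logits, video_labels, feats, centers,
                    pred_counts, true_counts,
                    lambda_c: float = 0.1, lambda_n: float = 0.1):
    """video_logits/video_labels: (C,) multi-hot classification targets;
    feats: (K, D) features of K action instances with matching class
    centers (K, D); pred_counts/true_counts: (C,) per-class counts."""
    cls = F.binary_cross_entropy_with_logits(video_logits, video_labels)
    center = ((feats - centers) ** 2).sum(dim=1).mean()   # discriminability
    count = F.mse_loss(pred_counts, true_counts)          # delineation
    return cls + lambda_c * center + lambda_n * count
```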
1 code implementation • CVPR 2019 • Devraj Mandal, Sanath Narayan, Saikumar Dwivedi, Vikram Gupta, Shuaib Ahmed, Fahad Shahbaz Khan, Ling Shao
We introduce an out-of-distribution detector that determines whether the video features belong to a seen or an unseen action category (the resulting gating is sketched below).
Action Recognition in Videos · Out-of-Distribution Detection · +2
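As a hedged sketch of how such a detector can gate prediction between a seen-class head and an unseen-class (zero-shot) head; the detector, both classifiers, and the threshold `tau` are illustrative stand-ins, not the paper's trained components.

```python
import torch

def predict(feature, ood_detector, seen_clf, unseen_clf, tau: float = 0.5):
    """Route a video feature to the seen- or unseen-class classifier,
    based on the detector's probability that the category is unseen."""
    p_unseen = torch.sigmoid(ood_detector(feature))
    head = unseen_clf if p_unseen.item() > tau else seen_clf
    return head(feature).argmax(dim=-1)
```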
1 code implementation • 4 Jan 2018 • Rahul Mitra, Nehal Doiphode, Utkarsh Gautam, Sanath Narayan, Shuaib Ahmed, Sharat Chandran, Arjun Jain
Similarly, on the Strecha dataset, we see an improvement of 3-5% on the matching task in non-planar scenes.
no code implementations • 24 Jan 2017 • Rahul Mitra, Jiakai Zhang, Sanath Narayan, Shuaib Ahmed, Sharat Chandran, Arjun Jain
Scenes from the Oxford ACRD, MVS and Synthetic datasets are used for evaluating the patch matching performance of the learnt descriptors while the Strecha dataset is used to evaluate the 3D reconstruction task.
no code implementations • 28 Sep 2015 • Sanath Narayan, Kalpathi R. Ramakrishnan
We also perform experiments to show that the performance of the Hyper-Fisher Vector is robust to the dictionary size of the BoW encoding.
no code implementations • CVPR 2014 • Sanath Narayan, Kalpathi R. Ramakrishnan
Different object parts have varying degrees of interaction with the other parts during an action cycle.