no code implementations • 11 Apr 2024 • Kanchana Ranasinghe, Satya Narayan Shukla, Omid Poursaeed, Michael S. Ryoo, Tsung-Yu Lin
Integration of Large Language Models (LLMs) into visual domain tasks, resulting in visual-LLMs (V-LLMs), has enabled exceptional performance in vision-language tasks, particularly for visual question answering (VQA).
1 code implementation • 25 Mar 2024 • Kanchana Ranasinghe, Xiang Li, Kumara Kahatapitiya, Michael S. Ryoo
In addition to faster inference, we discover that the resulting models yield surprisingly good accuracy on long-video tasks, even with no video-specific information.
1 code implementation • 21 Mar 2024 • Kumara Kahatapitiya, Kanchana Ranasinghe, Jongwoo Park, Michael S. Ryoo
In this paper, we introduce a Language Repository (LangRepo) for LLMs that maintains concise and structured information as an interpretable (i.e., all-textual) representation.
1 code implementation • 21 Mar 2024 • Hasindri Watawana, Kanchana Ranasinghe, Tariq Mahmood, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan
Self-supervised representation learning has shown great promise for histopathology image analysis, with numerous approaches leveraging the patient-slide-patch hierarchy to learn better representations.
no code implementations • 6 Dec 2023 • Ryan Burgert, Xiang Li, Abe Leite, Kanchana Ranasinghe, Michael S. Ryoo
We explore the problem of computationally generating special 'prime' images that produce optical illusions when physically arranged and viewed in a certain way.
1 code implementation • 23 Nov 2022 • Ryan Burgert, Kanchana Ranasinghe, Xiang Li, Michael S. Ryoo
In this work, we explore how an off-the-shelf text-to-image diffusion model, trained without exposure to localization information, can ground various semantic phrases without segmentation-specific re-training.
1 code implementation • ICCV 2023 • Kanchana Ranasinghe, Brandon McKinzie, Sachin Ravi, Yinfei Yang, Alexander Toshev, Jonathon Shlens
In this work, we examine how well vision-language models understand where objects reside within an image and group together visually related parts of the imagery.
1 code implementation • CVPR 2022 • Kanchana Ranasinghe, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Michael Ryoo
To the best of our knowledge, the proposed approach is the first to alleviate the dependency on negative samples or dedicated memory banks in Self-supervised Video Transformer (SVT).
Ranked #55 on Action Recognition on UCF101
3 code implementations • ICLR 2022 • Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Fahad Shahbaz Khan, Fatih Porikli
(ii) Token Refinement: We then propose refining the tokens to further enhance the discriminative capacity at each block of the ViT.
1 code implementation • NeurIPS 2021 • Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang
We show and analyze the following intriguing properties of ViTs: (a) Transformers are highly robust to severe occlusions, perturbations, and domain shifts, e.g., they retain as high as 60% top-1 accuracy on ImageNet even after 80% of the image content is randomly occluded.
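The 80%-occlusion setting above can be sketched as patch-level random masking on an image. This is an illustrative reconstruction only — the patch size, zero-fill choice, and function name are assumptions, not the paper's exact evaluation protocol:

```python
import numpy as np

def random_patch_occlude(image, patch=16, drop_ratio=0.8, seed=0):
    """Zero out a random `drop_ratio` fraction of non-overlapping patches.

    Hypothetical helper: patch size and zero-filling are assumptions.
    """
    rng = np.random.default_rng(seed)
    h, w, _ = image.shape
    gh, gw = h // patch, w // patch        # patch grid (e.g., 14x14 for 224/16)
    n = gh * gw
    drop = rng.choice(n, size=int(round(drop_ratio * n)), replace=False)
    out = image.copy()
    for idx in drop:
        row, col = divmod(idx, gw)
        out[row * patch:(row + 1) * patch,
            col * patch:(col + 1) * patch, :] = 0  # occlude this patch
    return out

img = np.ones((224, 224, 3), dtype=np.float32)
occ = random_patch_occlude(img)
frac = 1.0 - occ.mean()  # fraction of pixels zeroed, close to drop_ratio
```

A robustness curve would then sweep `drop_ratio` and measure top-1 accuracy on the occluded inputs.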
1 code implementation • ICCV 2021 • Kanchana Ranasinghe, Muzammal Naseer, Munawar Hayat, Salman Khan, Fahad Shahbaz Khan
The CE loss encourages features of a class to have a higher projection score on the true class-vector compared to the negative classes.
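The projection-score view of the CE loss can be made concrete with a small numeric sketch, treating class vectors as rows of a final linear layer (all names and values here are illustrative, not from the paper):

```python
import numpy as np

def softmax_ce(feature, class_vectors, true_idx):
    """Cross-entropy where logits are projection scores onto class vectors."""
    logits = class_vectors @ feature          # one projection score per class
    logits = logits - logits.max()            # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[true_idx])

rng = np.random.default_rng(0)
feat = rng.normal(size=8)                     # a sample's feature vector
W = rng.normal(size=(5, 8))                   # 5 class vectors
loss_before = softmax_ce(feat, W, true_idx=2)

# Raising only the true class's projection score strictly lowers the CE loss,
# since the true-class logit grows while all other logits stay fixed.
W2 = W.copy()
W2[2] += 0.5 * feat
loss_after = softmax_ce(feat, W2, true_idx=2)
```

This is exactly the pressure the snippet describes: minimizing CE pushes the true-class projection score above those of the negative classes.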
no code implementations • ICLR 2021 • Sameera Ramasinghe, Kanchana Ranasinghe, Salman Khan, Nick Barnes, Stephen Gould
Although deep learning has achieved appealing results on several machine learning tasks, most models are deterministic at inference, limiting their application to single-modal settings.
no code implementations • 25 Dec 2019 • Kanchana Ranasinghe, Sahan Liyanaarachchi, Harsha Ranasinghe, Mayuka Jayawardhana
Tracking multiple objects in real time is essential for a variety of real-world applications, with the self-driving industry at the forefront.
1 code implementation • 11 Dec 2019 • Sadeep Jayasumana, Kanchana Ranasinghe, Mayuka Jayawardhana, Sahan Liyanaarachchi, Harsha Ranasinghe
To tackle this problem, we propose a CRF model, named Bipartite CRF or BCRF, with two types of random variables for semantic and instance labels.
no code implementations • 16 Oct 2018 • Sameera Ramasinghe, Jathushan Rajasegaran, Vinoj Jayasundara, Kanchana Ranasinghe, Ranga Rodrigo, Ajith A. Pasqual
We propose three schemas for combining static and motion components: based on a variance ratio, principal components, and Cholesky decomposition.