no code implementations • 11 Apr 2024 • Kanchana Ranasinghe, Satya Narayan Shukla, Omid Poursaeed, Michael S. Ryoo, Tsung-Yu Lin
Integration of Large Language Models (LLMs) into visual domain tasks, resulting in visual-LLMs (V-LLMs), has enabled exceptional performance in vision-language tasks, particularly for visual question answering (VQA).
1 code implementation • 25 Mar 2024 • Kanchana Ranasinghe, Xiang Li, Kumara Kahatapitiya, Michael S. Ryoo
In addition to faster inference, we discover that the resulting models yield surprisingly good accuracy on long-video tasks, even with no video-specific information.
1 code implementation • 21 Mar 2024 • Kumara Kahatapitiya, Kanchana Ranasinghe, Jongwoo Park, Michael S. Ryoo
In this paper, we introduce a Language Repository (LangRepo) for LLMs that maintains concise and structured information as an interpretable (i.e., all-textual) representation.
1 code implementation • 21 Mar 2024 • Hasindri Watawana, Kanchana Ranasinghe, Tariq Mahmood, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan
Self-supervised representation learning has shown great promise for histopathology image analysis, with numerous approaches leveraging the patient-slide-patch hierarchy to learn better representations.
no code implementations • 6 Dec 2023 • Ryan Burgert, Xiang Li, Abe Leite, Kanchana Ranasinghe, Michael S. Ryoo
We explore the problem of computationally generating special 'prime' images that produce optical illusions when physically arranged and viewed in a certain way.
1 code implementation • 23 Nov 2022 • Ryan Burgert, Kanchana Ranasinghe, Xiang Li, Michael S. Ryoo
In this work, we explore how an off-the-shelf text-to-image diffusion model, trained without exposure to localization information, can ground various semantic phrases without segmentation-specific re-training.
1 code implementation • ICCV 2023 • Kanchana Ranasinghe, Brandon McKinzie, Sachin Ravi, Yinfei Yang, Alexander Toshev, Jonathon Shlens
In this work, we examine how well vision-language models understand where objects reside within an image and group together visually related parts of the imagery.
1 code implementation • CVPR 2022 • Kanchana Ranasinghe, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Michael Ryoo
To the best of our knowledge, the proposed approach is the first to alleviate the dependency on negative samples or dedicated memory banks in Self-supervised Video Transformer (SVT).
Ranked #55 on Action Recognition on UCF101
3 code implementations • ICLR 2022 • Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Fahad Shahbaz Khan, Fatih Porikli
(ii) Token Refinement: We then propose refining the tokens to further enhance the discriminative capacity at each block of the ViT.
1 code implementation • NeurIPS 2021 • Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang
We show and analyze the following intriguing properties of ViTs: (a) Transformers are highly robust to severe occlusions, perturbations, and domain shifts, e.g., they retain as high as 60% top-1 accuracy on ImageNet even after 80% of the image content is randomly occluded.
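The 80%-occlusion setting above can be sketched as patch-level random masking on an image. This is an illustrative reconstruction only — the patch size, zero-fill choice, and function name are assumptions, not the paper's exact evaluation protocol:

```python
import numpy as np

def random_patch_occlude(image, patch=16, drop_ratio=0.8, seed=0):
    """Zero out a random `drop_ratio` fraction of non-overlapping patches.

    Hypothetical helper: patch size and zero-filling are assumptions.
    """
    rng = np.random.default_rng(seed)
    h, w, _ = image.shape
    gh, gw = h // patch, w // patch        # patch grid (e.g., 14x14 for 224/16)
    n = gh * gw
    drop = rng.choice(n, size=int(round(drop_ratio * n)), replace=False)
    out = image.copy()
    for idx in drop:
        row, col = divmod(idx, gw)
        out[row * patch:(row + 1) * patch,
            col * patch:(col + 1) * patch, :] = 0  # occlude this patch
    return out

img = np.ones((224, 224, 3), dtype=np.float32)
occ = random_patch_occlude(img)
frac = 1.0 - occ.mean()  # fraction of pixels zeroed, close to drop_ratio
```

A robustness curve would then sweep `drop_ratio` and measure top-1 accuracy on the occluded inputs.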
1 code implementation • ICCV 2021 • Kanchana Ranasinghe, Muzammal Naseer, Munawar Hayat, Salman Khan, Fahad Shahbaz Khan
The CE loss encourages features of a class to have a higher projection score on the true class-vector compared to the negative classes.
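The projection-score view of the CE loss can be made concrete with a small numeric sketch, treating class vectors as rows of a final linear layer (all names and values here are illustrative, not from the paper):

```python
import numpy as np

def softmax_ce(feature, class_vectors, true_idx):
    """Cross-entropy where logits are projection scores onto class vectors."""
    logits = class_vectors @ feature          # one projection score per class
    logits = logits - logits.max()            # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[true_idx])

rng = np.random.default_rng(0)
feat = rng.normal(size=8)                     # a sample's feature vector
W = rng.normal(size=(5, 8))                   # 5 class vectors
loss_before = softmax_ce(feat, W, true_idx=2)

# Raising only the true class's projection score strictly lowers the CE loss,
# since the true-class logit grows while all other logits stay fixed.
W2 = W.copy()
W2[2] += 0.5 * feat
loss_after = softmax_ce(feat, W2, true_idx=2)
```

This is exactly the pressure the snippet describes: minimizing CE pushes the true-class projection score above those of the negative classes.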
no code implementations • ICLR 2021 • Sameera Ramasinghe, Kanchana Ranasinghe, Salman Khan, Nick Barnes, Stephen Gould
Although deep learning has achieved appealing results on several machine learning tasks, most models are deterministic at inference, limiting their application to single-modal settings.
no code implementations • 25 Dec 2019 • Kanchana Ranasinghe, Sahan Liyanaarachchi, Harsha Ranasinghe, Mayuka Jayawardhana
Tracking multiple objects in real time is essential for a variety of real-world applications, with the self-driving industry at the forefront.
1 code implementation • 11 Dec 2019 • Sadeep Jayasumana, Kanchana Ranasinghe, Mayuka Jayawardhana, Sahan Liyanaarachchi, Harsha Ranasinghe
To tackle this problem, we propose a CRF model, named Bipartite CRF or BCRF, with two types of random variables for semantic and instance labels.
no code implementations • 16 Oct 2018 • Sameera Ramasinghe, Jathushan Rajasegaran, Vinoj Jayasundara, Kanchana Ranasinghe, Ranga Rodrigo, Ajith A. Pasqual
We propose three schemas for combining static and motion components: based on a variance ratio, principal components, and Cholesky decomposition.