Search Results for author: Kanchana Ranasinghe

Found 20 papers, 13 papers with code

Pixel Motion as Universal Representation for Robot Control

no code implementations 12 May 2025 Kanchana Ranasinghe, Xiang Li, Cristina Mata, Jongwoo Park, Michael S. Ryoo

We present LangToMo, a vision-language-action framework structured as a dual-system architecture that uses pixel motion forecasts as intermediate representations.

Vision-Language-Action
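
The excerpt above only names the two systems. As a rough illustration of the general idea (and not the authors' actual models), the toy sketch below stubs out a slow pixel-motion-forecasting module and a fast motion-to-action module; all function names, shapes, and the action encoding are hypothetical.

```python
import numpy as np

def predict_pixel_motion(image: np.ndarray, instruction: str) -> np.ndarray:
    """System 2 (slow): forecast per-pixel motion (H, W, 2) from an
    observation and a language instruction. Stubbed with zeros here."""
    h, w = image.shape[:2]
    return np.zeros((h, w, 2), dtype=np.float32)

def motion_to_action(motion: np.ndarray) -> np.ndarray:
    """System 1 (fast): map the forecast motion field to a low-level
    action, here by crudely pooling the mean flow direction."""
    return motion.reshape(-1, 2).mean(axis=0)

image = np.zeros((224, 224, 3), dtype=np.float32)
motion = predict_pixel_motion(image, "pick up the red block")
action = motion_to_action(motion)  # 2-D motion command in this toy sketch
```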

LatentCRF: Continuous CRF for Efficient Latent Diffusion

no code implementations 24 Dec 2024 Kanchana Ranasinghe, Sadeep Jayasumana, Andreas Veit, Ayan Chakrabarti, Daniel Glasner, Michael S. Ryoo, Srikumar Ramalingam, Sanjiv Kumar

Latent Diffusion Models (LDMs) produce high-quality, photo-realistic images; however, the latency incurred by multiple costly inference iterations can restrict their applicability.

Diversity

LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

1 code implementation 28 Jun 2024 Xiang Li, Cristina Mata, Jongwoo Park, Kumara Kahatapitiya, Yoo Sung Jang, Jinghuan Shang, Kanchana Ranasinghe, Ryan Burgert, Mu Cai, Yong Jae Lee, Michael S. Ryoo

Motivated by the success of visual instruction tuning in computer vision, we introduce LLaRA (Large Language and Robotics Assistant), a framework that formulates robot action policy as visuo-textual conversations and enables efficient transfer of a pretrained VLM into a powerful VLA.

Vision-Language-Action World Knowledge
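
To make the "visuo-textual conversation" formulation concrete, here is a hypothetical training sample in a generic instruction-tuning layout; the field names and action encoding are illustrative, not LLaRA's actual data format.

```python
# One robot-control step cast as a visuo-textual conversation.
# Paths, keys, and the coordinate-style action text are hypothetical.
sample = {
    "image": "episode_0042/frame_007.png",
    "conversations": [
        {"from": "human",
         "value": "<image>\nWhat action should the robot take to "
                  "pick up the red block?"},
        {"from": "assistant",
         "value": "Move the end effector to (0.42, 0.17), then close "
                  "the gripper."},
    ],
}
```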

Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA

1 code implementation 13 Jun 2024 Jongwoo Park, Kanchana Ranasinghe, Kumara Kahatapitiya, Wonjeong Ryoo, Donghyun Kim, Michael S. Ryoo

Long-form videos that span wide temporal intervals are highly redundant in information and contain multiple distinct events or entities that are often only loosely related.

EgoSchema +2

Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs

1 code implementation CVPR 2024 Kanchana Ranasinghe, Satya Narayan Shukla, Omid Poursaeed, Michael S. Ryoo, Tsung-Yu Lin

Integration of Large Language Models (LLMs) into visual domain tasks, resulting in visual-LLMs (V-LLMs), has enabled exceptional performance in vision-language tasks, particularly for visual question answering (VQA).

Descriptive Hallucination +4

Understanding Long Videos with Multimodal Language Models

1 code implementation 25 Mar 2024 Kanchana Ranasinghe, Xiang Li, Kumara Kahatapitiya, Michael S. Ryoo

Our resulting Multimodal Video Understanding (MVU) framework demonstrates state-of-the-art performance across multiple video understanding benchmarks.

Fine-grained Action Recognition Language Modelling +5

Language Repository for Long Video Understanding

1 code implementation 21 Mar 2024 Kumara Kahatapitiya, Kanchana Ranasinghe, Jongwoo Park, Michael S. Ryoo

In this paper, we introduce a Language Repository (LangRepo) for LLMs that maintains concise and structured information as an interpretable (i.e., all-textual) representation.

EgoSchema Video Understanding +2
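
As a toy illustration of an all-textual repository with concise, structured entries (not LangRepo's actual implementation), the sketch below supports write operations that prune duplicates and read operations that return a short textual summary.

```python
class TextRepository:
    """Minimal all-textual store for long-video reasoning (toy sketch)."""

    def __init__(self):
        self.entries: list[str] = []

    def write(self, caption: str) -> None:
        # Drop exact duplicates as a crude form of redundancy pruning.
        if caption not in self.entries:
            self.entries.append(caption)

    def read(self, max_entries: int = 10) -> str:
        # Return the most recent entries as one interpretable string.
        return " ".join(self.entries[-max_entries:])

repo = TextRepository()
repo.write("A person opens the fridge.")
repo.write("A person opens the fridge.")  # pruned as a duplicate
repo.write("They take out a carton of milk.")
print(repo.read())
```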

Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning

1 code implementation 21 Mar 2024 Hasindri Watawana, Kanchana Ranasinghe, Tariq Mahmood, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

Self-supervised representation learning has been highly promising for histopathology image analysis, with numerous approaches leveraging the patient-slide-patch hierarchy to learn better representations.

Representation Learning Self-Supervised Learning
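
A minimal sketch of how a patient-slide-patch hierarchy can drive positive-pair sampling for self-supervised learning, assuming a hypothetical patch-metadata layout; this illustrates the general idea rather than the paper's exact recipe.

```python
import random

# Hypothetical metadata: each patch belongs to a slide and a patient.
patches = [
    {"id": 0, "patient": "P1", "slide": "S1"},
    {"id": 1, "patient": "P1", "slide": "S1"},
    {"id": 2, "patient": "P1", "slide": "S2"},
    {"id": 3, "patient": "P2", "slide": "S3"},
]

def sample_positive(anchor: dict, level: str = "slide") -> dict:
    """Treat two patches as a positive pair when they share a slide
    (or, more loosely, a patient)."""
    pool = [p for p in patches
            if p["id"] != anchor["id"] and p[level] == anchor[level]]
    return random.choice(pool) if pool else anchor

pos_slide = sample_positive(patches[0], level="slide")      # patch 1
pos_patient = sample_positive(patches[0], level="patient")  # patch 1 or 2
```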

Diffusion Illusions: Hiding Images in Plain Sight

no code implementations 6 Dec 2023 Ryan Burgert, Xiang Li, Abe Leite, Kanchana Ranasinghe, Michael S. Ryoo

We explore the problem of computationally generating special 'prime' images that produce optical illusions when physically arranged and viewed in a certain way.

Peekaboo: Text to Image Diffusion Models are Zero-Shot Segmentors

1 code implementation 23 Nov 2022 Ryan Burgert, Kanchana Ranasinghe, Xiang Li, Michael S. Ryoo

In this work, we explore how an off-the-shelf text-to-image diffusion model, trained without exposure to localization information, can ground various semantic phrases without segmentation-specific re-training.

Segmentation Unsupervised Semantic Segmentation

Self-supervised Video Transformer

1 code implementation CVPR 2022 Kanchana Ranasinghe, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Michael Ryoo

To the best of our knowledge, the proposed Self-supervised Video Transformer (SVT) is the first approach to alleviate the dependency on negative samples or dedicated memory banks.

Action Classification Action Recognition In Videos +1
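
For context, a negative-free self-distillation update in the BYOL/DINO spirit looks roughly like the sketch below: a student matches an exponential-moving-average teacher on two views of the same video, so no negatives or memory bank are required. This is a generic illustration, not SVT's exact objective.

```python
import torch
import torch.nn.functional as F

# Toy student/teacher pair; real models would be video transformers.
student = torch.nn.Linear(128, 64)
teacher = torch.nn.Linear(128, 64)
teacher.load_state_dict(student.state_dict())
for p in teacher.parameters():
    p.requires_grad_(False)  # teacher is updated only by EMA

clip_a, clip_b = torch.randn(8, 128), torch.randn(8, 128)  # two views
loss = -F.cosine_similarity(student(clip_a),
                            teacher(clip_b).detach()).mean()
loss.backward()

# EMA update of the teacher after the optimizer step.
with torch.no_grad():
    for ps, pt in zip(student.parameters(), teacher.parameters()):
        pt.mul_(0.99).add_(ps, alpha=0.01)
```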

On Improving Adversarial Transferability of Vision Transformers

3 code implementations ICLR 2022 Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Fahad Shahbaz Khan, Fatih Porikli

Token Refinement: we propose to refine the tokens to further enhance the discriminative capacity at each block of ViT.

Adversarial Attack

Intriguing Properties of Vision Transformers

1 code implementation NeurIPS 2021 Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang

We show and analyze the following intriguing properties of ViT: (a) Transformers are highly robust to severe occlusions, perturbations and domain shifts, e.g., they retain as high as 60% top-1 accuracy on ImageNet even after randomly occluding 80% of the image content.

Few-Shot Learning Semantic Segmentation
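
The occlusion-robustness claim is easy to probe: the sketch below zeroes out a random 80% of 16x16 patches before evaluation. Model loading and the accuracy loop are omitted; only the occlusion step is shown, and it is a generic probe rather than the paper's exact protocol.

```python
import torch

def occlude_patches(img: torch.Tensor, drop_ratio: float = 0.8,
                    patch: int = 16) -> torch.Tensor:
    """Zero out a random fraction of non-overlapping patches in a
    (C, H, W) image tensor."""
    c, h, w = img.shape
    out = img.clone()
    ph, pw = h // patch, w // patch
    n_drop = int(ph * pw * drop_ratio)
    for i in torch.randperm(ph * pw)[:n_drop].tolist():
        r, col = (i // pw) * patch, (i % pw) * patch
        out[:, r:r + patch, col:col + patch] = 0  # mask this patch
    return out

img = torch.rand(3, 224, 224)
occluded = occlude_patches(img, drop_ratio=0.8)  # 80% of patches masked
```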

Orthogonal Projection Loss

1 code implementation ICCV 2021 Kanchana Ranasinghe, Muzammal Naseer, Munawar Hayat, Salman Khan, Fahad Shahbaz Khan

The CE loss encourages features of a class to have a higher projection score on the true class-vector compared to the negative classes.

Domain Generalization Few-Shot Learning
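
A rough sketch of an orthogonality-style loss in the spirit of OPL: same-class features are pulled toward cosine similarity 1 while different-class features are pushed toward 0 (orthogonality). The weighting and details here are illustrative, not the paper's published formulation.

```python
import torch
import torch.nn.functional as F

def orthogonal_projection_loss(feats: torch.Tensor,
                               labels: torch.Tensor) -> torch.Tensor:
    feats = F.normalize(feats, dim=1)
    sim = feats @ feats.t()                       # pairwise cosine sims
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool)
    pos = sim[same & ~eye]                        # same class, not self
    neg = sim[~same]                              # different classes
    loss_pos = (1.0 - pos).mean() if pos.numel() else sim.new_zeros(())
    loss_neg = neg.abs().mean() if neg.numel() else sim.new_zeros(())
    return loss_pos + loss_neg

feats = torch.randn(16, 128, requires_grad=True)
labels = torch.randint(0, 4, (16,))
loss = orthogonal_projection_loss(feats, labels)
loss.backward()
```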

Conditional Generative Modeling via Learning the Latent Space

no code implementations ICLR 2021 Sameera Ramasinghe, Kanchana Ranasinghe, Salman Khan, Nick Barnes, Stephen Gould

Although deep learning has achieved appealing results on several machine learning tasks, most of the models are deterministic at inference, limiting their application to single-modal settings.

Extending Multi-Object Tracking systems to better exploit appearance and 3D information

no code implementations 25 Dec 2019 Kanchana Ranasinghe, Sahan Liyanaarachchi, Harsha Ranasinghe, Mayuka Jayawardhana

Tracking multiple objects in real time is essential for a variety of real-world applications, with the self-driving industry at the forefront.

Object Real-Time Multi-Object Tracking

Bipartite Conditional Random Fields for Panoptic Segmentation

1 code implementation 11 Dec 2019 Sadeep Jayasumana, Kanchana Ranasinghe, Mayuka Jayawardhana, Sahan Liyanaarachchi, Harsha Ranasinghe

To tackle this problem, we propose a CRF model, named Bipartite CRF or BCRF, with two types of random variables for semantic and instance labels.

Panoptic Segmentation Segmentation
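
For intuition, a CRF over two coupled sets of variables typically has an energy of the generic form below, with within-set unary and pairwise terms plus a cross term linking semantic and instance labels; the notation is illustrative, not necessarily the paper's.

```latex
% x_i: semantic label of pixel i; z_i: instance label of pixel i.
% phi/psi: within-set potentials; chi: cross-consistency potential.
E(\mathbf{x}, \mathbf{z}) =
    \sum_i \phi_i(x_i) + \sum_{i<j} \phi_{ij}(x_i, x_j)
  + \sum_i \psi_i(z_i) + \sum_{i<j} \psi_{ij}(z_i, z_j)
  + \sum_i \chi_i(x_i, z_i)
```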
