Search Results for author: Kumara Kahatapitiya

Found 15 papers, 10 papers with code

Understanding Long Videos in One Multimodal Language Model Pass

1 code implementation • 25 Mar 2024 Kanchana Ranasinghe, Xiang Li, Kumara Kahatapitiya, Michael S. Ryoo

In addition to faster inference, we find that the resulting models yield surprisingly good accuracy on long-video tasks, even with no video-specific information.

Fine-grained Action Recognition · Language Modelling +3

Language Repository for Long Video Understanding

1 code implementation • 21 Mar 2024 Kumara Kahatapitiya, Kanchana Ranasinghe, Jongwoo Park, Michael S. Ryoo

In this paper, we introduce a Language Repository (LangRepo) for LLMs that maintains concise and structured information as an interpretable (i.e., all-textual) representation.

Video Understanding · Visual Question Answering +1
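The abstract describes a repository that keeps concise, all-textual notes as an interpretable representation. A minimal sketch of that idea is below; the class name, per-segment keys, and the crude deduplication/pruning rule are all illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch of an all-textual "language repository": stores short
# text notes per video segment and keeps only concise, deduplicated entries.
class TextRepo:
    def __init__(self, max_entries_per_key=3):
        self.max_entries = max_entries_per_key
        self.store = {}  # segment id -> list of short text notes

    def write(self, segment, note):
        notes = self.store.setdefault(segment, [])
        if note not in notes:              # crude redundancy pruning
            notes.append(note)
        del notes[:-self.max_entries]      # keep only the most recent notes

    def read(self, segment):
        # Interpretable representation: plain concatenated text
        return "; ".join(self.store.get(segment, []))

repo = TextRepo()
repo.write("clip_1", "a person opens a door")
repo.write("clip_1", "a person opens a door")   # duplicate, dropped
repo.write("clip_1", "walks into the kitchen")
print(repo.read("clip_1"))  # -> "a person opens a door; walks into the kitchen"
```

Because the representation is plain text, its contents can be read back into an LLM prompt directly, which is the interpretability property the abstract highlights.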

Object-Centric Diffusion for Efficient Video Editing

no code implementations • 11 Jan 2024 Kumara Kahatapitiya, Adil Karjauv, Davide Abati, Fatih Porikli, Yuki M. Asano, Amirhossein Habibian

Diffusion-based video editing has reached impressive quality and can transform the global style, local structure, or attributes of given video inputs, following textual edit prompts.

Object · Video Editing

VicTR: Video-conditioned Text Representations for Activity Recognition

no code implementations • 5 Apr 2023 Kumara Kahatapitiya, Anurag Arnab, Arsha Nagrani, Michael S. Ryoo

All such recipes rely on augmenting visual embeddings with temporal information (i.e., image -> video), often keeping text embeddings unchanged or even discarding them.

Action Classification · Activity Recognition +1

Token Turing Machines

1 code implementation CVPR 2023 Michael S. Ryoo, Keerthana Gopalakrishnan, Kumara Kahatapitiya, Ted Xiao, Kanishka Rao, Austin Stone, Yao Lu, Julian Ibarz, Anurag Arnab

The model's memory module ensures that a new observation will only be processed with the contents of the memory (and not the entire history), meaning that it can efficiently process long sequences with a bounded computational cost at each step.

Action Detection · Activity Detection
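The abstract's key point is that each step processes only a fixed-size memory plus the new observation, so cost stays bounded no matter how long the history grows. A NumPy sketch of one such step is below; the mean-pooling "summarize" op and the placeholder process function are stand-ins for the paper's learned token summarization and Transformer, not its actual implementation.

```python
# Sketch of a Token-Turing-Machine-style step: a fixed-size token memory is
# combined with the new observation, processed, and written back.
import numpy as np

def summarize(tokens, k):
    # Pool a variable number of tokens down to k tokens (illustrative only).
    chunks = np.array_split(tokens, k, axis=0)
    return np.stack([c.mean(axis=0) for c in chunks])

def ttm_step(memory, observation, process, mem_size):
    read = summarize(np.concatenate([memory, observation]), mem_size)
    out = process(read)                       # bounded cost: mem_size tokens
    new_memory = summarize(np.concatenate([memory, out]), mem_size)
    return out, new_memory

memory = np.zeros((8, 16))                    # 8 memory tokens, dim 16
process = lambda x: x + 1.0                   # placeholder for a Transformer
for _ in range(5):                            # long sequence, constant cost
    obs = np.random.randn(32, 16)             # 32 new observation tokens
    out, memory = ttm_step(memory, obs, process, mem_size=8)
print(memory.shape)  # (8, 16): memory stays bounded regardless of history
```

Note that the per-step compute depends only on `mem_size` and the observation size, never on how many steps have already been processed.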

Grafting Vision Transformers

no code implementations • 28 Oct 2022 Jongwoo Park, Kumara Kahatapitiya, Donghyun Kim, Shivchander Sudalairaj, Quanfu Fan, Michael S. Ryoo

In this paper, we present a simple and efficient add-on component (termed GrafT) that considers global dependencies and multi-scale information throughout the network, in both high- and low-resolution features alike.

Image Classification · Instance Segmentation +3

Weakly-guided Self-supervised Pretraining for Temporal Activity Detection

1 code implementation • 26 Nov 2021 Kumara Kahatapitiya, Zhou Ren, Haoxiang Li, Zhenyu Wu, Michael S. Ryoo, Gang Hua

However, such pretrained models are not ideal for downstream detection, due to the disparity between the pretraining and the downstream fine-tuning tasks.

Action Detection · Activity Detection +2

SWAT: Spatial Structure Within and Among Tokens

1 code implementation • 26 Nov 2021 Kumara Kahatapitiya, Michael S. Ryoo

Modeling visual data as tokens (i.e., image patches) using attention mechanisms, feed-forward networks or convolutions has been highly effective in recent years.
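The tokenization the abstract refers to, splitting an image into flattened patches, can be sketched in a few lines of NumPy; this is the generic patchify operation used by vision Transformers, not SWAT's specific module.

```python
# Sketch of turning an image into patch tokens (generic, illustrative).
import numpy as np

def patchify(image, patch):
    # image: (H, W, C) -> tokens: (num_patches, patch*patch*C)
    H, W, C = image.shape
    assert H % patch == 0 and W % patch == 0
    tokens = (image
              .reshape(H // patch, patch, W // patch, patch, C)
              .transpose(0, 2, 1, 3, 4)   # group pixels by patch
              .reshape(-1, patch * patch * C))
    return tokens

img = np.random.rand(32, 32, 3)
tokens = patchify(img, patch=8)
print(tokens.shape)  # (16, 192): a 4x4 grid of 8x8x3 patches
```

Flattening each patch this way is also what discards the spatial structure *within* a token, which is the motivation the paper's title points at.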

StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning

1 code implementation • 12 Oct 2021 Jinghuan Shang, Kumara Kahatapitiya, Xiang Li, Michael S. Ryoo

Reinforcement Learning (RL) can be considered as a sequence modeling task: given a sequence of past state-action-reward experiences, an agent predicts a sequence of next actions.

Imitation Learning · Inductive Bias +3
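The abstract frames RL as sequence modeling: interleave past state-action-reward experience into one sequence and predict the next action. A toy sketch of that framing is below; the helper names and the trivial placeholder policy are illustrative assumptions, not StARformer's API.

```python
# Sketch of RL as sequence modeling: build an interleaved
# [s_0, a_0, r_0, s_1, a_1, r_1, ...] sequence and predict the next action.
import numpy as np

def build_sequence(states, actions, rewards):
    seq = []
    for s, a, r in zip(states, actions, rewards):
        seq.extend([("state", s), ("action", a), ("reward", r)])
    return seq

def predict_next_action(seq):
    # Placeholder policy: repeat the most recent action token.
    # A real model (e.g., a Transformer) would condition on the full sequence.
    actions = [tok for kind, tok in seq if kind == "action"]
    return actions[-1]

states  = [np.zeros(4), np.ones(4), np.full(4, 2.0)]
actions = [0, 1, 1]
rewards = [0.0, 1.0, 0.5]
seq = build_sequence(states, actions, rewards)
print(len(seq), predict_next_action(seq))  # 9 1
```

The point of the framing is that a sequence model trained on such interleaved tokens can be queried for the next action exactly like a language model is queried for the next word.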

Coarse-Fine Networks for Temporal Activity Detection in Videos

1 code implementation CVPR 2021 Kumara Kahatapitiya, Michael S. Ryoo

In this paper, we introduce Coarse-Fine Networks, a two-stream architecture which benefits from different abstractions of temporal resolution to learn better video representations for long-term motion.

Action Detection · Activity Detection
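The two-stream idea in the abstract, one stream seeing a coarsely subsampled view of the whole clip and another seeing a dense view of a short window, can be sketched as below; the sampling scheme and the naive mean-pool fusion are illustrative assumptions, not the paper's architecture.

```python
# Sketch of coarse/fine temporal views of a clip (illustrative only).
import numpy as np

def coarse_fine_views(frames, coarse_stride=8, fine_window=8):
    coarse = frames[::coarse_stride]          # sparse, covers the whole clip
    mid = len(frames) // 2
    fine = frames[mid - fine_window // 2: mid + fine_window // 2]  # dense
    return coarse, fine

clip = np.random.rand(64, 16)                 # 64 frames, 16-dim features
coarse, fine = coarse_fine_views(clip)
fused = np.concatenate([coarse.mean(0), fine.mean(0)])  # naive fusion
print(coarse.shape, fine.shape, fused.shape)  # (8, 16) (8, 16) (32,)
```

Both streams see the same number of frames here, but the coarse one spans 64 frames of context while the fine one spans only 8, which is the abstraction-of-temporal-resolution trade-off the abstract describes.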

Feature-Dependent Cross-Connections in Multi-Path Neural Networks

no code implementations • 24 Jun 2020 Dumindu Tissera, Kasun Vithanage, Rukshan Wijesinghe, Kumara Kahatapitiya, Subha Fernando, Ranga Rodrigo

As opposed to conventional network widening, multi-path architectures restrict the quadratic increment of complexity to a linear scale.
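The quadratic-vs-linear claim can be checked with a short parameter count: widening a fully connected layer by a factor k grows its weight count by k², while k parallel paths of the original width grow it only by k. The layer sizes below are arbitrary, chosen just to make the arithmetic visible.

```python
# Worked example of the complexity claim: widening vs. multi-path.
def fc_params(n_in, n_out):
    return n_in * n_out  # weights only; biases omitted for clarity

n, k = 64, 4
base      = fc_params(n, n)            # 64*64   = 4096
widened   = fc_params(k * n, k * n)    # 256*256 = 65536 = k^2 * base
multipath = k * fc_params(n, n)        # 4*4096  = 16384 = k   * base
print(widened // base, multipath // base)  # 16 4
```

The same arithmetic applies to convolutions, where widening multiplies both the input and output channel counts and hence the kernel weights quadratically.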

Context-Aware Multipath Networks

no code implementations • 26 Jul 2019 Dumindu Tissera, Kumara Kahatapitiya, Rukshan Wijesinghe, Subha Fernando, Ranga Rodrigo

In view of this, networks that can allocate resources according to the context of the input and regulate the flow of information across the network are effective.

Image Classification

Exploiting the Redundancy in Convolutional Filters for Parameter Reduction

1 code implementation • 26 Jul 2019 Kumara Kahatapitiya, Ranga Rodrigo

Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in many computer vision tasks over the years.

Context-Aware Automatic Occlusion Removal

1 code implementation • 7 May 2019 Kumara Kahatapitiya, Dumindu Tissera, Ranga Rodrigo

Occlusion removal is an interesting application of image enhancement, for which existing work relies on manually-annotated or domain-specific approaches.

Image Enhancement
