no code implementations • 30 Oct 2024 • Chinthani Sugandhika, Chen Li, Deepu Rajan, Basura Fernando
Based on our proposed representation, we introduce the task of situational scene graph generation and propose a multi-stage pipeline Interactive and Complementary Network (InComNet) to address the task.
no code implementations • 14 Oct 2024 • Zhengwei Yang, Yuke Li, Qiang Sun, Basura Fernando, Heng Huang, Zheng Wang
Most existing studies on few-shot learning focus on unimodal settings, where models are trained to generalize on unseen data using only a small number of labeled examples from the same modality.
1 code implementation • 30 Jul 2024 • Dhruv Verma, Debaditya Roy, Basura Fernando
We also propose a verb-wise role prediction model with near-perfect accuracy to create an end-to-end framework for producing situational summaries for out-of-domain images.
1 code implementation • 1 Apr 2024 • Paritosh Parmar, Eric Peh, Ruirui Chen, Ting En Lam, Yuhan Chen, Elston Tan, Basura Fernando
To address this gap, we capitalize on the unique properties of cartoons and construct CausalChaos!, a novel, challenging causal Why-QA dataset built upon the iconic "Tom and Jerry" cartoon series.
no code implementations • 23 Jan 2024 • Ee Yeo Keat, Zhang Hao, Alexander Matyasko, Basura Fernando
We introduce VidTFS, a Training-free, open-vocabulary video goal and action inference framework that combines the frozen vision foundational model (VFM) and large language model (LLM) with a novel dynamic Frame Selection module.
no code implementations • 19 Jan 2024 • Paritosh Parmar, Eric Peh, Basura Fernando
We introduce the novel concept of visually Connecting Actions and Their Effects (CATE) in video understanding.
no code implementations • 14 Dec 2023 • Vincent Tao Hu, Wenzhe Yin, Pingchuan Ma, Yunlu Chen, Basura Fernando, Yuki M Asano, Efstratios Gavves, Pascal Mettes, Bjorn Ommer, Cees G. M. Snoek
In this paper, we propose Motion Flow Matching, a novel generative model designed for human motion generation featuring efficient sampling and effectiveness in motion editing applications.
1 code implementation • 20 Oct 2023 • Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen
In this paper, we study multimodal coreference resolution, specifically where a longer descriptive text, i.e., a narration, is paired with an image.
1 code implementation • IEEE WACV 2024 • Debaditya Roy, Dhruv Verma, Basura Fernando
Situation Recognition is the task of generating a structured summary of what is happening in an image using an activity verb and the semantic roles played by actors and objects.
Ranked #1 on Situation Recognition on imSitu
no code implementations • 15 Jun 2023 • Ishaan Singh Rawal, Alexander Matyasko, Shantanu Jaiswal, Basura Fernando, Cheston Tan
Consistent with the findings of QUAG, we find that most of the models achieve near-trivial performance on CLAVI.
no code implementations • 4 May 2023 • Ramanathan Rajendiran, Debaditya Roy, Basura Fernando
The final context-infused spatio-temporal interaction tokens are used for compositional action recognition.
1 code implementation • 18 Mar 2023 • Hao Zhang, Yeo Keat Ee, Basura Fernando
Adapters are used for fine-tuning CLIP models for downstream tasks, and we design a new attention adapter that directly steers the focus of the attention map with trainable query and key projections of a frozen CLIP model.
Ranked #1 on Visual Abductive Reasoning on SHERLOCK
no code implementations • ICCV 2023 • Samitha Herath, Basura Fernando, Ehsan Abbasnejad, Munawar Hayat, Shahram Khadivi, Mehrtash Harandi, Hamid Rezatofighi, Gholamreza Haffari
(1) EBL can be used to improve the instance selection for a self-training task on the unlabelled target domain, and (2) aligning and normalizing energy scores can learn domain-invariant representations.
no code implementations • ICCV 2023 • Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen
Coreference resolution aims to identify words and phrases which refer to the same entity in a text, a core task in natural language processing.
1 code implementation • IEEE WACV 2024 • Debaditya Roy, Ramanathan Rajendiran, Basura Fernando
On the EK100 evaluation server, InAViT is the top-performing method on the public leaderboard (at the time of submission), where it outperforms the second-best model by 3.3% on mean-top5 recall.
Ranked #1 on Action Anticipation on EPIC-KITCHENS-100 (test)
no code implementations • 24 Oct 2022 • Clement Tan, Chai Kiat Yeo, Cheston Tan, Basura Fernando
In this paper, we introduce a novel research task known as "abductive action inference" which addresses the question of which actions were executed by a human to reach a specific state shown in a single snapshot.
no code implementations • 12 Sep 2022 • Debaditya Roy, Basura Fernando
With the submission of this paper, our method is currently the new state of the art for action anticipation on EK55 and EGTEA Gaze+ (https://competitions.codalab.org/competitions/20071#results). Code available at https://github.com/debadityaroy/Abstract_Goal
1 code implementation • 23 Aug 2022 • Kian Boon Koh, Basura Fernando
Collection of real world annotations for training semantic segmentation models is an expensive process.
no code implementations • 31 Mar 2022 • Yunlu Chen, Basura Fernando, Hakan Bilen, Matthias Nießner, Efstratios Gavves
In this work, we address two key limitations of such representations: failing to capture fine local 3D geometric details, and failing to learn from and generalize to shapes with unseen 3D transformations.
no code implementations • 19 Dec 2021 • Cristian Rodriguez-Opazo, Edison Marrese-Taylor, Basura Fernando, Hiroya Takamura, Qi Wu
We propose LocFormer, a Transformer-based model for video grounding which operates at a constant memory footprint regardless of the video length, i.e., the number of frames.
1 code implementation • 26 Nov 2021 • Shantanu Jaiswal, Basura Fernando, Cheston Tan
Attention modules for Convolutional Neural Networks (CNNs) are an effective method to enhance performance on multiple computer-vision tasks.
no code implementations • CVPR 2022 • Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen
Scene graph generation (SGG) aims to capture a wide variety of interactions between pairs of objects, which is essential for full scene understanding.
no code implementations • 19 Jul 2021 • Yan Bin Ng, Basura Fernando
A temporal recurrent encoder captures temporal information of input videos, while a self-attention model is used to attend to relevant feature dimensions of the input space.
no code implementations • CVPR 2021 • Basura Fernando, Samitha Herath
We propose a framework for early action recognition and anticipation by correlating past features with the future using three novel similarity measures called Jaccard vector similarity, Jaccard cross-correlation and Jaccard Frobenius inner product over covariances.
no code implementations • 11 Dec 2020 • Haziq Razali, Basura Fernando
In this paper, we introduce a new variational model that extends the recurrent network in two ways for the task of video frame prediction.
no code implementations • 8 Nov 2020 • Vinoj Jayasundara, Debaditya Roy, Basura Fernando
Capsule networks (CapsNets) have recently shown promise to excel in most computer vision tasks, especially pertaining to scene understanding.
1 code implementation • 13 Oct 2020 • Cristian Rodriguez-Opazo, Edison Marrese-Taylor, Basura Fernando, Hongdong Li, Stephen Gould
This paper studies the task of temporal moment localization in a long untrimmed video using a natural language query.
no code implementations • 28 Apr 2020 • Rodrigo Santa Cruz, Anoop Cherian, Basura Fernando, Dylan Campbell, Stephen Gould
This paper presents a framework to recognize temporal compositions of atomic actions in videos.
no code implementations • 10 Dec 2019 • Yan Bin Ng, Basura Fernando
We extend our action sequence forecasting model to perform weakly supervised action forecasting on two challenging datasets, Breakfast and 50Salads.
no code implementations • 22 Nov 2019 • Arushi Goel, Basura Fernando, Thanh-Son Nguyen, Hakan Bilen
Automatically generating natural language descriptions from an image is a challenging problem in artificial intelligence that requires a good understanding of the visual and textual signals and the correlations between them.
no code implementations • ECCV 2018 • Yuge Shi, Basura Fernando, Richard Hartley
We introduce a novel Recurrent Neural Network-based algorithm for future video feature generation and action anticipation called feature mapping RNN.
no code implementations • 7 Oct 2019 • Yan Bin Ng, Basura Fernando
Furthermore, we use our model, which is trained to output action sequences, to solve downstream tasks such as video captioning and action localization.
no code implementations • 25 Sep 2019 • Basura Fernando, Hakan Bilen
The instance representation is shared by both instance classification and weighting streams.
no code implementations • 16 Apr 2019 • Basura Fernando, Cheston Tan Yin Chet, Hakan Bilen
Detecting temporal extents of human actions in videos is a challenging computer vision problem that requires detailed manual supervision including frame-level labels.
no code implementations • 22 Oct 2018 • Mohammad Sadegh Aliakbarian, Fatemeh Sadat Saleh, Mathieu Salzmann, Basura Fernando, Lars Petersson, Lars Andersson
Action anticipation is critical in scenarios where one needs to react before the action is finalized.
no code implementations • ECCV 2018 • Xin Yu, Basura Fernando, Bernard Ghanem, Fatih Porikli, Richard Hartley
State-of-the-art face super-resolution methods use deep convolutional neural networks to learn a mapping between low-resolution (LR) facial patterns and their corresponding high-resolution (HR) counterparts by exploring local information.
no code implementations • 1 Aug 2018 • Cristian Rodriguez, Basura Fernando, Hongdong Li
Human action-anticipation methods predict what the future action is by observing only a small portion of an action in progress.
no code implementations • CVPR 2018 • Xin Yu, Basura Fernando, Richard Hartley, Fatih Porikli
An LR input contains low-frequency facial components of its HR version, while its residual face image, defined as the difference between the HR ground-truth and interpolated LR images, contains the missing high-frequency facial details.
no code implementations • 26 Jan 2018 • Rodrigo Santa Cruz, Basura Fernando, Anoop Cherian, Stephen Gould
In this paper, we build on the compositionality principle and develop an "algebra" to compose classifiers for complex visual concepts.
1 code implementation • 30 May 2017 • Basura Fernando, Stephen Gould
First, we present "discriminative rank pooling" in which the shared weights of our video representation and the parameters of the action classifiers are estimated jointly for a given training dataset of labelled vector sequences using a bilevel optimization formulation of the learning problem.
no code implementations • CVPR 2017 • Rodrigo Santa Cruz, Basura Fernando, Anoop Cherian, Stephen Gould
Unrolling these iterations in a Sinkhorn network layer, we propose DeepPermNet, an end-to-end CNN model for this task.
no code implementations • CVPR 2017 • Anoop Cherian, Basura Fernando, Mehrtash Harandi, Stephen Gould
Most popular deep models for action recognition split video sequences into short sub-sequences consisting of a few frames; frame-based features are then pooled for recognizing the activity.
1 code implementation • ICCV 2017 • Mohammad Sadegh Aliakbarian, Fatemeh Sadat Saleh, Mathieu Salzmann, Basura Fernando, Lars Petersson, Lars Andersson
In contrast to the widely studied problem of recognizing an action given a complete sequence, action anticipation aims to identify the action from only partially available videos.
no code implementations • 7 Mar 2017 • Sajib Kumar Saha, Basura Fernando, Jorge Cuadros, Di Xiao, Yogesan Kanagasingam
Three retinal image analysis experts were employed to categorize these images into Accept and Reject classes based on the precise definition of image quality in the context of DR. A deep learning framework was trained using 3428 images.
1 code implementation • EMNLP 2017 • Peter Anderson, Basura Fernando, Mark Johnson, Stephen Gould
Existing image captioning models do not generalize well to out-of-domain images containing novel scenes or objects.
3 code implementations • 2 Dec 2016 • Hakan Bilen, Basura Fernando, Efstratios Gavves, Andrea Vedaldi
This is a powerful idea because it allows any video to be converted to an image, so that existing CNN models pre-trained for the analysis of still images can be immediately extended to videos.
no code implementations • 2 Dec 2016 • Basura Fernando, Sareh Shirazi, Stephen Gould
On the MPII Cooking dataset we detect action segments with a precision of 21.6% and recall of 11.7% over 946 long video pairs and over 5000 ground truth action segments.
no code implementations • CVPR 2017 • Basura Fernando, Hakan Bilen, Efstratios Gavves, Stephen Gould
On action classification, our method obtains 60.3% on the UCF101 dataset using only UCF101 data for training, which is approximately 10% better than current state-of-the-art self-supervised learning methods.
Ranked #47 on Self-Supervised Action Recognition on UCF101
no code implementations • 17 Nov 2016 • Mohammad Sadegh Aliakbarian, Fatemehsadat Saleh, Basura Fernando, Mathieu Salzmann, Lars Petersson, Lars Andersson
We outperform the state-of-the-art methods that, like ours, rely only on RGB frames as input for both action recognition and anticipation.
no code implementations • 17 Nov 2016 • Mehrtash Harandi, Basura Fernando
This paper introduces an extension of the backpropagation algorithm that enables us to have layers with constrained weights in a deep network.
11 code implementations • 29 Jul 2016 • Peter Anderson, Basura Fernando, Mark Johnson, Stephen Gould
There is considerable interest in the task of automatically generating image captions.
no code implementations • 19 Jul 2016 • Stephen Gould, Basura Fernando, Anoop Cherian, Peter Anderson, Rodrigo Santa Cruz, Edison Guo
Some recent works in machine learning and computer vision involve the solution of a bi-level optimization problem.
1 code implementation • CVPR 2016 • Hakan Bilen, Basura Fernando, Efstratios Gavves, Andrea Vedaldi, Stephen Gould
We introduce the concept of dynamic image, a novel compact representation of videos useful for video analysis especially when convolutional neural networks (CNNs) are used.
Ranked #62 on Action Recognition on HMDB-51
no code implementations • CVPR 2016 • Basura Fernando, Peter Anderson, Marcus Hutter, Stephen Gould
We present hierarchical rank pooling, a video sequence encoding method for activity recognition.
1 code implementation • 6 Dec 2015 • Basura Fernando, Efstratios Gavves, Jose Oramas, Amir Ghodrati, Tinne Tuytelaars
We show how the parameters of a function that has been fit to the video data can serve as a robust new video representation.
no code implementations • ICCV 2015 • Xu Jia, Efstratios Gavves, Basura Fernando, Tinne Tuytelaars
In this work we focus on the problem of image caption generation.
no code implementations • ICCV 2015 • Basura Fernando, Efstratios Gavves, Damien Muselet, Tinne Tuytelaars
We present a supervised learning to rank algorithm that effectively orders images by exploiting the structure in image sequences.
no code implementations • 29 Nov 2015 • Basura Fernando, Efstratios Gavves, Damien Muselet, Tinne Tuytelaars
We present a supervised learning to rank algorithm that effectively orders images by exploiting the structure in image sequences.
1 code implementation • 16 Sep 2015 • Xu Jia, Efstratios Gavves, Basura Fernando, Tinne Tuytelaars
In this work we focus on the problem of image caption generation.
no code implementations • CVPR 2015 • Konstantinos Rematas, Basura Fernando, Frank Dellaert, Tinne Tuytelaars
As the amount of visual data increases, so does the need for summarization tools that can be used to explore large image collections and to quickly get familiar with their content.
no code implementations • CVPR 2015 • Basura Fernando, Efstratios Gavves, Jose Oramas M., Amir Ghodrati, Tinne Tuytelaars
We postulate that a function capable of ordering the frames of a video temporally (based on the appearance) captures well the evolution of the appearance within the video.
no code implementations • 3 May 2015 • Basura Fernando, Sezer Karaoglu, Sajib Kumar Saha
This paper presents novel multi-scale gradient and corner-point-based shape descriptors.
no code implementations • 17 Nov 2014 • Basura Fernando, Tatiana Tommasi, Tinne Tuytelaars
Domain adaptation aims at adapting the knowledge acquired on a source domain to a new different but related target domain.
no code implementations • 26 Sep 2014 • Basura Fernando, Tatiana Tommasi, Tinne Tuytelaars
Would it be possible to automatically associate ancient pictures with modern ones and create fancy cultural heritage city maps?
no code implementations • 18 Sep 2014 • Basura Fernando, Amaury Habrard, Marc Sebban, Tinne Tuytelaars
We present two approaches to determine the only hyper-parameter in our method, which corresponds to the size of the subspaces.