no code implementations • 20 Nov 2024 • Xinyue Hao, Gen Li, Shreyank N Gowda, Robert B Fisher, Jonathan Huang, Anurag Arnab, Laura Sevilla-Lara
First, we develop an oracle for the value of tokens which exposes a clear Pareto distribution where most tokens have remarkably low value, and just a few carry most of the perceptual information.
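A minimal sketch of the pruning this observation enables, assuming per-token value scores are already available (function and argument names here are illustrative, not the paper's API): keep only the small top-valued fraction of tokens.

```python
import torch

def keep_top_value_tokens(tokens, scores, keep_frac=0.1):
    """Keep only the highest-value fraction of tokens.

    tokens: (N, D) token embeddings; scores: (N,) per-token value
    estimates (stand-ins for the paper's oracle scores).
    """
    k = max(1, int(keep_frac * tokens.shape[0]))
    idx = torch.topk(scores, k).indices   # indices of the top-k tokens
    return tokens[idx], idx
```

Under a Pareto-like value distribution, even a small `keep_frac` retains most of the perceptual information.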
no code implementations • 14 Oct 2024 • Shreyank N Gowda, Davide Moltisanti, Laura Sevilla-Lara
In this paper, we propose a novel method based on continual learning to address zero-shot action recognition.
no code implementations • 19 Aug 2024 • Gen Li, Nikolaos Tsagkas, Jifei Song, Ruaridh Mon-Williams, Sethu Vijayakumar, Kun Shao, Laura Sevilla-Lara
In this paper, we present a streamlined affordance learning system that encompasses data collection, effective model training, and robot deployment.
1 code implementation • 13 May 2024 • Davide Moltisanti, Hakan Bilen, Laura Sevilla-Lara, Frank Keller
We use our synthetic data to train a model based on UNet and test it on real images showing coarsely/finely cut objects.
no code implementations • CVPR 2024 • Gen Li, Deqing Sun, Laura Sevilla-Lara, Varun Jampani
We introduce One-shot Open Affordance Learning (OOAL), where a model is trained with just one example per base object category, but is expected to identify novel objects and affordances.
1 code implementation • 27 Nov 2023 • Anil Batra, Davide Moltisanti, Laura Sevilla-Lara, Marcus Rohrbach, Frank Keller
To mitigate these issues, we propose a novel technique, Sieve-&-Swap, to automatically generate high-quality training data for the recipe domain: (i) Sieve: filters irrelevant transcripts and (ii) Swap: acquires high-quality text by replacing transcripts with human-written instructions from a text-only recipe dataset.
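A hedged sketch of the two steps over precomputed sentence embeddings; the cosine-similarity criterion and the threshold are assumptions for illustration, not the paper's exact matching procedure.

```python
import numpy as np

def sieve_and_swap(transcript_embs, recipe_embs, recipe_texts, threshold=0.5):
    """Sieve: drop transcript sentences whose best recipe match is weak.
    Swap: replace the survivors with their best-matching recipe instruction.

    transcript_embs: (M, D); recipe_embs: (K, D); recipe_texts: list of K strings.
    """
    # Cosine similarity between every transcript and recipe sentence.
    t = transcript_embs / np.linalg.norm(transcript_embs, axis=1, keepdims=True)
    r = recipe_embs / np.linalg.norm(recipe_embs, axis=1, keepdims=True)
    sim = t @ r.T                        # (M, K) similarity matrix
    best = sim.argmax(axis=1)            # best recipe match per transcript
    keep = sim.max(axis=1) >= threshold  # Sieve
    return [recipe_texts[j] for j, k in zip(best, keep) if k]  # Swap
```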
no code implementations • 10 Oct 2023 • Shreyank N Gowda, Xinyue Hao, Gen Li, Shashank Narayana Gowda, Xiaobo Jin, Laura Sevilla-Lara
Deep learning models have revolutionized various fields, from image recognition to natural language processing, by achieving unprecedented levels of accuracy.
1 code implementation • 29 Sep 2023 • Shreyank N Gowda, Laura Sevilla-Lara
The textual narratives forge connections between seen and unseen classes, overcoming the bottleneck of labeled data that has long impeded advancements in this exciting domain.
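As a rough skeleton of how textual class descriptions can bridge seen and unseen classes (a generic zero-shot baseline, not the paper's specific model): embed each class description with a text encoder and assign a video to the nearest class embedding.

```python
import torch
import torch.nn.functional as F

def zero_shot_classify(video_emb, class_text_embs):
    """Assign a video to the class whose textual embedding is nearest.

    video_emb: (D,) video feature; class_text_embs: (K, D) embeddings
    of class names/descriptions from a text encoder.
    """
    sim = F.cosine_similarity(video_emb[None, :], class_text_embs, dim=1)
    return int(sim.argmax())  # index of the best-matching class
```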
1 code implementation • CVPR 2023 • Davide Moltisanti, Frank Keller, Hakan Bilen, Laura Sevilla-Lara
The goal of this work is to understand the way actions are performed in videos.
Ranked #2 on Video-Adverb Retrieval on HowTo100M Adverbs
1 code implementation • CVPR 2023 • Gen Li, Varun Jampani, Deqing Sun, Laura Sevilla-Lara
A key step to acquire this skill is to identify what part of the object affords each action, which is called affordance grounding.
2 code implementations • 10 Oct 2022 • Kiyoon Kim, Davide Moltisanti, Oisin Mac Aodha, Laura Sevilla-Lara
In practice, a given video can contain multiple valid positive annotations for the same action.
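One common way to accommodate multiple valid positives, shown here as a generic sketch rather than the paper's exact training objective, is to replace softmax cross-entropy with per-class binary cross-entropy so several labels can be correct at once.

```python
import torch
import torch.nn.functional as F

# Hypothetical example: 5 verb classes, one clip carrying two valid labels.
logits = torch.randn(1, 5)                      # model outputs for the clip
targets = torch.tensor([[1., 0., 1., 0., 0.]])  # multiple positives allowed

# Per-class binary cross-entropy treats each label independently, so
# several annotations can be "correct" at once, unlike softmax CE.
loss = F.binary_cross_entropy_with_logits(logits, targets)
```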
no code implementations • 30 Sep 2022 • Anil Batra, Shreyank N Gowda, Frank Keller, Laura Sevilla-Lara
We refer to this task as Procedure Segmentation and Summarization (PSS).
no code implementations • 9 Jun 2022 • Shreyank N Gowda, Marcus Rohrbach, Frank Keller, Laura Sevilla-Lara
We propose to learn what makes a good video for action recognition and select only high-quality samples for augmentation (see the sketch below).
Ranked #2 on Few Shot Action Recognition on HMDB51
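A hedged sketch of score-and-select, where `scorer` stands in for the learned quality criterion (all names here are illustrative):

```python
import torch

def select_for_augmentation(videos, scorer, budget):
    """Score each candidate video and keep the `budget` best.

    `scorer` is a stand-in for a learned quality criterion.
    """
    scores = torch.tensor([scorer(v) for v in videos])
    keep = torch.topk(scores, budget).indices  # indices of top-scoring videos
    return [videos[i] for i in keep]
```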
1 code implementation • 25 Jan 2022 • Kiyoon Kim, Shreyank N Gowda, Oisin Mac Aodha, Laura Sevilla-Lara
We address the problem of capturing temporal information for video classification in 2D networks, without increasing their computational cost.
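One hedged illustration of giving a 2D network temporal information at no extra cost: assemble a single 3-channel input whose channels are sampled from different frames (one plausible channel-sampling strategy; the details are assumptions, not necessarily the paper's exact scheme).

```python
import torch

def channel_sampled_frame(frames):
    """Build one 3-channel image whose R, G, B planes come from three
    consecutive frames, so a plain 2D CNN sees motion at no extra cost.

    frames: (T, 3, H, W) clip with T >= 3.
    """
    return torch.stack([frames[0, 0], frames[1, 1], frames[2, 2]], dim=0)
```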
1 code implementation • 27 Jul 2021 • Shreyank N Gowda, Laura Sevilla-Lara, Kiyoon Kim, Frank Keller, Marcus Rohrbach
We benchmark several recent approaches on the proposed True Zero-Shot (TruZe) Split for UCF101 and HMDB51, with zero-shot and generalized zero-shot evaluation.
2 code implementations • CVPR 2021 • Gen Li, Varun Jampani, Laura Sevilla-Lara, Deqing Sun, Jonghyun Kim, Joongkyu Kim
By integrating the SGC and GPA together, we propose the Adaptive Superpixel-guided Network (ASGNet), a lightweight model that adapts to object scale and shape variation (a simplified prototype-matching sketch follows below).
Ranked #67 on Few-Shot Semantic Segmentation on COCO-20i (5-shot)
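For orientation, here is a minimal prototype-matching baseline for few-shot segmentation; it uses plain masked average pooling rather than ASGNet's superpixel-guided clustering (SGC would produce several prototypes per support image, and GPA would allocate them adaptively), so it is a simplification, not the paper's method.

```python
import torch
import torch.nn.functional as F

def prototype_match(query_feat, support_feat, support_mask):
    """Segment the query by similarity to a single support prototype.

    query_feat, support_feat: (C, H, W); support_mask: (H, W) binary
    float mask of the support object.
    """
    # Masked average pooling -> one foreground prototype.
    proto = (support_feat * support_mask).sum((1, 2)) / support_mask.sum().clamp(min=1)
    # Cosine similarity between the prototype and every query location.
    return F.cosine_similarity(query_feat, proto[:, None, None], dim=0)  # (H, W)
```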
no code implementations • 18 Jan 2021 • Shreyank N Gowda, Laura Sevilla-Lara, Frank Keller, Marcus Rohrbach
The problem can be seen as learning a function which generalizes well to instances of unseen classes without losing discrimination between classes.
Ranked #2 on Zero-Shot Action Recognition on Olympics
no code implementations • 19 Dec 2020 • Shreyank N Gowda, Marcus Rohrbach, Laura Sevilla-Lara
In this work, however, we focus on the more standard short, trimmed action recognition problem.
Ranked #6 on Action Recognition on UCF101
1 code implementation • 26 May 2020 • Shreyank N Gowda, Panagiotis Eustratiadis, Timothy Hospedales, Laura Sevilla-Lara
We treat this as a grouping problem by exploiting object proposals and making a joint inference about grouping over both space and time.
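A deliberately simple stand-in for such space-time grouping: greedily link each proposal to its best-overlapping proposal in the next frame. The paper's joint inference is more sophisticated; this only illustrates the structure of the problem.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def link_proposals(frames, thr=0.5):
    """Greedily extend each track with its best-overlapping proposal in
    the next frame, forming space-time groups.

    frames: list of per-frame lists of proposal boxes.
    """
    tracks = [[p] for p in frames[0]]
    for props in frames[1:]:
        for track in tracks:
            overlaps = [iou(track[-1], p) for p in props]
            if overlaps and max(overlaps) > thr:
                track.append(props[int(np.argmax(overlaps))])
    return tracks
```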
no code implementations • 23 Apr 2020 • Yannis Kalantidis, Laura Sevilla-Lara, Ernest Mwebaze, Dina Machuve, Hamed Alemohammad, David Guerena
The workshop was held in conjunction with the International Conference on Learning Representations (ICLR) 2020.
no code implementations • 19 Jul 2019 • Laura Sevilla-Lara, Shengxin Zha, Zhicheng Yan, Vedanuj Goswami, Matt Feiszli, Lorenzo Torresani
However, it has been observed that in current video datasets, action classes can often be recognized from a single frame of video, without any temporal information.
no code implementations • 10 Jun 2019 • Linchao Zhu, Laura Sevilla-Lara, Du Tran, Matt Feiszli, Yi Yang, Heng Wang
FASTER aims to leverage the redundancy between neighboring clips and reduce the computational cost by learning to aggregate the predictions from models of different complexities (see the sketch below).
Ranked #28 on Action Recognition on UCF101
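A sketch of the FASTER intuition under assumed interfaces (`heavy_model` and `light_model` are illustrative callables): run an expensive model on a few clips and a cheap one on the rest, then aggregate the clip-level predictions.

```python
import torch

def faster_style_inference(clips, heavy_model, light_model, period=4):
    """Run an expensive model on every `period`-th clip and a cheap one
    on the clips in between, then aggregate the clip predictions.
    """
    preds = []
    for i, clip in enumerate(clips):
        model = heavy_model if i % period == 0 else light_model
        with torch.no_grad():
            preds.append(model(clip))
    # Simple averaging; FASTER instead *learns* this aggregation step.
    return torch.stack(preds).mean(dim=0)
```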
no code implementations • CVPR 2019 • Zheng Shou, Xudong Lin, Yannis Kalantidis, Laura Sevilla-Lara, Marcus Rohrbach, Shih-Fu Chang, Zhicheng Yan
Motion has been shown to be useful for video understanding, where it is typically represented by optical flow.
Ranked #1 on Action Recognition on UCF-101
no code implementations • 22 Dec 2017 • Laura Sevilla-Lara, Yiyi Liao, Fatma Guney, Varun Jampani, Andreas Geiger, Michael J. Black
Here we take a deeper look at the combination of flow and action recognition, and investigate why optical flow is helpful, what makes a flow method good for action recognition, and how we can make it better.
no code implementations • CVPR 2017 • Jonas Wulff, Laura Sevilla-Lara, Michael J. Black
Existing algorithms typically focus on either recovering motion and structure under the assumption of a purely static world or optical flow for general unconstrained scenes.
Ranked #13 on Optical Flow Estimation on Sintel-clean
no code implementations • CVPR 2016 • Laura Sevilla-Lara, Deqing Sun, Varun Jampani, Michael J. Black
Existing optical flow methods make generic, spatially homogeneous, assumptions about the spatial structure of the flow.