no code implementations • 27 Nov 2023 • Anil Batra, Davide Moltisanti, Laura Sevilla-Lara, Marcus Rohrbach, Frank Keller
Understanding such videos is challenging, as it requires precisely localizing steps and generating textual instructions.
no code implementations • 10 Oct 2023 • Shreyank N Gowda, Xinyue Hao, Gen Li, Laura Sevilla-Lara, Shashank Narayana Gowda
Deep learning models have revolutionized various fields, from image recognition to natural language processing, by achieving unprecedented levels of accuracy.
1 code implementation • 29 Sep 2023 • Shreyank N Gowda, Laura Sevilla-Lara
The textual narratives forge connections between seen and unseen classes, overcoming the bottleneck of labeled data that has long impeded advancements in this exciting domain.
1 code implementation • CVPR 2023 • Davide Moltisanti, Frank Keller, Hakan Bilen, Laura Sevilla-Lara
The goal of this work is to understand the way actions are performed in videos.
Ranked #2 on Video-Adverb Retrieval on HowTo100M Adverbs
no code implementations • CVPR 2023 • Gen Li, Varun Jampani, Deqing Sun, Laura Sevilla-Lara
A key step to acquire this skill is to identify what part of the object affords each action, which is called affordance grounding.
2 code implementations • 10 Oct 2022 • Kiyoon Kim, Davide Moltisanti, Oisin Mac Aodha, Laura Sevilla-Lara
In practice, a given video can contain multiple valid positive annotations for the same action.
no code implementations • 30 Sep 2022 • Anil Batra, Shreyank N Gowda, Frank Keller, Laura Sevilla-Lara
We refer to this task as Procedure Segmentation and Summarization (PSS).
no code implementations • 9 Jun 2022 • Shreyank N Gowda, Marcus Rohrbach, Frank Keller, Laura Sevilla-Lara
We propose to learn what makes a good video for action recognition and select only high-quality samples for augmentation.
Ranked #2 on Few Shot Action Recognition on HMDB51
1 code implementation • 25 Jan 2022 • Kiyoon Kim, Shreyank N Gowda, Oisin Mac Aodha, Laura Sevilla-Lara
We address the problem of capturing temporal information for video classification in 2D networks, without increasing their computational cost.
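One family of approaches along these lines injects temporal information through the input channels themselves, for example by packing grayscale versions of neighbouring frames into the three channels an unmodified 2D CNN already expects. The sketch below is illustrative (function names and the exact sampling scheme are assumptions, not taken from the paper):

```python
import numpy as np

def grayscale(frame):
    """Convert an H x W x 3 RGB frame to a single-channel grayscale image."""
    return frame @ np.array([0.299, 0.587, 0.114])

def stack_temporal_channels(frames, t):
    """Build a 3-channel input for a 2D CNN by placing grayscale
    versions of frames t-1, t, t+1 into the R, G, B channel slots.
    The 2D network itself needs no architectural change or extra compute."""
    idx = [max(t - 1, 0), t, min(t + 1, len(frames) - 1)]  # clamp at clip ends
    return np.stack([grayscale(frames[i]) for i in idx], axis=-1)

# A toy clip: 8 frames of 32x32 RGB.
clip = np.random.rand(8, 32, 32, 3)
x = stack_temporal_channels(clip, t=4)
print(x.shape)  # (32, 32, 3): the same input shape a plain 2D network expects
```

The appeal of this style of channel sampling is that temporal cues ride along for free: the convolution over the channel dimension mixes information from adjacent timesteps without any 3D kernels.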
1 code implementation • 27 Jul 2021 • Shreyank N Gowda, Laura Sevilla-Lara, Kiyoon Kim, Frank Keller, Marcus Rohrbach
We benchmark several recent approaches on the proposed True Zero-Shot (TruZe) Split for UCF101 and HMDB51, with zero-shot and generalized zero-shot evaluation.
2 code implementations • CVPR 2021 • Gen Li, Varun Jampani, Laura Sevilla-Lara, Deqing Sun, Jonghyun Kim, Joongkyu Kim
By integrating the SGC and GPA together, we propose the Adaptive Superpixel-guided Network (ASGNet), which is a lightweight model and adapts to object scale and shape variation.
Ranked #55 on Few-Shot Semantic Segmentation on COCO-20i (5-shot)
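The guided-prototype-allocation idea can be illustrated as matching each query-pixel feature to its most similar superpixel prototype from the support image. The sketch below uses cosine similarity and invented names, so treat it as an illustration of the matching step rather than the ASGNet implementation:

```python
import numpy as np

def guided_prototype_allocation(query_feats, prototypes):
    """query_feats: (H*W, C) per-pixel features of the query image.
    prototypes:    (K, C) superpixel-derived prototypes from the support set.
    Each query pixel is assigned the prototype it is most similar to,
    so guidance adapts to however many superpixels the object produced."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sim = q @ p.T                 # cosine similarity, shape (H*W, K)
    best = sim.argmax(axis=1)     # index of the best prototype per pixel
    return prototypes[best]       # (H*W, C) allocated guidance features

feats = np.random.rand(16, 8)    # 16 query pixels, 8-dim features
protos = np.random.rand(3, 8)    # 3 superpixel prototypes
guided = guided_prototype_allocation(feats, protos)
print(guided.shape)  # (16, 8)
```

Because the number of prototypes K varies with object scale and shape, this per-pixel allocation avoids squeezing the support object into a single averaged prototype.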
no code implementations • 18 Jan 2021 • Shreyank N Gowda, Laura Sevilla-Lara, Frank Keller, Marcus Rohrbach
The problem can be seen as learning a function which generalizes well to instances of unseen classes without losing discrimination between classes.
Ranked #2 on Zero-Shot Action Recognition on Olympics
no code implementations • 19 Dec 2020 • Shreyank N Gowda, Marcus Rohrbach, Laura Sevilla-Lara
In this work, however, we focus on the more standard short, trimmed action recognition problem.
Ranked #3 on Action Recognition on UCF101
1 code implementation • 26 May 2020 • Shreyank N Gowda, Panagiotis Eustratiadis, Timothy Hospedales, Laura Sevilla-Lara
We treat this as a grouping problem by exploiting object proposals and making a joint inference about grouping over both space and time.
One-shot visual object segmentation, reinforcement-learning, +5
no code implementations • 23 Apr 2020 • Yannis Kalantidis, Laura Sevilla-Lara, Ernest Mwebaze, Dina Machuve, Hamed Alemohammad, David Guerena
The workshop was held in conjunction with the International Conference on Learning Representations (ICLR) 2020.
no code implementations • 19 Jul 2019 • Laura Sevilla-Lara, Shengxin Zha, Zhicheng Yan, Vedanuj Goswami, Matt Feiszli, Lorenzo Torresani
However, in current video datasets it has been observed that action classes can often be recognized without any temporal information from a single frame of video.
no code implementations • 10 Jun 2019 • Linchao Zhu, Laura Sevilla-Lara, Du Tran, Matt Feiszli, Yi Yang, Heng Wang
FASTER aims to leverage the redundancy between neighboring clips and reduce the computational cost by learning to aggregate the predictions from models of different complexities.
Ranked #24 on Action Recognition on UCF101
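The intuition behind this kind of clip-level aggregation can be sketched as running an expensive model on a few clips, a cheap model on the rest, and then combining the per-clip predictions. The names and the plain-averaging combiner below are assumptions for illustration; FASTER itself learns the aggregation:

```python
import numpy as np

def predict_video(clips, expensive_model, cheap_model, expensive_every=4):
    """Run the costly model on every `expensive_every`-th clip and the
    lightweight model on the remaining (largely redundant) neighbours,
    then combine clip-level logits into a video-level prediction.
    Plain averaging stands in for the learned aggregation here."""
    logits = [expensive_model(c) if i % expensive_every == 0 else cheap_model(c)
              for i, c in enumerate(clips)]
    return np.mean(logits, axis=0)

# Toy stand-ins for the two recognition models (10-way classification).
expensive_model = lambda clip: np.ones(10) * clip.mean()
cheap_model = lambda clip: np.ones(10) * clip.mean() * 0.9
clips = [np.random.rand(8, 32, 32, 3) for _ in range(8)]
print(predict_video(clips, expensive_model, cheap_model).shape)  # (10,)
```

With `expensive_every=4`, only a quarter of the clips pay the full cost, which is where the computational savings come from.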
no code implementations • CVPR 2019 • Zheng Shou, Xudong Lin, Yannis Kalantidis, Laura Sevilla-Lara, Marcus Rohrbach, Shih-Fu Chang, Zhicheng Yan
Motion has been shown to be useful for video understanding, where it is typically represented by optical flow.
Ranked #1 on Action Recognition on UCF-101
no code implementations • 22 Dec 2017 • Laura Sevilla-Lara, Yiyi Liao, Fatma Guney, Varun Jampani, Andreas Geiger, Michael J. Black
Here we take a deeper look at the combination of flow and action recognition, and investigate why optical flow is helpful, what makes a flow method good for action recognition, and how we can make it better.
no code implementations • CVPR 2017 • Jonas Wulff, Laura Sevilla-Lara, Michael J. Black
Existing algorithms typically focus on either recovering motion and structure under the assumption of a purely static world or optical flow for general unconstrained scenes.
Ranked #10 on Optical Flow Estimation on Sintel-clean
no code implementations • CVPR 2016 • Laura Sevilla-Lara, Deqing Sun, Varun Jampani, Michael J. Black
Existing optical flow methods make generic, spatially homogeneous, assumptions about the spatial structure of the flow.