no code implementations • 13 Mar 2025 • Georgy Ponimatkin, Martin Cífka, Tomáš Souček, Médéric Fourmy, Yann Labbé, Vladimir Petrik, Josef Sivic
Third, we thoroughly evaluate and ablate our 6D pose estimation method on YCB-V and HOPE-Video datasets as well as a new dataset of instructional videos manually annotated with approximate 6D object trajectories.
Ranked #1 on 6D Pose Estimation on DTTD-Mobile (AR CoU metric)
1 code implementation • CVPR 2025 • Tomáš Souček, Prajwal Gatti, Michael Wray, Ivan Laptev, Dima Damen, Josef Sivic
The goal of this work is to generate step-by-step visual instructions in the form of a sequence of images, given an input image that provides the scene context and the sequence of textual instructions.
1 code implementation • CVPR 2024 • Tomáš Souček, Dima Damen, Michael Wray, Ivan Laptev, Josef Sivic
We address the task of generating temporally consistent and physically plausible images of actions and object state transformations.
1 code implementation • 24 Nov 2022 • Tomáš Souček, Jean-Baptiste Alayrac, Antoine Miech, Ivan Laptev, Josef Sivic
We aim to learn to temporally localize object state changes and the corresponding state-modifying actions by observing people interacting with objects in long uncurated web videos.
1 code implementation • CVPR 2022 • Tomáš Souček, Jean-Baptiste Alayrac, Antoine Miech, Ivan Laptev, Josef Sivic
In this paper, we seek to temporally localize object states (e.g. "empty" and "full" cup) together with the corresponding state-modifying actions ("pouring coffee") in long uncurated videos with minimal supervision.
4 code implementations • 11 Aug 2020 • Tomáš Souček, Jakub Lokoč
Although automatic shot transition detection approaches have been investigated for more than two decades, an effective universal human-level model has not yet been proposed.
Ranked #2 on Camera shot boundary detection on ClipShots (using extra training data)
4 code implementations • 8 Jun 2019 • Tomáš Souček, Jaroslav Moravec, Jakub Lokoč
Shot boundary detection (SBD) is an important first step in many video processing applications.