Search Results for author: Tomáš Souček

Found 7 papers, 6 papers with code

6D Object Pose Tracking in Internet Videos for Robotic Manipulation

no code implementations · 13 Mar 2025 · Georgy Ponimatkin, Martin Cífka, Tomáš Souček, Médéric Fourmy, Yann Labbé, Vladimir Petrik, Josef Sivic

Third, we thoroughly evaluate and ablate our 6D pose estimation method on the YCB-V and HOPE-Video datasets, as well as on a new dataset of instructional videos manually annotated with approximate 6D object trajectories.

Ranked #1 on 6D Pose Estimation on DTTD-Mobile (AR CoU metric)

6D Pose Estimation · Object · +1

ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions

1 code implementation · CVPR 2025 · Tomáš Souček, Prajwal Gatti, Michael Wray, Ivan Laptev, Dima Damen, Josef Sivic

The goal of this work is to generate step-by-step visual instructions in the form of a sequence of images, given an input image that provides the scene context and the sequence of textual instructions.

GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos

1 code implementation · CVPR 2024 · Tomáš Souček, Dima Damen, Michael Wray, Ivan Laptev, Josef Sivic

We address the task of generating temporally consistent and physically plausible images of actions and object state transformations.

Object

Multi-Task Learning of Object State Changes from Uncurated Videos

1 code implementation · 24 Nov 2022 · Tomáš Souček, Jean-Baptiste Alayrac, Antoine Miech, Ivan Laptev, Josef Sivic

We aim to learn to temporally localize object state changes and the corresponding state-modifying actions by observing people interacting with objects in long uncurated web videos.

Multi-Task Learning · Object · +2

Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos

1 code implementation · CVPR 2022 · Tomáš Souček, Jean-Baptiste Alayrac, Antoine Miech, Ivan Laptev, Josef Sivic

In this paper, we seek to temporally localize object states (e.g., "empty" and "full" cup) together with the corresponding state-modifying actions ("pouring coffee") in long uncurated videos with minimal supervision.

Object

TransNet V2: An effective deep network architecture for fast shot transition detection

4 code implementations · 11 Aug 2020 · Tomáš Souček, Jakub Lokoč

Although automatic shot transition detection approaches have been investigated for more than two decades, an effective universal human-level model has not yet been proposed.

Ranked #2 on Camera shot boundary detection on ClipShots (using extra training data)

Boundary Detection · Camera shot boundary detection · +1
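
For context on the shot transition detection task that TransNet V2 addresses, the sketch below shows a naive color-histogram baseline for flagging hard cuts. This is an illustration of the task only, not the paper's deep-network method; the input file name and threshold are hypothetical, and OpenCV (cv2) is assumed to be installed.

```python
# Naive histogram-difference shot boundary detector (task illustration only;
# NOT the TransNet V2 model, which is a deep network over frame sequences).
import cv2

def detect_shot_boundaries(video_path: str, threshold: float = 0.5):
    """Return indices of frames whose color histogram differs sharply from the previous frame."""
    cap = cv2.VideoCapture(video_path)
    boundaries, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Bhattacharyya distance is near 0 for similar frames, near 1 for hard cuts.
            dist = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA)
            if dist > threshold:
                boundaries.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    return boundaries

if __name__ == "__main__":
    print(detect_shot_boundaries("example_video.mp4"))  # hypothetical input file
```

A baseline like this misses gradual transitions (fades, dissolves), which is part of why learned models such as TransNet V2 are used in practice.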
