1 code implementation • CVPR 2023 • Basile Van Hoorick, Pavel Tokmakov, Simon Stent, Jie Li, Carl Vondrick
Tracking objects with persistence in cluttered and dynamic environments remains a difficult challenge for computer vision systems.
2 code implementations • CVPR 2023 • Zhipeng Bao, Pavel Tokmakov, Yu-Xiong Wang, Adrien Gaidon, Martial Hebert
Object discovery -- separating objects from the background without manual labels -- is a fundamental open challenge in computer vision.
1 code implementation • 20 Mar 2023 • Ruoshi Liu, Rundi Wu, Basile Van Hoorick, Pavel Tokmakov, Sergey Zakharov, Carl Vondrick
We introduce Zero-1-to-3, a framework for changing the camera viewpoint of an object given just a single RGB image.
1 code implementation • CVPR 2023 • Ziqi Pang, Jie Li, Pavel Tokmakov, Dian Chen, Sergey Zagoruyko, Yu-Xiong Wang
Our framework emphasizes spatio-temporal continuity and integrates both past and future reasoning for tracked objects.
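As a toy illustration of combining past and future reasoning in a tracker (a minimal sketch with invented names -- Track, predict, update -- and not the paper's model), one can store each track's past states and, when a detection is missing, coast forward on a constant-velocity prediction so the track can be re-associated later:

    import numpy as np

    class Track:
        """Toy track that stores its past states and can extrapolate forward."""
        def __init__(self, pos):
            self.history = [np.asarray(pos, dtype=float)]  # past reasoning: stored states

        def predict(self):
            # Future reasoning: constant-velocity extrapolation from the last two states.
            if len(self.history) >= 2:
                return 2 * self.history[-1] - self.history[-2]
            return self.history[-1]

        def update(self, detection=None):
            # Use the detection when available; otherwise coast on the prediction.
            new_state = np.asarray(detection, dtype=float) if detection is not None else self.predict()
            self.history.append(new_state)

    t = Track([0.0, 0.0])
    t.update([1.0, 0.0])  # observed at the next frame
    t.update(None)        # occluded: coasts to [2.0, 0.0] via the prediction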
no code implementations • CVPR 2023 • Pavel Tokmakov, Jie Li, Adrien Gaidon
Yet this important phenomenon -- objects undergoing drastic transformations that change their appearance -- is largely absent from existing video object segmentation (VOS) benchmarks.
1 code implementation • 4 Apr 2022 • Pavel Tokmakov, Allan Jabri, Jie Li, Adrien Gaidon
This paper proposes a self-supervised objective for learning representations that localize objects under occlusion -- a property known as object permanence.
1 code implementation • CVPR 2022 • Zhipeng Bao, Pavel Tokmakov, Allan Jabri, Yu-Xiong Wang, Adrien Gaidon, Martial Hebert
Our experiments demonstrate that, despite only capturing a small subset of the objects that move, this signal is enough to generalize to segment both moving and static instances of dynamic objects.
1 code implementation • 26 Apr 2021 • Boris Ivanovic, Kuan-Hui Lee, Pavel Tokmakov, Blake Wulfe, Rowan Mcallister, Adrien Gaidon, Marco Pavone
Reasoning about the future behavior of other agents is critical to safe robot navigation.
1 code implementation • ICCV 2021 • Pavel Tokmakov, Jie Li, Wolfram Burgard, Adrien Gaidon
In this work, we introduce an end-to-end trainable approach for joint object detection and tracking that is capable of such reasoning.
1 code implementation • 28 Jun 2020 • Pavel Tokmakov, Martial Hebert, Cordelia Schmid
This paper addresses the task of unsupervised learning of representations for action recognition in videos.
no code implementations • ECCV 2020 • Achal Dave, Tarasha Khurana, Pavel Tokmakov, Cordelia Schmid, Deva Ramanan
To this end, we ask annotators to label objects that move at any point in the video, and give names to them post factum.
1 code implementation • 29 Nov 2019 • Ziqi Pang, Zhiyuan Hu, Pavel Tokmakov, Yu-Xiong Wang, Martial Hebert
Indeed, even the majority of few-shot learning methods rely on a large set of "base classes" for pretraining.
no code implementations • 25 Oct 2019 • Achal Dave, Pavel Tokmakov, Cordelia Schmid, Deva Ramanan
Moreover, at test time the same network can be applied to detection and tracking, resulting in a unified approach for the two tasks.
no code implementations • 29 Apr 2019 • Yubo Zhang, Pavel Tokmakov, Martial Hebert, Cordelia Schmid
In this work, we study the problem of action detection in a highly imbalanced dataset.
1 code implementation • 11 Feb 2019 • Achal Dave, Pavel Tokmakov, Deva Ramanan
To address this concern, we propose two new benchmarks for generic, moving object detection, and show that our model matches top-down methods on common categories, while significantly outperforming both top-down and bottom-up methods on never-before-seen categories.
no code implementations • ICCV 2019 • Pavel Tokmakov, Yu-Xiong Wang, Martial Hebert
One of the key limitations of modern deep learning approaches lies in the amount of data required to train them.
no code implementations • CVPR 2019 • Yubo Zhang, Pavel Tokmakov, Martial Hebert, Cordelia Schmid
A dominant paradigm for learning-based approaches in computer vision is training generic models, such as ResNet for image recognition or I3D for video understanding, on large datasets and allowing them to discover the optimal representation for the problem at hand.
no code implementations • 1 Dec 2017 • Pavel Tokmakov, Cordelia Schmid, Karteek Alahari
We formulate this as a learning problem and design our framework with three cues: (i) independent object motion between a pair of frames, which complements object recognition, (ii) object appearance, which helps to correct errors in motion estimation, and (iii) temporal consistency, which imposes additional constraints on the segmentation.
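As a rough illustration of how such cues could be combined (a minimal sketch with invented names -- fuse_cues, motion_score, appearance_score -- and not the paper's actual network), one can fuse per-pixel motion and appearance scores and bias the result toward the previous frame's mask as a crude stand-in for the temporal-consistency term:

    import numpy as np

    def fuse_cues(motion_score, appearance_score, prev_mask, alpha=0.5, beta=0.2):
        """Combine per-pixel cue maps (each in [0, 1]) into a binary segmentation mask.

        motion_score:     H x W map from an (assumed) motion stream
        appearance_score: H x W map from an (assumed) appearance stream
        prev_mask:        H x W binary mask from the previous frame, standing in
                          for a temporal-consistency constraint
        """
        fused = alpha * motion_score + (1 - alpha) * appearance_score
        fused = fused + beta * prev_mask  # encourage agreement with the previous frame
        return (fused > 0.5).astype(np.uint8)

    # Toy usage on random maps, just to show the call signature.
    H, W = 4, 4
    mask = fuse_cues(np.random.rand(H, W), np.random.rand(H, W), np.zeros((H, W)))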
no code implementations • ICCV 2017 • Pavel Tokmakov, Karteek Alahari, Cordelia Schmid
The module to build a "visual memory" in video, i.e., a joint representation of all the video frames, is realized with a convolutional recurrent unit learned from a small number of training video sequences.
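A convolutional GRU is one standard way to realize such a unit: the GRU gates are computed with 2D convolutions, so the hidden state is a spatial feature map that accumulates a running representation of the frames seen so far. The sketch below (in PyTorch) shows the general mechanism and is not the authors' implementation:

    import torch
    import torch.nn as nn

    class ConvGRUCell(nn.Module):
        """A convolutional GRU cell: GRU gates computed by 2D convolutions, so the
        hidden state is a spatial map that can serve as a running 'visual memory'."""
        def __init__(self, in_ch, hid_ch, k=3):
            super().__init__()
            p = k // 2
            self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=p)  # update/reset gates
            self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=p)       # candidate state

        def forward(self, x, h):
            z, r = torch.chunk(torch.sigmoid(self.gates(torch.cat([x, h], 1))), 2, 1)
            h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], 1)))
            return (1 - z) * h + z * h_tilde

    # Roll the cell over T frames of features to accumulate the memory.
    cell = ConvGRUCell(in_ch=8, hid_ch=16)
    h = torch.zeros(1, 16, 32, 32)
    for x in torch.randn(5, 1, 8, 32, 32):  # T=5 frames of toy features
        h = cell(x, h)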
no code implementations • CVPR 2017 • Pavel Tokmakov, Karteek Alahari, Cordelia Schmid
The problem of determining whether an object is in motion, irrespective of camera motion, is far from being solved.
no code implementations • 23 Mar 2016 • Pavel Tokmakov, Karteek Alahari, Cordelia Schmid
We also demonstrate that the performance of M-CNN learned with 150 weak video annotations is on par with state-of-the-art weakly-supervised methods trained with thousands of images.
no code implementations • 12 Oct 2014 • Kristian Kersting, Martin Mladenov, Pavel Tokmakov
A relational linear program (RLP) is a declarative LP template defining the objective and the constraints through the logical concepts of objects, relations, and quantified variables.
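The key operation on an RLP is grounding: the quantified template is expanded into one ordinary LP variable and constraint per object in the logical domain, and the result is handed to a standard solver. The toy example below illustrates this idea in plain Python with scipy; the domain, the demand relation, and the template are invented for illustration and do not use the paper's declarative syntax:

    from scipy.optimize import linprog

    objects = ["a", "b", "c"]                      # the logical domain
    demand = {"a": 1.0, "b": 2.0, "c": 0.5}        # a relation over the objects

    # Template: minimize sum_o x(o)  subject to  x(o) >= demand(o)  for all o.
    idx = {o: i for i, o in enumerate(objects)}    # ground each x(o) to an LP column
    c = [1.0] * len(objects)                       # objective coefficients
    bounds = [(demand[o], None) for o in objects]  # x(o) >= demand(o)

    res = linprog(c=c, bounds=bounds, method="highs")
    print({o: res.x[idx[o]] for o in objects})     # {'a': 1.0, 'b': 2.0, 'c': 0.5}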