no code implementations • 30 Oct 2024 • Anurag Bagchi, Zhipeng Bao, Yu-Xiong Wang, Pavel Tokmakov, Martial Hebert
We present REM, a framework for segmenting a wide range of concepts in video that can be described through natural language.
no code implementations • 15 Sep 2024 • Vitor Guizilini, Pavel Tokmakov, Achal Dave, Rares Ambrus
3D reconstruction from a single image is a long-standing problem in computer vision.
Ranked #5 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)
no code implementations • 24 Jun 2024 • Junbang Liang, Ruoshi Liu, Ege Ozguroglu, Sruthi Sudhakar, Achal Dave, Pavel Tokmakov, Shuran Song, Carl Vondrick
A key challenge in manipulation is learning a policy that can robustly generalize to diverse visual environments.
no code implementations • 23 May 2024 • Basile Van Hoorick, Rundi Wu, Ege Ozguroglu, Kyle Sargent, Ruoshi Liu, Pavel Tokmakov, Achal Dave, Changxi Zheng, Carl Vondrick
Accurate reconstruction of complex dynamic scenes from just a single viewpoint continues to be a challenging task in computer vision.
1 code implementation • CVPR 2024 • Ege Ozguroglu, Ruoshi Liu, Dídac Surís, Dian Chen, Achal Dave, Pavel Tokmakov, Carl Vondrick
We introduce pix2gestalt, a framework for zero-shot amodal segmentation, which learns to estimate the shape and appearance of whole objects that are only partially visible behind occlusions.
no code implementations • CVPR 2024 • Matthew Kowal, Achal Dave, Rares Ambrus, Adrien Gaidon, Konstantinos G. Derpanis, Pavel Tokmakov
Concretely, we seek to explain the decision-making process of video transformers based on high-level, spatiotemporal concepts that are automatically discovered.
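A hypothetical sketch of one way such spatiotemporal concepts could be discovered and scored, assuming clustering of a transformer's token features and an ablation-style importance probe; `model_fn`, the tensor shapes, and all names here are illustrative assumptions, not the paper's method:

```python
import numpy as np
from sklearn.cluster import KMeans

def discover_concepts(tokens, n_concepts=10):
    # tokens: (T * H * W, D) features from some intermediate layer (assumed).
    # Each cluster of tokens is treated as a candidate spatiotemporal concept.
    km = KMeans(n_clusters=n_concepts, n_init=10).fit(tokens)
    return km.labels_, km.cluster_centers_

def concept_importance(model_fn, tokens, labels, n_concepts):
    # Score each concept by how much ablating its tokens changes the output.
    base = model_fn(tokens)
    scores = []
    for c in range(n_concepts):
        masked = tokens.copy()
        masked[labels == c] = 0.0  # zero out the concept's tokens
        scores.append(np.abs(model_fn(masked) - base).mean())
    return np.array(scores)
```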
no code implementations • 10 Oct 2023 • Wen-Hsuan Chu, Adam W. Harley, Pavel Tokmakov, Achal Dave, Leonidas Guibas, Katerina Fragkiadaki
This raises the question: can we repurpose these large-scale pre-trained static image models for open-vocabulary video tracking?
1 code implementation • CVPR 2023 • Basile Van Hoorick, Pavel Tokmakov, Simon Stent, Jie Li, Carl Vondrick
Tracking objects with persistence in cluttered and dynamic environments remains a difficult challenge for computer vision systems.
2 code implementations • CVPR 2023 • Zhipeng Bao, Pavel Tokmakov, Yu-Xiong Wang, Adrien Gaidon, Martial Hebert
Object discovery -- separating objects from the background without manual labels -- is a fundamental open challenge in computer vision.
1 code implementation • ICCV 2023 • Ruoshi Liu, Rundi Wu, Basile Van Hoorick, Pavel Tokmakov, Sergey Zakharov, Carl Vondrick
We introduce Zero-1-to-3, a framework for changing the camera viewpoint of an object given just a single RGB image.
1 code implementation • CVPR 2023 • Ziqi Pang, Jie Li, Pavel Tokmakov, Dian Chen, Sergey Zagoruyko, Yu-Xiong Wang
The proposed tracking framework emphasizes spatio-temporal continuity and integrates both past and future reasoning for tracked objects.
no code implementations • CVPR 2023 • Pavel Tokmakov, Jie Li, Adrien Gaidon
Yet, this important phenomenon is largely absent from existing video object segmentation (VOS) benchmarks.
1 code implementation • 4 Apr 2022 • Pavel Tokmakov, Allan Jabri, Jie Li, Adrien Gaidon
This paper proposes a self-supervised objective for learning representations that localize objects under occlusion -- a property known as object permanence.
1 code implementation • CVPR 2022 • Zhipeng Bao, Pavel Tokmakov, Allan Jabri, Yu-Xiong Wang, Adrien Gaidon, Martial Hebert
Our experiments demonstrate that, despite only capturing a small subset of the objects that move, this signal is enough to generalize to segment both moving and static instances of dynamic objects.
1 code implementation • 26 Apr 2021 • Boris Ivanovic, Kuan-Hui Lee, Pavel Tokmakov, Blake Wulfe, Rowan McAllister, Adrien Gaidon, Marco Pavone
Reasoning about the future behavior of other agents is critical to safe robot navigation.
1 code implementation • ICCV 2021 • Pavel Tokmakov, Jie Li, Wolfram Burgard, Adrien Gaidon
In this work, we introduce an end-to-end trainable approach for joint object detection and tracking that is capable of such reasoning.
1 code implementation • 28 Jun 2020 • Pavel Tokmakov, Martial Hebert, Cordelia Schmid
This paper addresses the task of unsupervised learning of representations for action recognition in videos.
no code implementations • ECCV 2020 • Achal Dave, Tarasha Khurana, Pavel Tokmakov, Cordelia Schmid, Deva Ramanan
To this end, we ask annotators to label objects that move at any point in the video, and name them post factum.
1 code implementation • 29 Nov 2019 • Ziqi Pang, Zhiyuan Hu, Pavel Tokmakov, Yu-Xiong Wang, Martial Hebert
Indeed, even the majority of few-shot learning methods rely on a large set of "base classes" for pretraining.
no code implementations • 25 Oct 2019 • Achal Dave, Pavel Tokmakov, Cordelia Schmid, Deva Ramanan
Moreover, at test time the same network can be applied to detection and tracking, resulting in a unified approach for the two tasks.
no code implementations • 29 Apr 2019 • Yubo Zhang, Pavel Tokmakov, Martial Hebert, Cordelia Schmid
In this work we study the problem of action detection in a highly imbalanced dataset.
1 code implementation • 11 Feb 2019 • Achal Dave, Pavel Tokmakov, Deva Ramanan
To address this concern, we propose two new benchmarks for generic, moving object detection, and show that our model matches top-down methods on common categories, while significantly outperforming both top-down and bottom-up methods on never-before-seen categories.
no code implementations • ICCV 2019 • Pavel Tokmakov, Yu-Xiong Wang, Martial Hebert
One of the key limitations of modern deep learning approaches lies in the amount of data required to train them.
no code implementations • CVPR 2019 • Yubo Zhang, Pavel Tokmakov, Martial Hebert, Cordelia Schmid
A dominant paradigm for learning-based approaches in computer vision is training generic models, such as ResNet for image recognition, or I3D for video understanding, on large datasets and allowing them to discover the optimal representation for the problem at hand.
no code implementations • 1 Dec 2017 • Pavel Tokmakov, Cordelia Schmid, Karteek Alahari
We formulate this as a learning problem and design our framework with three cues: (i) independent object motion between a pair of frames, which complements object recognition, (ii) object appearance, which helps to correct errors in motion estimation, and (iii) temporal consistency, which imposes additional constraints on the segmentation.
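A minimal, hypothetical sketch of how these three cues might be fused frame by frame; `motion_cue` and `appearance_cue` stand in for learned networks, and the fusion weights are invented for illustration:

```python
def segment_video(frames, motion_cue, appearance_cue, alpha=0.5, beta=0.3):
    """Fuse the three cues described above into per-frame masks.

    motion_cue(f_t, f_t1) -> per-pixel motion saliency in [0, 1]
    appearance_cue(f_t)   -> per-pixel objectness in [0, 1]
    Both are placeholders for learned models; names are illustrative.
    """
    masks, prev = [], None
    for t in range(len(frames) - 1):
        m = motion_cue(frames[t], frames[t + 1])   # cue (i): independent motion
        a = appearance_cue(frames[t])              # cue (ii): object appearance
        score = alpha * m + (1 - alpha) * a
        if prev is not None:                       # cue (iii): temporal consistency
            score = (1 - beta) * score + beta * prev
        prev = score
        masks.append(score > 0.5)
    return masks
```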
no code implementations • ICCV 2017 • Pavel Tokmakov, Karteek Alahari, Cordelia Schmid
The module to build a "visual memory" in video, i.e., a joint representation of all the video frames, is realized with a convolutional recurrent unit learned from a small number of training video sequences.
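For concreteness, here is a minimal convolutional GRU cell of the kind that could realize such a spatial memory, written in PyTorch; the layer sizes and the per-frame encoder are assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Minimal convolutional GRU: a recurrent unit whose hidden state is a
    spatial feature map, updating a per-pixel memory as frames arrive."""

    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        p = k // 2
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=p)  # update, reset
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=p)       # candidate state

    def forward(self, x, h):
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde

# Usage sketch: unroll over per-frame features from some encoder (assumed),
# starting from a zero-initialized hidden map of shape (B, hid_ch, H, W).
```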
no code implementations • CVPR 2017 • Pavel Tokmakov, Karteek Alahari, Cordelia Schmid
The problem of determining whether an object is in motion, irrespective of camera motion, is far from being solved.
no code implementations • 23 Mar 2016 • Pavel Tokmakov, Karteek Alahari, Cordelia Schmid
We also demonstrate that the performance of M-CNN learned with 150 weak video annotations is on par with state-of-the-art weakly-supervised methods trained with thousands of images.
no code implementations • 12 Oct 2014 • Kristian Kersting, Martin Mladenov, Pavel Tokmakov
A relational linear program (RLP) is a declarative LP template defining the objective and the constraints through the logical concepts of objects, relations, and quantified variables.
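To illustrate the idea, here is a toy grounding of a relational LP template into a concrete LP with SciPy; the objects (plants, markets), the route relation, and all costs are invented for this example:

```python
from itertools import product
from scipy.optimize import linprog

# The template reads, roughly:
#   minimize  sum_{(p, m) in route} cost(p, m) * x[p, m]
#   s.t.      forall p: sum_m x[p, m] <= supply(p)
#             forall m: sum_p x[p, m] >= demand(m)
plants, markets = ["a", "b"], ["x", "y", "z"]
cost = {("a","x"): 4, ("a","y"): 6, ("a","z"): 9,
        ("b","x"): 5, ("b","y"): 4, ("b","z"): 7}
supply = {"a": 50, "b": 60}
demand = {"x": 30, "y": 40, "z": 30}

routes = list(product(plants, markets))  # grounding of the route relation
c = [cost[r] for r in routes]

A_ub, b_ub = [], []
for p in plants:   # one constraint per grounded "forall p"
    A_ub.append([1.0 if r[0] == p else 0.0 for r in routes])
    b_ub.append(supply[p])
for m in markets:  # "forall m", flipped into <= form for linprog
    A_ub.append([-1.0 if r[1] == m else 0.0 for r in routes])
    b_ub.append(-demand[m])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
print(res.fun, dict(zip(routes, res.x.round(1))))
```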