no code implementations • 5 Dec 2024 • Ahmad Darkhalil, Rhodri Guerrier, Adam W. Harley, Dima Damen
When fine-tuning point tracking methods on these sequences and evaluating on our annotated EgoPoints sequences, we improve CoTracker across all metrics, including the tracking accuracy $\delta^\star_{\text{avg}}$ by 2.7 percentage points and accuracy on ReID sequences (ReID$\delta_{\text{avg}}$) by 2.4 points.
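For intuition on the metric: a $\delta_{\text{avg}}$-style tracking accuracy averages, over several pixel thresholds, the fraction of predicted points that land within that threshold of the ground truth. Below is a minimal sketch assuming the TAP-Vid convention of thresholds {1, 2, 4, 8, 16}; the function name and shapes are illustrative, not taken from any of these papers' code.

```python
import numpy as np

def delta_avg(pred, gt, thresholds=(1, 2, 4, 8, 16)):
    """Position accuracy: fraction of predicted points within each pixel
    threshold of ground truth, averaged over thresholds (TAP-Vid style).

    pred, gt: arrays of shape (N, 2) holding (x, y) point positions.
    """
    dists = np.linalg.norm(pred - gt, axis=-1)  # per-point Euclidean error
    return float(np.mean([(dists < t).mean() for t in thresholds]))
```

With this convention, a uniform 3-pixel error passes only the 4/8/16-pixel thresholds, giving a score of 0.6.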
no code implementations • 5 Dec 2024 • Yiqing Liang, Mikhail Okunev, Mikaela Angelina Uy, Runfeng Li, Leonidas Guibas, James Tompkin, Adam W. Harley
Gaussian splatting methods are emerging as a popular approach for converting multi-view image data into scene representations that allow view synthesis.
1 code implementation • 30 May 2024 • Haodi He, Colton Stearns, Adam W. Harley, Leonidas J. Guibas
In this work, we address the challenging task of lifting multi-granular and view-inconsistent image segmentations into a hierarchical and 3D-consistent representation.
1 code implementation • CVPR 2024 • Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios, Adam W. Harley, Gabriel Sarch, Kriti Aggarwal, Vishrav Chaudhary, Katerina Fragkiadaki
The gap in performance between methods that consume posed images versus post-processed 3D point clouds has fueled the belief that 2D and 3D perception require distinct model architectures.
Ranked #1 on 3D Instance Segmentation on ScanNet200
1 code implementation • 1 Jan 2024 • Xinglong Sun, Adam W. Harley, Leonidas J. Guibas
In the first stage, we use the pre-trained model to estimate motion in a video, and then select the subset of motion estimates which we can verify with cycle-consistency.
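The cycle-consistency check described here can be sketched as follows: follow each pixel's forward motion, sample the backward motion at the arrival point, and keep only estimates whose round trip returns near the start. This is a rough illustration with nearest-neighbour sampling and a hypothetical tolerance, not the paper's implementation.

```python
import numpy as np

def cycle_consistent_mask(fwd_flow, bwd_flow, tol=1.0):
    """Return a boolean (H, W) mask of motion estimates verified by
    forward-backward cycle-consistency.

    fwd_flow, bwd_flow: (H, W, 2) motion fields in (dx, dy) order.
    """
    H, W, _ = fwd_flow.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # positions after applying the forward motion
    x2 = np.clip(np.round(xs + fwd_flow[..., 0]).astype(int), 0, W - 1)
    y2 = np.clip(np.round(ys + fwd_flow[..., 1]).astype(int), 0, H - 1)
    # backward motion sampled at the warped positions (nearest neighbour)
    back = bwd_flow[y2, x2]
    # round-trip error: forward + backward should cancel for consistent pixels
    err = np.hypot(fwd_flow[..., 0] + back[..., 0],
                   fwd_flow[..., 1] + back[..., 1])
    return err < tol
```

Pixels failing the check (e.g. occlusions, where the backward motion does not undo the forward one) would be excluded from the verified subset.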
1 code implementation • 23 Dec 2023 • Yang You, Kai Xiong, Zhening Yang, Zhengxiang Huang, Junwei Zhou, Ruoxi Shi, Zhou Fang, Adam W. Harley, Leonidas Guibas, Cewu Lu
We introduce PACE (Pose Annotations in Cluttered Environments), a large-scale benchmark designed to advance the development and evaluation of pose estimation methods in cluttered scenarios.
no code implementations • 10 Oct 2023 • Wen-Hsuan Chu, Adam W. Harley, Pavel Tokmakov, Achal Dave, Leonidas Guibas, Katerina Fragkiadaki
This begs the question: can we re-purpose these large-scale pre-trained static image models for open-vocabulary video tracking?
1 code implementation • 7 Sep 2023 • Nikhil Raghuraman, Adam W. Harley, Leonidas Guibas
Current machine learning methods struggle to solve Bongard problems, which are a type of IQ test that requires deriving an abstract "concept" from a set of positive and negative "support" images, and then classifying whether or not a new query image depicts the key concept.
Ranked #2 on Few-Shot Image Classification on Bongard-HOI (using extra training data)
3 code implementations • ICCV 2023 • Yang Zheng, Adam W. Harley, Bokui Shen, Gordon Wetzstein, Leonidas J. Guibas
Our goal is to advance the state-of-the-art by placing emphasis on long videos with naturalistic motion.
Ranked #1 on Point Tracking on TAP-Vid
1 code implementation • 21 Jul 2022 • Gabriel Sarch, Zhaoyuan Fang, Adam W. Harley, Paul Schydlo, Michael J. Tarr, Saurabh Gupta, Katerina Fragkiadaki
We introduce TIDEE, an embodied agent that tidies up a disordered scene based on learned commonsense object placement and room arrangement priors.
1 code implementation • 16 Jun 2022 • Adam W. Harley, Zhaoyuan Fang, Jie Li, Rares Ambrus, Katerina Fragkiadaki
Building 3D perception systems for autonomous vehicles that do not rely on high-density LiDAR is a critical research problem because of the expense of LiDAR systems compared to cameras and other sensors.
Tasks: Autonomous Vehicles, Bird's-Eye View Semantic Segmentation, +1
1 code implementation • 8 Apr 2022 • Adam W. Harley, Zhaoyuan Fang, Katerina Fragkiadaki
In this paper, we revisit Sand and Teller's "particle video" approach, and study pixel tracking as a long-range motion estimation problem, where every pixel is described with a trajectory that locates it in multiple future frames.
1 code implementation • CVPR 2021 • Shamit Lal, Mihir Prabhudesai, Ishita Mediratta, Adam W. Harley, Katerina Fragkiadaki
This paper explores self-supervised learning of amodal 3D feature representations from RGB and RGB-D posed images and videos, agnostic to object and scene semantic content, and evaluates the resulting scene representations in the downstream tasks of visual correspondence, object tracking, and object detection.
no code implementations • CVPR 2021 • Adam W. Harley, Yiming Zuo, Jing Wen, Ayush Mangal, Shubhankar Potdar, Ritwick Chaudhry, Katerina Fragkiadaki
We propose an unsupervised method for detecting and tracking moving objects in 3D, in unlabelled RGB-D videos.
1 code implementation • 30 Nov 2020 • Zhaoyuan Fang, Ayush Jain, Gabriel Sarch, Adam W. Harley, Katerina Fragkiadaki
Experiments on both indoor and outdoor datasets show that (1) our method obtains high-quality 2D and 3D pseudo-labels from multi-view RGB-D data; (2) fine-tuning with these pseudo-labels improves the 2D detector significantly in the test environment; (3) training a 3D detector with our pseudo-labels outperforms a prior self-supervised method by a large margin; (4) given weak supervision, our method can generate better pseudo-labels for novel objects.
no code implementations • 30 Oct 2020 • Mihir Prabhudesai, Shamit Lal, Hsiao-Yu Fish Tung, Adam W. Harley, Shubhankar Potdar, Katerina Fragkiadaki
We can compare the 3D feature maps of two objects by searching alignment across scales and 3D rotations, and, as a result of the operation, we can estimate pose and scale changes without the need for 3D pose annotations.
no code implementations • ECCV 2020 • Adam W. Harley, Shrinidhi K. Lakshmikanth, Paul Schydlo, Katerina Fragkiadaki
We propose to leverage multiview data of static points in arbitrary scenes (static or dynamic), to learn a neural 3D mapping module which produces features that are correspondable across time.
1 code implementation • CVPR 2020 • Mihir Prabhudesai, Hsiao-Yu Fish Tung, Syed Ashar Javed, Maximilian Sieb, Adam W. Harley, Katerina Fragkiadaki
We propose associating language utterances to 3D visual abstractions of the scene they describe.
1 code implementation • ICLR 2020 • Adam W. Harley, Shrinidhi K. Lakshmikanth, Fangyu Li, Xian Zhou, Hsiao-Yu Fish Tung, Katerina Fragkiadaki
Predictive coding theories suggest that the brain learns by predicting observations at various levels of abstraction.
no code implementations • 11 Jan 2019 • Adam W. Harley, Shih-En Wei, Jason Saragih, Katerina Fragkiadaki
Cross-domain image-to-image translation should satisfy two requirements: (1) preserve the information that is common to both domains, and (2) generate convincing images covering variations that appear in the target domain.
no code implementations • CVPR 2018 • Hsiao-Yu Fish Tung, Adam W. Harley, Liang-Kang Huang, Katerina Fragkiadaki
Humans effortlessly "program" one another by communicating goals and desires in natural language.
1 code implementation • ICCV 2017 • Adam W. Harley, Konstantinos G. Derpanis, Iasonas Kokkinos
We introduce an approach to integrate segmentation information within a convolutional neural network (CNN).
no code implementations • ICCV 2017 • Hsiao-Yu Fish Tung, Adam W. Harley, William Seto, Katerina Fragkiadaki
Researchers have developed excellent feed-forward models that learn to map images to desired outputs, such as to the images' latent factors, or to other images, using supervised learning.
no code implementations • 20 Aug 2016 • Jason J. Yu, Adam W. Harley, Konstantinos G. Derpanis
Recently, convolutional networks (convnets) have proven useful for predicting optical flow.
no code implementations • 13 Nov 2015 • Adam W. Harley, Konstantinos G. Derpanis, Iasonas Kokkinos
That is, for any two pixels on the same object, the embeddings are trained to be similar; for any pair that straddles an object boundary, the embeddings are trained to be dissimilar.
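The training signal described above can be written as a contrastive-style pairwise objective: same-object pairs are pulled together, boundary-straddling pairs are pushed at least a margin apart. This is a minimal sketch in numpy; the function name, margin value, and the exact loss form are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def pairwise_embedding_loss(e1, e2, same_object, margin=2.0):
    """Contrastive objective on pixel-embedding pairs.

    e1, e2: (N, D) embeddings of paired pixels.
    same_object: (N,) boolean — True if the pair lies on the same object.
    """
    d = np.linalg.norm(e1 - e2, axis=-1)
    pos = d ** 2                             # same object: small distance
    neg = np.maximum(0.0, margin - d) ** 2   # boundary pair: >= margin apart
    return float(np.where(same_object, pos, neg).mean())
```

The loss is zero when same-object pairs coincide and cross-boundary pairs are already separated by at least the margin.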
no code implementations • 25 Feb 2015 • Adam W. Harley, Alex Ufkes, Konstantinos G. Derpanis
This paper presents a new state-of-the-art for document image classification and retrieval, using features learned by deep convolutional neural networks (CNNs).