no code implementations • 25 Feb 2015 • Adam W. Harley, Alex Ufkes, Konstantinos G. Derpanis
This paper presents a new state-of-the-art for document image classification and retrieval, using features learned by deep convolutional neural networks (CNNs).
no code implementations • 13 Nov 2015 • Adam W. Harley, Konstantinos G. Derpanis, Iasonas Kokkinos
That is, for any two pixels on the same object, the embeddings are trained to be similar; for any pair that straddles an object boundary, the embeddings are trained to be dissimilar.
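The attract/repel objective described above can be sketched as a pairwise margin loss. This is a toy illustration under assumed names (`pairwise_embedding_loss` and its signature are illustrative, not the paper's code):

```python
import numpy as np

def pairwise_embedding_loss(emb_a, emb_b, same_object, margin=1.0):
    """Toy pairwise objective: pull embeddings of same-object pixel
    pairs together; push cross-boundary pairs at least `margin` apart."""
    d = np.linalg.norm(emb_a - emb_b)
    if same_object:
        return d ** 2                     # attract similar pairs
    return max(0.0, margin - d) ** 2      # repel dissimilar pairs

# Two pixels on the same object: low loss when their embeddings are close.
close = pairwise_embedding_loss(np.array([0.1, 0.2]), np.array([0.1, 0.25]), True)
# A cross-boundary pair already past the margin incurs zero loss.
far = pairwise_embedding_loss(np.array([0.0, 0.0]), np.array([2.0, 0.0]), False)
```

In practice such a loss is applied over sampled pixel pairs during CNN training; the sketch only shows the per-pair term.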
no code implementations • 20 Aug 2016 • Jason J. Yu, Adam W. Harley, Konstantinos G. Derpanis
Recently, convolutional networks (convnets) have proven useful for predicting optical flow.
no code implementations • ICCV 2017 • Hsiao-Yu Fish Tung, Adam W. Harley, William Seto, Katerina Fragkiadaki
Researchers have developed excellent feed-forward models that learn to map images to desired outputs, such as the images' latent factors or other images, using supervised learning.

1 code implementation • ICCV 2017 • Adam W. Harley, Konstantinos G. Derpanis, Iasonas Kokkinos
We introduce an approach to integrate segmentation information within a convolutional neural network (CNN).
no code implementations • CVPR 2018 • Hsiao-Yu Fish Tung, Adam W. Harley, Liang-Kang Huang, Katerina Fragkiadaki
Humans effortlessly "program" one another by communicating goals and desires in natural language.
no code implementations • 11 Jan 2019 • Adam W. Harley, Shih-En Wei, Jason Saragih, Katerina Fragkiadaki
Cross-domain image-to-image translation should satisfy two requirements: (1) preserve the information that is common to both domains, and (2) generate convincing images covering variations that appear in the target domain.
1 code implementation • ICLR 2020 • Adam W. Harley, Shrinidhi K. Lakshmikanth, Fangyu Li, Xian Zhou, Hsiao-Yu Fish Tung, Katerina Fragkiadaki
Predictive coding theories suggest that the brain learns by predicting observations at various levels of abstraction.
1 code implementation • CVPR 2020 • Mihir Prabhudesai, Hsiao-Yu Fish Tung, Syed Ashar Javed, Maximilian Sieb, Adam W. Harley, Katerina Fragkiadaki
We propose associating language utterances to 3D visual abstractions of the scene they describe.
no code implementations • ECCV 2020 • Adam W. Harley, Shrinidhi K. Lakshmikanth, Paul Schydlo, Katerina Fragkiadaki
We propose to leverage multiview data of static points in arbitrary scenes (static or dynamic), to learn a neural 3D mapping module which produces features that are correspondable across time.
no code implementations • 30 Oct 2020 • Mihir Prabhudesai, Shamit Lal, Hsiao-Yu Fish Tung, Adam W. Harley, Shubhankar Potdar, Katerina Fragkiadaki
We can compare the 3D feature maps of two objects by searching alignment across scales and 3D rotations, and, as a result of the operation, we can estimate pose and scale changes without the need for 3D pose annotations.
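The alignment search described above can be illustrated with a brute-force toy version: exhaustively rotate one 3D feature volume through axis-aligned 90-degree rotations and score each candidate by correlation. This is a minimal sketch of the search idea only (the paper searches continuous rotations and scales; `best_alignment` is an assumed name, and equal cube dimensions are assumed):

```python
import numpy as np

def best_alignment(feat_a, feat_b):
    """Search axis-aligned 90-degree rotations of a cubic 3D feature
    volume `feat_b`, returning the rotation that maximizes correlation
    with `feat_a` -- a coarse stand-in for a rotation/scale search."""
    best, best_rot = -np.inf, None
    for axes in [(0, 1), (0, 2), (1, 2)]:      # the three rotation planes
        for k in range(4):                      # 0, 90, 180, 270 degrees
            rotated = np.rot90(feat_b, k=k, axes=axes)
            score = float((feat_a * rotated).sum())
            if score > best:
                best, best_rot = score, (axes, k)
    return best_rot, best

vol = np.random.default_rng(0).normal(size=(4, 4, 4))
rot_vol = np.rot90(vol, k=1, axes=(0, 1))      # apply a known rotation
(axes, k), _ = best_alignment(vol, rot_vol)    # search should recover its inverse
```

Correlation with the correctly un-rotated copy equals the volume's self-correlation, which dominates the other candidates, so the search recovers the applied rotation.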
1 code implementation • 30 Nov 2020 • Zhaoyuan Fang, Ayush Jain, Gabriel Sarch, Adam W. Harley, Katerina Fragkiadaki
Experiments on both indoor and outdoor datasets show that (1) our method obtains high-quality 2D and 3D pseudo-labels from multi-view RGB-D data; (2) fine-tuning with these pseudo-labels improves the 2D detector significantly in the test environment; (3) training a 3D detector with our pseudo-labels outperforms a prior self-supervised method by a large margin; (4) given weak supervision, our method can generate better pseudo-labels for novel objects.
no code implementations • CVPR 2021 • Adam W. Harley, Yiming Zuo, Jing Wen, Ayush Mangal, Shubhankar Potdar, Ritwick Chaudhry, Katerina Fragkiadaki
We propose an unsupervised method for detecting and tracking moving objects in 3D from unlabelled RGB-D videos.
1 code implementation • CVPR 2021 • Shamit Lal, Mihir Prabhudesai, Ishita Mediratta, Adam W. Harley, Katerina Fragkiadaki
This paper explores self-supervised learning of amodal 3D feature representations from RGB and RGB-D posed images and videos, agnostic to object and scene semantic content, and evaluates the resulting scene representations in the downstream tasks of visual correspondence, object tracking, and object detection.
1 code implementation • 8 Apr 2022 • Adam W. Harley, Zhaoyuan Fang, Katerina Fragkiadaki
In this paper, we revisit Sand and Teller's "particle video" approach, and study pixel tracking as a long-range motion estimation problem, where every pixel is described with a trajectory that locates it in multiple future frames.
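The core data structure here, a per-pixel trajectory spanning multiple future frames, can be sketched by naively chaining per-frame displacements. This is only an illustration of the trajectory representation, built on the chaining baseline that such long-range trackers improve upon (`track_pixel` is an assumed name, not the paper's code):

```python
import numpy as np

def track_pixel(start_xy, flows):
    """Toy long-range tracker: chain per-frame displacements into a
    trajectory locating one pixel in every future frame -- the
    'particle video' idea of a per-pixel trajectory, rather than
    independent frame-pair optical flow."""
    traj = [np.asarray(start_xy, dtype=float)]
    for flow in flows:              # flow: (dx, dy) for this time step
        traj.append(traj[-1] + flow)
    return np.stack(traj)           # shape (T+1, 2): one (x, y) per frame

# A pixel starting at (5, 5), moving one pixel right per frame for 3 frames.
traj = track_pixel((5, 5), [np.array([1.0, 0.0])] * 3)
```

Chaining accumulates drift and fails under occlusion, which is precisely why estimating the whole trajectory jointly is attractive.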
1 code implementation • 16 Jun 2022 • Adam W. Harley, Zhaoyuan Fang, Jie Li, Rares Ambrus, Katerina Fragkiadaki
Building 3D perception systems for autonomous vehicles that do not rely on high-density LiDAR is a critical research problem because of the expense of LiDAR systems compared to cameras and other sensors.
1 code implementation • 21 Jul 2022 • Gabriel Sarch, Zhaoyuan Fang, Adam W. Harley, Paul Schydlo, Michael J. Tarr, Saurabh Gupta, Katerina Fragkiadaki
We introduce TIDEE, an embodied agent that tidies up a disordered scene based on learned commonsense object placement and room arrangement priors.
2 code implementations • ICCV 2023 • Yang Zheng, Adam W. Harley, Bokui Shen, Gordon Wetzstein, Leonidas J. Guibas
Our goal is to advance the state-of-the-art by placing emphasis on long videos with naturalistic motion.
1 code implementation • 7 Sep 2023 • Nikhil Raghuraman, Adam W. Harley, Leonidas Guibas
Current machine learning methods struggle to solve Bongard problems, which are a type of IQ test that requires deriving an abstract "concept" from a set of positive and negative "support" images, and then classifying whether or not a new query image depicts the key concept.
no code implementations • 10 Oct 2023 • Wen-Hsuan Chu, Adam W. Harley, Pavel Tokmakov, Achal Dave, Leonidas Guibas, Katerina Fragkiadaki
This raises the question: can we re-purpose these large-scale pre-trained static image models for open-vocabulary video tracking?
1 code implementation • 23 Dec 2023 • Yang You, Kai Xiong, Zhening Yang, Zhengxiang Huang, Junwei Zhou, Ruoxi Shi, Zhou Fang, Adam W. Harley, Cewu Lu
To address this, we introduce PACE (Pose Annotations in Cluttered Environments), a large-scale benchmark designed to advance the development and evaluation of pose estimation methods in cluttered scenarios.
1 code implementation • 1 Jan 2024 • Xinglong Sun, Adam W. Harley, Leonidas J. Guibas
In the first stage, we use the pre-trained model to estimate motion in a video, and then select the subset of motion estimates which we can verify with cycle-consistency.
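The cycle-consistency selection step can be illustrated with a standard forward-backward flow check: a motion estimate is kept only if warping a pixel forward and then backward returns it (nearly) to its start. A toy per-pixel sketch, with assumed names and a nearest-neighbor lookup in place of interpolation:

```python
import numpy as np

def cycle_consistent(fwd_flow, bwd_flow, thresh=1.0):
    """Forward-backward check: for each pixel, follow the forward flow,
    look up the backward flow at the landing point, and keep the
    estimate if the round trip returns near the start."""
    # fwd_flow, bwd_flow: (H, W, 2) displacement fields, channels (dx, dy)
    h, w = fwd_flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    x2 = np.clip(np.round(xs + fwd_flow[..., 0]).astype(int), 0, w - 1)
    y2 = np.clip(np.round(ys + fwd_flow[..., 1]).astype(int), 0, h - 1)
    round_trip = fwd_flow + bwd_flow[y2, x2]      # should be near zero
    err = np.linalg.norm(round_trip, axis=-1)
    return err < thresh                           # boolean keep-mask

# Flows where backward exactly undoes forward pass the check everywhere.
keep = cycle_consistent(np.ones((4, 4, 2)), -np.ones((4, 4, 2)))
```

Estimates that fail the check (e.g. around occlusions) are simply dropped rather than used as training targets.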
no code implementations • 4 Jan 2024 • Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios, Adam W. Harley, Gabriel Sarch, Kriti Aggarwal, Vishrav Chaudhary, Katerina Fragkiadaki
The gap in performance between methods that consume posed images versus post-processed 3D point clouds has fueled the belief that 2D and 3D perception require distinct model architectures.