Search Results for author: Adam W. Harley

Found 23 papers, 12 papers with code

Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval

no code implementations 25 Feb 2015 Adam W. Harley, Alex Ufkes, Konstantinos G. Derpanis

This paper presents a new state-of-the-art for document image classification and retrieval, using features learned by deep convolutional neural networks (CNNs).

Descriptive, Document Image Classification, +2

Learning Dense Convolutional Embeddings for Semantic Segmentation

no code implementations 13 Nov 2015 Adam W. Harley, Konstantinos G. Derpanis, Iasonas Kokkinos

That is, for any two pixels on the same object, the embeddings are trained to be similar; for any pair that straddles an object boundary, the embeddings are trained to be dissimilar.

General Classification, Object, +1

Adversarial Inverse Graphics Networks: Learning 2D-to-3D Lifting and Image-to-Image Translation from Unpaired Supervision

no code implementations ICCV 2017 Hsiao-Yu Fish Tung, Adam W. Harley, William Seto, Katerina Fragkiadaki

Researchers have developed excellent feed-forward models that learn to map images to desired outputs, such as to the images' latent factors, or to other images, using supervised learning.

3D Human Pose Estimation, Image-to-Image Translation, +2

Reward Learning from Narrated Demonstrations

no code implementations CVPR 2018 Hsiao-Yu Fish Tung, Adam W. Harley, Liang-Kang Huang, Katerina Fragkiadaki

Humans effortlessly "program" one another by communicating goals and desires in natural language.

Image Disentanglement and Uncooperative Re-Entanglement for High-Fidelity Image-to-Image Translation

no code implementations 11 Jan 2019 Adam W. Harley, Shih-En Wei, Jason Saragih, Katerina Fragkiadaki

Cross-domain image-to-image translation should satisfy two requirements: (1) preserve the information that is common to both domains, and (2) generate convincing images covering variations that appear in the target domain.

Disentanglement, Image-to-Image Translation, +1

Tracking Emerges by Looking Around Static Scenes, with Neural 3D Mapping

no code implementations ECCV 2020 Adam W. Harley, Shrinidhi K. Lakshmikanth, Paul Schydlo, Katerina Fragkiadaki

We propose to leverage multiview data of static points in arbitrary scenes (static or dynamic), to learn a neural 3D mapping module which produces features that are correspondable across time.

3D Object Tracking, Object, +1

3D Object Recognition By Corresponding and Quantizing Neural 3D Scene Representations

no code implementations 30 Oct 2020 Mihir Prabhudesai, Shamit Lal, Hsiao-Yu Fish Tung, Adam W. Harley, Shubhankar Potdar, Katerina Fragkiadaki

We can compare the 3D feature maps of two objects by searching alignment across scales and 3D rotations, and, as a result of the operation, we can estimate pose and scale changes without the need for 3D pose annotations.

3D Object Recognition, Object, +2

Move to See Better: Self-Improving Embodied Object Detection

1 code implementation 30 Nov 2020 Zhaoyuan Fang, Ayush Jain, Gabriel Sarch, Adam W. Harley, Katerina Fragkiadaki

Experiments on both indoor and outdoor datasets show that (1) our method obtains high-quality 2D and 3D pseudo-labels from multi-view RGB-D data; (2) fine-tuning with these pseudo-labels improves the 2D detector significantly in the test environment; (3) training a 3D detector with our pseudo-labels outperforms a prior self-supervised method by a large margin; (4) given weak supervision, our method can generate better pseudo-labels for novel objects.

Object, object-detection, +1

CoCoNets: Continuous Contrastive 3D Scene Representations

1 code implementation CVPR 2021 Shamit Lal, Mihir Prabhudesai, Ishita Mediratta, Adam W. Harley, Katerina Fragkiadaki

This paper explores self-supervised learning of amodal 3D feature representations from RGB and RGB-D posed images and videos, agnostic to object and scene semantic content, and evaluates the resulting scene representations in the downstream tasks of visual correspondence, object tracking, and object detection.

3D Object Detection, Contrastive Learning, +4

Particle Video Revisited: Tracking Through Occlusions Using Point Trajectories

1 code implementation 8 Apr 2022 Adam W. Harley, Zhaoyuan Fang, Katerina Fragkiadaki

In this paper, we revisit Sand and Teller's "particle video" approach, and study pixel tracking as a long-range motion estimation problem, where every pixel is described with a trajectory that locates it in multiple future frames.

Motion Estimation, Object Tracking, +1

Simple-BEV: What Really Matters for Multi-Sensor BEV Perception?

1 code implementation 16 Jun 2022 Adam W. Harley, Zhaoyuan Fang, Jie Li, Rares Ambrus, Katerina Fragkiadaki

Building 3D perception systems for autonomous vehicles that do not rely on high-density LiDAR is a critical research problem because of the expense of LiDAR systems compared to cameras and other sensors.

Autonomous Vehicles, Bird's-Eye View Semantic Segmentation, +1

TIDEE: Tidying Up Novel Rooms using Visuo-Semantic Commonsense Priors

1 code implementation 21 Jul 2022 Gabriel Sarch, Zhaoyuan Fang, Adam W. Harley, Paul Schydlo, Michael J. Tarr, Saurabh Gupta, Katerina Fragkiadaki

We introduce TIDEE, an embodied agent that tidies up a disordered scene based on learned commonsense object placement and room arrangement priors.

Object

Cross-Image Context Matters for Bongard Problems

1 code implementation 7 Sep 2023 Nikhil Raghuraman, Adam W. Harley, Leonidas Guibas

Current machine learning methods struggle to solve Bongard problems, which are a type of IQ test that requires deriving an abstract "concept" from a set of positive and negative "support" images, and then classifying whether or not a new query image depicts the key concept.

Ranked #2 on Few-Shot Image Classification on Bongard-HOI (using extra training data)

Few-Shot Image Classification, Few-Shot Learning

Zero-Shot Open-Vocabulary Tracking with Large Pre-Trained Models

no code implementations 10 Oct 2023 Wen-Hsuan Chu, Adam W. Harley, Pavel Tokmakov, Achal Dave, Leonidas Guibas, Katerina Fragkiadaki

This begs the question: can we re-purpose these large-scale pre-trained static image models for open-vocabulary video tracking?

Object, Object Tracking, +5

PACE: Pose Annotations in Cluttered Environments

1 code implementation 23 Dec 2023 Yang You, Kai Xiong, Zhening Yang, Zhengxiang Huang, Junwei Zhou, Ruoxi Shi, Zhou Fang, Adam W. Harley, Cewu Lu

Addressing this, we introduce PACE (Pose Annotations in Cluttered Environments), a large-scale benchmark designed to advance the development and evaluation of pose estimation methods in cluttered scenarios.

Pose Estimation, Pose Tracking

Refining Pre-Trained Motion Models

1 code implementation 1 Jan 2024 Xinglong Sun, Adam W. Harley, Leonidas J. Guibas

In the first stage, we use the pre-trained model to estimate motion in a video, and then select the subset of motion estimates which we can verify with cycle-consistency.

Motion Estimation

ODIN: A Single Model for 2D and 3D Perception

no code implementations 4 Jan 2024 Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios, Adam W. Harley, Gabriel Sarch, Kriti Aggarwal, Vishrav Chaudhary, Katerina Fragkiadaki

The gap in performance between methods that consume posed images versus post-processed 3D point clouds has fueled the belief that 2D and 3D perception require distinct model architectures.

3D Instance Segmentation, Semantic Segmentation
