Search Results for author: Derek Hoiem

Found 47 papers, 19 papers with code

MonoPatchNeRF: Improving Neural Radiance Fields with Patch-based Monocular Guidance

no code implementations 12 Apr 2024 Yuqun Wu, Jae Yong Lee, Chuhang Zou, Shenlong Wang, Derek Hoiem

Our experiments show 4x the performance of RegNeRF and 8x that of FreeNeRF on average F1@2cm for the ETH3D MVS benchmark, suggesting a fruitful research direction to improve the geometric accuracy of NeRF-based models, and shed light on a potential future approach to enable NeRF-based optimization to eventually outperform traditional MVS.

Novel View Synthesis SSIM

ViStruct: Visual Structural Knowledge Extraction via Curriculum Guided Code-Vision Representation

1 code implementation 22 Nov 2023 Yangyi Chen, Xingyao Wang, Manling Li, Derek Hoiem, Heng Ji

We adopt a weakly-supervised approach to directly generate visual event structures from captions for ViStruct training, capitalizing on abundant image-caption pairs from the web.

WebWISE: Web Interface Control and Sequential Exploration with Large Language Models

no code implementations 24 Oct 2023 Heyi Tao, Sethuraman T V, Michal Shlapentokh-Rothman, Derek Hoiem

The paper investigates using a Large Language Model (LLM) to automatically perform web software tasks using click, scroll, and text input operations.

Imitation Learning In-Context Learning +3

Consistent Multimodal Generation via A Unified GAN Framework

no code implementations 4 Jul 2023 Zhen Zhu, Yijun Li, Weijie Lyu, Krishna Kumar Singh, Zhixin Shu, Soeren Pirk, Derek Hoiem

We investigate how to generate multimodal image outputs, such as RGB, depth, and surface normals, with a single generative model.

multimodal generation

Continual Learning in Open-vocabulary Classification with Complementary Memory Systems

no code implementations 4 Jul 2023 Zhen Zhu, Weijie Lyu, Yao Xiao, Derek Hoiem

We introduce a method for flexible and efficient continual learning in open-vocabulary image classification, drawing inspiration from the complementary learning systems observed in human cognition.

Continual Learning Image Classification

Make It So: Steering StyleGAN for Any Image Inversion and Editing

no code implementations 27 Apr 2023 Anand Bhattad, Viraj Shah, Derek Hoiem, D. A. Forsyth

StyleGAN's disentangled style representation enables powerful image editing by manipulating the latent variables, but accurately mapping real-world images to their latent variables (GAN inversion) remains a challenge.

QFF: Quantized Fourier Features for Neural Field Representations

no code implementations 2 Dec 2022 Jae Yong Lee, Yuqun Wu, Chuhang Zou, Shenlong Wang, Derek Hoiem

Instead, we propose to encode features in bins of Fourier features that are commonly used for positional encoding.

Sparse SPN: Depth Completion from Sparse Keypoints

no code implementations 2 Dec 2022 Yuqun Wu, Jae Yong Lee, Derek Hoiem

Our long-term goal is to use image-based depth completion to quickly create 3D models from sparse point clouds, e.g., from SfM or SLAM.

Depth Completion

Deep PatchMatch MVS with Learned Patch Coplanarity, Geometric Consistency and Adaptive Pixel Sampling

no code implementations 14 Oct 2022 Jae Yong Lee, Chuhang Zou, Derek Hoiem

Recent work in multi-view stereo (MVS) combines learnable photometric scores and regularization with PatchMatch-based optimization to achieve robust pixelwise estimates of depth, normals, and visibility.

Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners

1 code implementation 22 May 2022 Zhenhailong Wang, Manling Li, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, ZiYi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji

The goal of this work is to build flexible video-language models that can generalize to various video-to-text tasks from few examples, such as domain-specific captioning, question answering, and future event prediction.

Attribute Automatic Speech Recognition +6

GRIT: General Robust Image Task Benchmark

1 code implementation 28 Apr 2022 Tanmay Gupta, Ryan Marten, Aniruddha Kembhavi, Derek Hoiem

Computer vision models excel at making predictions when the test distribution closely resembles the training distribution.

Instance Segmentation Keypoint Detection +7

Webly Supervised Concept Expansion for General Purpose Vision Models

no code implementations 4 Feb 2022 Amita Kamath, Christopher Clark, Tanmay Gupta, Eric Kolve, Derek Hoiem, Aniruddha Kembhavi

This work presents an effective and inexpensive alternative: learn skills from supervised datasets, learn concepts from web image search, and leverage a key characteristic of GPVs: the ability to transfer visual knowledge across skills.

Human-Object Interaction Detection Image Retrieval +4

Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture

no code implementations CVPR 2022 Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi, Derek Hoiem

To reduce the time and expertise required to develop new applications, we would like to create general purpose vision systems that can learn and perform a range of tasks without any modification to the architecture or learning process.

Question Answering Visual Question Answering

PatchMatch-RL: Deep MVS with Pixelwise Depth, Normal, and Visibility

1 code implementation ICCV 2021 Jae Yong Lee, Joseph DeGol, Chuhang Zou, Derek Hoiem

To overcome the challenge of the non-differentiable PatchMatch optimization that involves iterative sampling and hard decisions, we use reinforcement learning to minimize expected photometric cost and maximize likelihood of ground truth depth and normals.

Towards General Purpose Vision Systems

2 code implementations 1 Apr 2021 Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi, Derek Hoiem

To reduce the time and expertise required to develop new applications, we would like to create general purpose vision systems that can learn and perform a range of tasks without any modification to the architecture or learning process.

Question Answering Visual Question Answering

Learning Curves for Analysis of Deep Networks

1 code implementation 21 Oct 2020 Derek Hoiem, Tanmay Gupta, Zhizhong Li, Michal M. Shlapentokh-Rothman

Learning curves model a classifier's test error as a function of the number of training samples.

Data Augmentation Image Classification

Contrastive Learning for Weakly Supervised Phrase Grounding

1 code implementation ECCV 2020 Tanmay Gupta, Arash Vahdat, Gal Chechik, Xiaodong Yang, Jan Kautz, Derek Hoiem

Given pairs of images and captions, we maximize compatibility of the attention-weighted regions and the words in the corresponding caption, compared to non-corresponding pairs of images and captions.

Contrastive Learning Language Modelling +1

Boundary Cues for 3D Object Shape Recovery

no code implementations CVPR 2013 Kevin Karsch, Zicheng Liao, Jason Rock, Jonathan T. Barron, Derek Hoiem

Early work in computer vision considered a host of geometric cues for both shape reconstruction and recognition.


ViCo: Word Embeddings from Visual Co-occurrences

1 code implementation ICCV 2019 Tanmay Gupta, Alexander Schwing, Derek Hoiem

Through unsupervised clustering, supervised partitioning, and a zero-shot-like generalization analysis we show that our word embeddings complement text-only embeddings like GloVe by better representing similarities and differences between visual concepts that are difficult to obtain from text corpora alone.

Attribute Clustering +1

Task-Assisted Domain Adaptation with Anchor Tasks

no code implementations 16 Aug 2019 Zhizhong Li, Linjie Luo, Sergey Tulyakov, Qieyun Dai, Derek Hoiem

Our key idea to improve domain adaptation is to introduce a separate anchor task (such as facial landmarks) whose annotations can be obtained at no cost or are already available on both synthetic and real datasets.

Depth Estimation Domain Adaptation +2

Silhouette Guided Point Cloud Reconstruction beyond Occlusion

1 code implementation 29 Jul 2019 Chuhang Zou, Derek Hoiem

One major challenge in 3D reconstruction is to infer the complete shape geometry from partial foreground occlusions.

Point cloud reconstruction

Reducing Overconfident Errors outside the Known Distribution

no code implementations ICLR 2019 Zhizhong Li, Derek Hoiem

We compare a number of methods from related fields such as calibration and epistemic uncertainty modeling, as well as two proposed methods that reduce overconfident errors of samples from an unknown novel distribution without drastically increasing evaluation time: (1) G-distillation, training an ensemble of classifiers and then distilling it into a single model using both labeled and unlabeled examples, or (2) NCR, reducing prediction confidence based on its novelty detection score.

Domain Adaptation Novelty Detection

No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques

3 code implementations ICCV 2019 Tanmay Gupta, Alexander Schwing, Derek Hoiem

We show that for human-object interaction detection a relatively simple factorized model with appearance and layout encodings constructed from pre-trained object detectors outperforms more sophisticated approaches.

Human-Object Interaction Detection Object

Improved Structure from Motion Using Fiducial Marker Matching

no code implementations ECCV 2018 Joseph DeGol, Timothy Bretl, Derek Hoiem

In this paper, we present an incremental structure from motion (SfM) algorithm that significantly outperforms existing algorithms when fiducial markers are present in the scene, and that matches the performance of existing algorithms when no markers are present.

Pixels, voxels, and views: A study of shape representations for single view 3D object shape prediction

no code implementations CVPR 2018 Daeyun Shin, Charless C. Fowlkes, Derek Hoiem

The goal of this paper is to compare surface-based and volumetric 3D object shape representations, as well as viewer-centered and object-centered reference frames for single-view 3D shape prediction.


Imagine This! Scripts to Compositions to Videos

5 code implementations ECCV 2018 Tanmay Gupta, Dustin Schwenk, Ali Farhadi, Derek Hoiem, Aniruddha Kembhavi

Imagining a scene described in natural language with realistic layout and appearance of entities is the ultimate test of spatial, visual, and semantic world knowledge.

Retrieval World Knowledge

Improving Confidence Estimates for Unfamiliar Examples

1 code implementation CVPR 2020 Zhizhong Li, Derek Hoiem

In this paper, we compare and evaluate several methods to improve confidence estimates for unfamiliar and familiar samples.

Attribute Domain Adaptation

LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image

2 code implementations CVPR 2018 Chuhang Zou, Alex Colburn, Qi Shan, Derek Hoiem

We propose an algorithm to predict room layout from a single image that generalizes across panoramas and perspective images, cuboid layouts and more general layouts (e.g., L-shape room).

3D Room Layouts From A Single RGB Panorama Translation

3D-PRNN: Generating Shape Primitives with Recurrent Neural Networks

2 code implementations ICCV 2017 Chuhang Zou, Ersin Yumer, Jimei Yang, Duygu Ceylan, Derek Hoiem

The success of various applications including robotics, digital content creation, and visualization demands a structured and abstract representation of the 3D world from limited sensor data.


Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks

no code implementations ICCV 2017 Tanmay Gupta, Kevin Shih, Saurabh Singh, Derek Hoiem

In this paper, we investigate a vision-language embedding as a core representation and show that it leads to better cross-task transfer than standard multi-task learning.

Multi-Task Learning Question Answering +1

Learning without Forgetting

10 code implementations 29 Jun 2016 Zhizhong Li, Derek Hoiem

We propose our Learning without Forgetting method, which uses only new task data to train the network while preserving the original capabilities.

Class Incremental Learning Disjoint 10-1 +9

3DFS: Deformable Dense Depth Fusion and Segmentation for Object Reconstruction from a Handheld Camera

no code implementations 15 Jun 2016 Tanmay Gupta, Daeyun Shin, Naren Sivagnanadasan, Derek Hoiem

The resulting depth maps are then fused using a proposed implicit surface function that is robust to estimation error, producing a smooth surface reconstruction of the entire scene.

3D Reconstruction Depth Estimation +4

Learning to Localize Little Landmarks

no code implementations CVPR 2016 Saurabh Singh, Derek Hoiem, David Forsyth

We describe a method to find such landmarks by finding a sequence of latent landmarks, each with a prediction model.

Swapout: Learning an ensemble of deep architectures

no code implementations NeurIPS 2016 Saurabh Singh, Derek Hoiem, David Forsyth

When viewed as a regularization method, swapout inhibits co-adaptation of units not only within a layer, similar to dropout, but also across network layers.

Where To Look: Focus Regions for Visual Question Answering

no code implementations CVPR 2016 Kevin J. Shih, Saurabh Singh, Derek Hoiem

We present a method that learns to answer visual questions by selecting image regions relevant to the text-based query.

Question Answering Visual Question Answering

Part Localization using Multi-Proposal Consensus for Fine-Grained Categorization

no code implementations 22 Jul 2015 Kevin J. Shih, Arun Mallya, Saurabh Singh, Derek Hoiem

We present a simple deep learning framework to simultaneously predict keypoint locations and their respective visibilities and use those to achieve state-of-the-art performance for fine-grained classification.

General Classification

Learning a Sequential Search for Landmarks

no code implementations CVPR 2015 Saurabh Singh, Derek Hoiem, David Forsyth

We propose a general method to find landmarks in images of objects using both appearance and spatial context.

Predicting Complete 3D Models of Indoor Scenes

1 code implementation 9 Apr 2015 Ruiqi Guo, Chuhang Zou, Derek Hoiem

One major goal of vision is to infer physical models of objects, surfaces, and their layout from sensors.

Visual Reasoning

Learning Collections of Part Models for Object Recognition

no code implementations CVPR 2013 Ian Endres, Kevin J. Shih, Johnston Jiaa, Derek Hoiem

We propose a method to learn a diverse collection of discriminative parts from object bounding box annotations.

Object Object Recognition
