no code implementations • 11 Dec 2024 • Michal Shlapentokh-Rothman, Yu-Xiong Wang, Derek Hoiem
Visual programming prompts LLMs (large language models) to generate executable code for visual tasks like visual question answering (VQA).
no code implementations • 2 Dec 2024 • Savya Khosla, Sethuraman T V, Alexander Schwing, Derek Hoiem
We present RELOCATE, a simple training-free baseline designed to perform the challenging task of visual query localization in long videos.
no code implementations • 24 Sep 2024 • Jae Yong Lee, Yuqun Wu, Chuhang Zou, Derek Hoiem, Shenlong Wang
The goal of this paper is to encode a 3D scene into an extremely compact representation from 2D images and to enable its transmittance, decoding and rendering in real-time across various platforms.
1 code implementation • 13 Sep 2024 • Zhen Zhu, Yiming Gong, Derek Hoiem
We propose a dynamic weighting between predictions of a partially fine-tuned model and a fixed open vocabulary model that enables continual improvement when training samples are available for a subset of a task's labels.
no code implementations • 12 Apr 2024 • Yuqun Wu, Jae Yong Lee, Chuhang Zou, Shenlong Wang, Derek Hoiem
The latest regularized Neural Radiance Field (NeRF) approaches produce poor geometry and view extrapolation for large-scale sparse-view scenes, such as ETH3D.
1 code implementation • CVPR 2024 • Michal Shlapentokh-Rothman, Ansel Blume, Yao Xiao, Yuqun Wu, Sethuraman T V, Heyi Tao, Jae Yong Lee, Wilfredo Torres, Yu-Xiong Wang, Derek Hoiem
We investigate whether region-based representations are effective for recognition.
no code implementations • CVPR 2024 • Jiasen Lu, Christopher Clark, Sangho Lee, Zichen Zhang, Savya Khosla, Ryan Marten, Derek Hoiem, Aniruddha Kembhavi
We present Unified-IO 2, a multimodal and multi-skill unified model capable of following novel instructions.
1 code implementation • 28 Dec 2023 • Jiasen Lu, Christopher Clark, Sangho Lee, Zichen Zhang, Savya Khosla, Ryan Marten, Derek Hoiem, Aniruddha Kembhavi
We present Unified-IO 2, the first autoregressive multimodal model that is capable of understanding and generating image, text, audio, and action.
1 code implementation • 22 Nov 2023 • Yangyi Chen, Xingyao Wang, Manling Li, Derek Hoiem, Heng Ji
We adopt a weakly-supervised approach to directly generate visual event structures from captions for ViStruct training, capitalizing on abundant image-caption pairs from the web.
no code implementations • 24 Oct 2023 • Heyi Tao, Sethuraman T V, Michal Shlapentokh-Rothman, Derek Hoiem
The paper investigates using a Large Language Model (LLM) to automatically perform web software tasks using click, scroll, and text input operations.
1 code implementation • 4 Jul 2023 • Zhen Zhu, Weijie Lyu, Yao Xiao, Derek Hoiem
We introduce a method for flexible and efficient continual learning in open-vocabulary image classification, drawing inspiration from the complementary learning systems observed in human cognition.
no code implementations • 4 Jul 2023 • Zhen Zhu, Yijun Li, Weijie Lyu, Krishna Kumar Singh, Zhixin Shu, Soeren Pirk, Derek Hoiem
We investigate how to generate multimodal image outputs, such as RGB, depth, and surface normals, with a single generative model.
no code implementations • 27 Apr 2023 • Anand Bhattad, Viraj Shah, Derek Hoiem, D. A. Forsyth
StyleGAN's disentangled style representation enables powerful image editing by manipulating the latent variables, but accurately mapping real-world images to their latent variables (GAN inversion) remains a challenge.
no code implementations • 2 Dec 2022 • Jae Yong Lee, Yuqun Wu, Chuhang Zou, Shenlong Wang, Derek Hoiem
Instead, we propose to encode features in bins of Fourier features that are commonly used for positional encoding.
no code implementations • 2 Dec 2022 • Yuqun Wu, Jae Yong Lee, Derek Hoiem
Our long-term goal is to use image-based depth completion to quickly create 3D models from sparse point clouds, e.g., from SfM or SLAM.
no code implementations • 14 Oct 2022 • Jae Yong Lee, Chuhang Zou, Derek Hoiem
Recent work in multi-view stereo (MVS) combines learnable photometric scores and regularization with PatchMatch-based optimization to achieve robust pixelwise estimates of depth, normals, and visibility.
1 code implementation • 22 May 2022 • Zhenhailong Wang, Manling Li, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, ZiYi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji
The goal of this work is to build flexible video-language models that can generalize to various video-to-text tasks from few examples, such as domain-specific captioning, question answering, and future event prediction.
1 code implementation • 28 Apr 2022 • Tanmay Gupta, Ryan Marten, Aniruddha Kembhavi, Derek Hoiem
Computer vision models excel at making predictions when the test distribution closely resembles the training distribution.
no code implementations • 4 Feb 2022 • Amita Kamath, Christopher Clark, Tanmay Gupta, Eric Kolve, Derek Hoiem, Aniruddha Kembhavi
This work presents an effective and inexpensive alternative: learn skills from supervised datasets, learn concepts from web image search, and leverage a key characteristic of GPVs: the ability to transfer visual knowledge across skills.
Ranked #2 on Visual Question Answering (VQA) on GRIT
no code implementations • CVPR 2022 • Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi, Derek Hoiem
To reduce the time and expertise required to develop new applications, we would like to create general purpose vision systems that can learn and perform a range of tasks without any modification to the architecture or learning process.
1 code implementation • ICCV 2021 • Jae Yong Lee, Joseph DeGol, Chuhang Zou, Derek Hoiem
To overcome the challenge of the non-differentiable PatchMatch optimization that involves iterative sampling and hard decisions, we use reinforcement learning to minimize expected photometric cost and maximize likelihood of ground truth depth and normals.
2 code implementations • 1 Apr 2021 • Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi, Derek Hoiem
To reduce the time and expertise required to develop new applications, we would like to create general purpose vision systems that can learn and perform a range of tasks without any modification to the architecture or learning process.
1 code implementation • 21 Oct 2020 • Derek Hoiem, Tanmay Gupta, Zhizhong Li, Michal M. Shlapentokh-Rothman
Learning curves model a classifier's test error as a function of the number of training samples.
1 code implementation • ECCV 2020 • Tanmay Gupta, Arash Vahdat, Gal Chechik, Xiaodong Yang, Jan Kautz, Derek Hoiem
Given pairs of images and captions, we maximize compatibility of the attention-weighted regions and the words in the corresponding caption, compared to non-corresponding pairs of images and captions.
no code implementations • CVPR 2013 • Kevin Karsch, Zicheng Liao, Jason Rock, Jonathan T. Barron, Derek Hoiem
Early work in computer vision considered a host of geometric cues for both shape reconstruction and recognition.
2 code implementations • CVPR 2020 • Hongxu Yin, Pavlo Molchanov, Zhizhong Li, Jose M. Alvarez, Arun Mallya, Derek Hoiem, Niraj K. Jha, Jan Kautz
We introduce DeepInversion, a new method for synthesizing images from the image distribution used to train a deep neural network.
3 code implementations • 9 Oct 2019 • Chuhang Zou, Jheng-Wei Su, Chi-Han Peng, Alex Colburn, Qi Shan, Peter Wonka, Hung-Kuo Chu, Derek Hoiem
Recent approaches for predicting layouts from 360 panoramas produce excellent results.
1 code implementation • ICCV 2019 • Tanmay Gupta, Alexander Schwing, Derek Hoiem
Through unsupervised clustering, supervised partitioning, and a zero-shot-like generalization analysis we show that our word embeddings complement text-only embeddings like GloVe by better representing similarities and differences between visual concepts that are difficult to obtain from text corpora alone.
no code implementations • 16 Aug 2019 • Zhizhong Li, Linjie Luo, Sergey Tulyakov, Qieyun Dai, Derek Hoiem
Our key idea to improve domain adaptation is to introduce a separate anchor task (such as facial landmarks) whose annotations can be obtained at no cost or are already available on both synthetic and real datasets.
1 code implementation • 29 Jul 2019 • Chuhang Zou, Derek Hoiem
One major challenge in 3D reconstruction is to infer the complete shape geometry from partial foreground occlusions.
no code implementations • ICLR 2019 • Zhizhong Li, Derek Hoiem
We compare a number of methods from related fields such as calibration and epistemic uncertainty modeling, as well as two proposed methods that reduce overconfident errors of samples from an unknown novel distribution without drastically increasing evaluation time: (1) G-distillation, training an ensemble of classifiers and then distilling it into a single model using both labeled and unlabeled examples, or (2) NCR, reducing prediction confidence based on its novelty detection score.
3 code implementations • ICCV 2019 • Tanmay Gupta, Alexander Schwing, Derek Hoiem
We show that for human-object interaction detection a relatively simple factorized model with appearance and layout encodings constructed from pre-trained object detectors outperforms more sophisticated approaches.
no code implementations • ECCV 2018 • Joseph DeGol, Timothy Bretl, Derek Hoiem
In this paper, we present an incremental structure from motion (SfM) algorithm that significantly outperforms existing algorithms when fiducial markers are present in the scene, and that matches the performance of existing algorithms when no markers are present.
no code implementations • CVPR 2018 • Daeyun Shin, Charless C. Fowlkes, Derek Hoiem
The goal of this paper is to compare surface-based and volumetric 3D object shape representations, as well as viewer-centered and object-centered reference frames for single-view 3D shape prediction.
5 code implementations • ECCV 2018 • Tanmay Gupta, Dustin Schwenk, Ali Farhadi, Derek Hoiem, Aniruddha Kembhavi
Imagining a scene described in natural language with realistic layout and appearance of entities is the ultimate test of spatial, visual, and semantic world knowledge.
1 code implementation • CVPR 2020 • Zhizhong Li, Derek Hoiem
In this paper, we compare and evaluate several methods to improve confidence estimates for unfamiliar and familiar samples.
2 code implementations • CVPR 2018 • Chuhang Zou, Alex Colburn, Qi Shan, Derek Hoiem
We propose an algorithm to predict room layout from a single image that generalizes across panoramas and perspective images, cuboid layouts and more general layouts (e.g., L-shape room).
1 code implementation • 25 Oct 2017 • Chuhang Zou, Ruiqi Guo, Zhizhong Li, Derek Hoiem
In this paper, we aim to interpret indoor scenes from one RGBD image.
no code implementations • ICCV 2017 • Joseph DeGol, Timothy Bretl, Derek Hoiem
Current fiducial marker detection algorithms rely on marker IDs for false positive rejection.
2 code implementations • ICCV 2017 • Chuhang Zou, Ersin Yumer, Jimei Yang, Duygu Ceylan, Derek Hoiem
The success of various applications including robotics, digital content creation, and visualization demands a structured and abstract representation of the 3D world from limited sensor data.
no code implementations • ICCV 2017 • Tanmay Gupta, Kevin Shih, Saurabh Singh, Derek Hoiem
In this paper, we investigate a vision-language embedding as a core representation and show that it leads to better cross-task transfer than standard multi-task learning.
no code implementations • CVPR 2016 • Joseph DeGol, Mani Golparvar-Fard, Derek Hoiem
Our goal is to recognize material categories using images and geometry information.
12 code implementations • 29 Jun 2016 • Zhizhong Li, Derek Hoiem
We propose our Learning without Forgetting method, which uses only new task data to train the network while preserving the original capabilities.
Ranked #4 on Domain 11-5 on Cityscapes
no code implementations • 15 Jun 2016 • Tanmay Gupta, Daeyun Shin, Naren Sivagnanadasan, Derek Hoiem
The resulting depth maps are then fused using a proposed implicit surface function that is robust to estimation error, producing a smooth surface reconstruction of the entire scene.
no code implementations • CVPR 2016 • Saurabh Singh, Derek Hoiem, David Forsyth
We describe a method to find such landmarks by finding a sequence of latent landmarks, each with a prediction model.
no code implementations • NeurIPS 2016 • Saurabh Singh, Derek Hoiem, David Forsyth
When viewed as a regularization method, swapout inhibits co-adaptation of units not only within a layer, similar to dropout, but also across network layers.
no code implementations • CVPR 2016 • Kevin J. Shih, Saurabh Singh, Derek Hoiem
We present a method that learns to answer visual questions by selecting image regions relevant to the text-based query.
no code implementations • 22 Jul 2015 • Kevin J. Shih, Arun Mallya, Saurabh Singh, Derek Hoiem
We present a simple deep learning framework to simultaneously predict keypoint locations and their respective visibilities and use those to achieve state-of-the-art performance for fine-grained classification.
no code implementations • CVPR 2015 • Jason Rock, Tanmay Gupta, Justin Thorsen, JunYoung Gwak, Daeyun Shin, Derek Hoiem
Our goal is to recover a complete 3D model from a depth image of an object.
no code implementations • CVPR 2015 • Saurabh Singh, Derek Hoiem, David Forsyth
We propose a general method to find landmarks in images of objects using both appearance and spatial context.
1 code implementation • 9 Apr 2015 • Ruiqi Guo, Chuhang Zou, Derek Hoiem
One major goal of vision is to infer physical models of objects, surfaces, and their layout from sensors.
no code implementations • CVPR 2013 • Ian Endres, Kevin J. Shih, Johnston Jiaa, Derek Hoiem
We propose a method to learn a diverse collection of discriminative parts from object bounding box annotations.