no code implementations • 31 Dec 2024 • Tiange Luo, Ang Cao, GunHee Lee, Justin Johnson, Honglak Lee
Despite recent advances in Vision-Language Models (VLMs), many still over-rely on visual language priors present in their training data rather than true visual reasoning.
1 code implementation • 30 Apr 2024 • Ang Cao, Justin Johnson, Andrea Vedaldi, David Novotny
Contemporary 3D research, particularly in reconstruction and generation, heavily relies on 2D images for inputs or supervision.
1 code implementation • CVPR 2024 • Mohamed El Banani, Amit Raj, Kevis-Kokitsi Maninis, Abhishek Kar, Yuanzhen Li, Michael Rubinstein, Deqing Sun, Leonidas Guibas, Justin Johnson, Varun Jampani
Given that such models can classify, delineate, and localize objects in 2D, we ask whether they also represent their 3D structure.
1 code implementation • 11 Apr 2024 • Tiange Luo, Justin Johnson, Honglak Lee
Scalable annotation approaches are crucial for constructing extensive 3D-text datasets, facilitating a broader range of applications.
no code implementations • CVPR 2024 • Zixuan Huang, Justin Johnson, Shoubhik Debnath, James M. Rehg, Chao-yuan Wu
We present PointInfinity, an efficient family of point cloud diffusion models.
1 code implementation • 27 Mar 2024 • Shweta Singh, Aayan Yadav, Jitesh Jain, Humphrey Shi, Justin Johnson, Karan Desai
With these findings, we advocate using COCO-ReM for future object detection research.
no code implementations • 5 Mar 2024 • Chris Rockwell, Nilesh Kulkarni, Linyi Jin, Jeong Joon Park, Justin Johnson, David F. Fouhey
Estimating relative camera poses between images has been a central problem in computer vision.
no code implementations • CVPR 2024 • Chris Rockwell, Nilesh Kulkarni, Linyi Jin, Jeong Joon Park, Justin Johnson, David F. Fouhey
Estimating relative camera poses between images has been a central problem in computer vision.
no code implementations • CVPR 2024 • Nilesh Kulkarni, Davis Rempe, Kyle Genova, Abhijit Kundu, Justin Johnson, David Fouhey, Leonidas Guibas
This interaction field guides the sampling of an object-conditioned human motion diffusion model, so as to encourage plausible contacts and affordance semantics.
no code implementations • CVPR 2023 • Nilesh Kulkarni, Linyi Jin, Justin Johnson, David F. Fouhey
We introduce a method that can learn to predict scene-level implicit functions for 3D reconstruction from posed RGBD data.
1 code implementation • NeurIPS 2023 • Tiange Luo, Chris Rockwell, Honglak Lee, Justin Johnson
We introduce Cap3D, an automatic approach for generating descriptive text for 3D objects.
2 code implementations • 18 Apr 2023 • Karan Desai, Maximilian Nickel, Tanmay Rajpurohit, Justin Johnson, Ramakrishna Vedantam
Visual and linguistic concepts naturally organize themselves in a hierarchy, where a textual concept "dog" entails all images that contain dogs.
1 code implementation • ICCV 2023 • Lukas Höllein, Ang Cao, Andrew Owens, Justin Johnson, Matthias Nießner
We present Text2Room, a method for generating room-scale textured 3D meshes from a given text prompt as input.
1 code implementation • CVPR 2023 • Mohamed El Banani, Karan Desai, Justin Johnson
Our approach diverges from image-based contrastive learning by sampling view pairs using language similarity instead of hand-crafted augmentations or learned clusters.
no code implementations • 26 Jan 2023 • Uriel Singer, Shelly Sheynin, Adam Polyak, Oron Ashual, Iurii Makarov, Filippos Kokkinos, Naman Goyal, Andrea Vedaldi, Devi Parikh, Justin Johnson, Yaniv Taigman
We present MAV3D (Make-A-Video3D), a method for generating three-dimensional dynamic scenes from text descriptions.
1 code implementation • CVPR 2023 • Ang Cao, Justin Johnson
HexPlane is a simple and effective solution for representing 4D volumes, and we hope it can broadly contribute to modeling spacetime for dynamic 3D scenes.
1 code implementation • CVPR 2023 • Chao-yuan Wu, Justin Johnson, Jitendra Malik, Christoph Feichtenhofer, Georgia Gkioxari
We introduce a simple framework that operates on 3D points of single objects or whole scenes coupled with category-agnostic large-scale training from diverse RGB-D videos.
no code implementations • 25 Dec 2022 • Tiange Luo, Honglak Lee, Justin Johnson
On Text2Shape, ShapeGlot, ABO, Genre, and Program Synthetic datasets, Neural Shape Compiler shows strengths in $\textit{Text} \Longrightarrow \textit{Point Cloud}$, $\textit{Point Cloud} \Longrightarrow \textit{Text}$, $\textit{Point Cloud} \Longrightarrow \textit{Program}$, and Point Cloud Completion tasks.
1 code implementation • 6 Dec 2022 • Mohamed El Banani, Ignacio Rocco, David Novotny, Andrea Vedaldi, Natalia Neverova, Justin Johnson, Benjamin Graham
To address this, we propose a self-supervised approach for correspondence estimation that learns from multiview consistency in short RGB-D video sequences.
1 code implementation • CVPR 2023 • Jeongsoo Park, Justin Johnson
However, these RGB images are commonly encoded in JPEG before saving to disk; decoding them imposes an unavoidable overhead for RGB networks.
no code implementations • 18 Aug 2022 • Chris Rockwell, Justin Johnson, David F. Fouhey
We present a simple baseline for directly estimating the relative pose (rotation and translation, including scale) between two images.
1 code implementation • CVPR 2023 • Garrick Brazil, Abhinav Kumar, Julian Straub, Nikhila Ravi, Justin Johnson, Georgia Gkioxari
In 3D, existing benchmarks are small in size and approaches specialize in few object categories and specific domains, e.g. urban driving scenes.
1 code implementation • CVPR 2022 • Ang Cao, Chris Rockwell, Justin Johnson
Novel view synthesis (NVS) is a challenging task requiring systems to generate photorealistic images of scenes from new viewpoints, where both quality and speed are important for applications.
no code implementations • CVPR 2022 • Georgia Gkioxari, Nikhila Ravi, Justin Johnson
A 3D scene consists of a set of objects, each with a shape and a layout giving their position in space.
no code implementations • 8 Dec 2021 • Nilesh Kulkarni, Justin Johnson, David F. Fouhey
We present an approach for full 3D scene reconstruction from a single unseen image.
no code implementations • 2 Dec 2021 • Shengyi Qian, Alexander Kirillov, Nikhila Ravi, Devendra Singh Chaplot, Justin Johnson, David F. Fouhey, Georgia Gkioxari
Humans can perceive scenes in 3D from a handful of 2D views.
1 code implementation • CVPR 2022 • Lukas Höllein, Justin Johnson, Matthias Nießner
Style transfer typically operates on 2D images, making stylization of a mesh challenging.
1 code implementation • 22 Nov 2021 • Karan Desai, Gaurav Kaul, Zubin Aysola, Justin Johnson
We introduce RedCaps -- a large-scale dataset of 12M image-text pairs collected from Reddit.
1 code implementation • ICCV 2021 • Chris Rockwell, David F. Fouhey, Justin Johnson
Recent advancements in differentiable rendering and 3D reasoning have driven exciting results in novel view synthesis from a single image.
1 code implementation • 26 Jun 2021 • Ang Cao, Justin Johnson
Object detection is a core problem in computer vision, and its performance has improved drastically in the past few years.
no code implementations • ICCV 2021 • Mohamed El Banani, Justin Johnson
Our approach combines classic ideas from point cloud registration with more recent representation learning approaches.
1 code implementation • 17 May 2021 • Yuxin Wu, Justin Johnson
BatchNorm is a critical building block in modern convolutional neural networks.
1 code implementation • CVPR 2021 • Mohamed El Banani, Luya Gao, Justin Johnson
Aligning partial views of a scene into a single whole is essential to understanding one's environment and is a key component of numerous robotics tasks such as SLAM and SfM.
no code implementations • CVPR 2021 • Ramprasaath R. Selvaraju, Karan Desai, Justin Johnson, Nikhil Naik
Recent advances in self-supervised learning (SSL) have largely closed the gap with supervised ImageNet pretraining.
3 code implementations • 16 Jul 2020 • Nikhila Ravi, Jeremy Reizenstein, David Novotny, Taylor Gordon, Wan-Yen Lo, Justin Johnson, Georgia Gkioxari
We address these challenges by introducing PyTorch3D, a library of modular, efficient, and differentiable operators for 3D deep learning.
3 code implementations • CVPR 2021 • Karan Desai, Justin Johnson
The de-facto approach to many vision tasks is to start from pretrained visual representations, typically learned via supervised training on ImageNet.
Ranked #1 on Object Detection on COCO test-dev (Hardware Burden metric)
3 code implementations • CVPR 2020 • Olivia Wiles, Georgia Gkioxari, Richard Szeliski, Justin Johnson
Single image view synthesis allows for the generation of new views of a scene given a single input image.
1 code implementation • 21 Nov 2019 • Haytham M. Fayek, Justin Johnson
In this paper, we use the task of Audio Question Answering (AQA) to study the temporal reasoning abilities of machine learning models.
Ranked #1 on Audio Question Answering on DAQA
2 code implementations • NeurIPS 2019 • Anton Bakhtin, Laurens van der Maaten, Justin Johnson, Laura Gustafson, Ross Girshick
The benchmark is designed to encourage the development of learning algorithms that are sample-efficient and generalize well across puzzles.
Ranked #3 on Visual Reasoning on PHYRE-1B-Within
7 code implementations • ICCV 2019 • Georgia Gkioxari, Jitendra Malik, Justin Johnson
We propose a system that detects objects in real-world images and produces a triangle mesh giving the full 3D shape of each detected object.
Ranked #1 on 3D Shape Modeling on Pix3D S2
4 code implementations • ICCV 2019 • Ilija Radosavovic, Justin Johnson, Saining Xie, Wan-Yen Lo, Piotr Dollár
Compared to current methodologies of comparing point and curve estimates of model families, distribution estimates paint a more complete picture of the entire design landscape.
7 code implementations • ECCV 2018 • Jiren Zhu, Russell Kaplan, Justin Johnson, Li Fei-Fei
We show that these encodings are competitive with existing data hiding algorithms, and further that they can be made robust to noise: our models learn to reconstruct hidden information in an encoded image despite the presence of Gaussian blurring, pixel-wise dropout, cropping, and JPEG compression.
4 code implementations • CVPR 2018 • Justin Johnson, Agrim Gupta, Li Fei-Fei
To overcome this limitation, we propose a method for generating images from scene graphs, enabling explicit reasoning about objects and their relationships.
Ranked #4 on Layout-to-Image Generation on Visual Genome 64x64
no code implementations • ICLR 2018 • Joseph Suarez, Justin Johnson, Fei-Fei Li
We present a novel Dynamic Differentiable Reasoning (DDR) framework for jointly learning branching programs and the functions composing them; this resolves a significant nondifferentiability inhibiting recent dynamic architectures.
Ranked #9 on Visual Question Answering (VQA) on CLEVR
8 code implementations • CVPR 2018 • Agrim Gupta, Justin Johnson, Li Fei-Fei, Silvio Savarese, Alexandre Alahi
Understanding human motion behavior is critical for autonomous moving platforms (like self-driving cars and social robots) if they are to navigate human-centric environments.
Ranked #4 on Trajectory Prediction on ETH
5 code implementations • ICCV 2017 • Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick
Existing methods for visual reasoning attempt to directly map inputs to outputs using black-box architectures without explicitly modeling the underlying reasoning processes.
Ranked #5 on Visual Question Answering (VQA) on CLEVR-Humans
no code implementations • ICCV 2017 • Agrim Gupta, Justin Johnson, Alexandre Alahi, Li Fei-Fei
Recent progress in style transfer on images has focused on improving the quality of stylized images and speed of methods.
5 code implementations • CVPR 2017 • Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick
When building artificial intelligence systems that can reason and answer questions about visual data, we need diagnostic tests to analyze our progress and discover shortcomings.
3 code implementations • CVPR 2017 • Jonathan Krause, Justin Johnson, Ranjay Krishna, Li Fei-Fei
Recent progress on image captioning has made it possible to generate novel sentences describing images in natural language, but compressing an image into a single sentence can describe visual content in only coarse detail.
81 code implementations • 27 Mar 2016 • Justin Johnson, Alexandre Alahi, Li Fei-Fei
We consider image transformation problems, where an input image is transformed into an output image.
Ranked #4 on Nuclear Segmentation on Cell17
2 code implementations • 23 Feb 2016 • Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A. Shamma, Michael S. Bernstein, Fei-Fei Li
Despite progress in perceptual tasks such as image classification, computers still perform poorly on cognitive tasks such as image description and question answering.
1 code implementation • CVPR 2016 • Justin Johnson, Andrej Karpathy, Li Fei-Fei
We introduce the dense captioning task, which requires a computer vision system to both localize and describe salient regions in images in natural language.
Ranked #3 on Object Detection on Visual Genome
no code implementations • ICCV 2015 • Justin Johnson, Lamberto Ballan, Fei-Fei Li
Some images that are difficult to recognize on their own may become clearer in the context of a neighborhood of related images with similar social-network metadata.
3 code implementations • 5 Jun 2015 • Andrej Karpathy, Justin Johnson, Li Fei-Fei
Recurrent Neural Networks (RNNs), and specifically a variant with Long Short-Term Memory (LSTM), are enjoying renewed interest as a result of successful applications in a wide range of machine learning problems that involve sequential data.
no code implementations • CVPR 2015 • Justin Johnson, Ranjay Krishna, Michael Stark, Li-Jia Li, David Shamma, Michael Bernstein, Li Fei-Fei
We introduce a novel dataset of 5,000 human-generated scene graphs grounded to images and use this dataset to evaluate our method for image retrieval.