Search Results for author: Justin Johnson

Found 52 papers, 35 papers with code

Probing the 3D Awareness of Visual Foundation Models

1 code implementation 12 Apr 2024 Mohamed El Banani, Amit Raj, Kevis-Kokitsi Maninis, Abhishek Kar, Yuanzhen Li, Michael Rubinstein, Deqing Sun, Leonidas Guibas, Justin Johnson, Varun Jampani

Given that such models can classify, delineate, and localize objects in 2D, we ask whether they also represent their 3D structure.

View Selection for 3D Captioning via Diffusion Ranking

no code implementations 11 Apr 2024 Tiange Luo, Justin Johnson, Honglak Lee

Scalable annotation approaches are crucial for constructing extensive 3D-text datasets, facilitating a broader range of applications.

Hallucination Image Captioning +3

NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis

no code implementations 14 Jul 2023 Nilesh Kulkarni, Davis Rempe, Kyle Genova, Abhijit Kundu, Justin Johnson, David Fouhey, Leonidas Guibas

This interaction field guides the sampling of an object-conditioned human motion diffusion model, so as to encourage plausible contacts and affordance semantics.

Motion Synthesis valid

Learning to Predict Scene-Level Implicit 3D from Posed RGBD Data

no code implementations CVPR 2023 Nilesh Kulkarni, Linyi Jin, Justin Johnson, David F. Fouhey

We introduce a method that can learn to predict scene-level implicit functions for 3D reconstruction from posed RGBD data.

3D Reconstruction

Hyperbolic Image-Text Representations

1 code implementation 18 Apr 2023 Karan Desai, Maximilian Nickel, Tanmay Rajpurohit, Justin Johnson, Ramakrishna Vedantam

Visual and linguistic concepts naturally organize themselves in a hierarchy, where a textual concept "dog" entails all images that contain dogs.

Image Classification Retrieval +1

Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models

1 code implementation ICCV 2023 Lukas Höllein, Ang Cao, Andrew Owens, Justin Johnson, Matthias Nießner

We present Text2Room, a method for generating room-scale textured 3D meshes from a given text prompt as input.

Text to 3D

Learning Visual Representations via Language-Guided Sampling

1 code implementation CVPR 2023 Mohamed El Banani, Karan Desai, Justin Johnson

Our approach diverges from image-based contrastive learning by sampling view pairs using language similarity instead of hand-crafted augmentations or learned clusters.

Contrastive Learning Representation Learning

Text-To-4D Dynamic Scene Generation

no code implementations 26 Jan 2023 Uriel Singer, Shelly Sheynin, Adam Polyak, Oron Ashual, Iurii Makarov, Filippos Kokkinos, Naman Goyal, Andrea Vedaldi, Devi Parikh, Justin Johnson, Yaniv Taigman

We present MAV3D (Make-A-Video3D), a method for generating three-dimensional dynamic scenes from text descriptions.

Scene Generation

HexPlane: A Fast Representation for Dynamic Scenes

1 code implementation CVPR 2023 Ang Cao, Justin Johnson

HexPlane is a simple and effective solution for representing 4D volumes, and we hope it can broadly contribute to modeling spacetime for dynamic 3D scenes.

Novel View Synthesis

Multiview Compressive Coding for 3D Reconstruction

1 code implementation CVPR 2023 Chao-yuan Wu, Justin Johnson, Jitendra Malik, Christoph Feichtenhofer, Georgia Gkioxari

We introduce a simple framework that operates on 3D points of single objects or whole scenes coupled with category-agnostic large-scale training from diverse RGB-D videos.

3D Reconstruction Self-Supervised Learning +1

Neural Shape Compiler: A Unified Framework for Transforming between Text, Point Cloud, and Program

no code implementations 25 Dec 2022 Tiange Luo, Honglak Lee, Justin Johnson

On Text2Shape, ShapeGlot, ABO, Genre, and Program Synthetic datasets, Neural Shape Compiler shows strengths in $\textit{Text}$ $\Longrightarrow$ $\textit{Point Cloud}$, $\textit{Point Cloud}$ $\Longrightarrow$ $\textit{Text}$, $\textit{Point Cloud}$ $\Longrightarrow$ $\textit{Program}$, and Point Cloud Completion tasks.

Point Cloud Completion

Self-Supervised Correspondence Estimation via Multiview Registration

1 code implementation 6 Dec 2022 Mohamed El Banani, Ignacio Rocco, David Novotny, Andrea Vedaldi, Natalia Neverova, Justin Johnson, Benjamin Graham

To address this, we propose a self-supervised approach for correspondence estimation that learns from multiview consistency in short RGB-D video sequences.

RGB no more: Minimally-decoded JPEG Vision Transformers

1 code implementation CVPR 2023 Jeongsoo Park, Justin Johnson

However, these RGB images are commonly encoded in JPEG before saving to disk; decoding them imposes an unavoidable overhead for RGB networks.

Data Augmentation

The 8-Point Algorithm as an Inductive Bias for Relative Pose Prediction by ViTs

no code implementations 18 Aug 2022 Chris Rockwell, Justin Johnson, David F. Fouhey

We present a simple baseline for directly estimating the relative pose (rotation and translation, including scale) between two images.

Inductive Bias Pose Prediction +1

FWD: Real-time Novel View Synthesis with Forward Warping and Depth

1 code implementation CVPR 2022 Ang Cao, Chris Rockwell, Justin Johnson

Novel view synthesis (NVS) is a challenging task requiring systems to generate photorealistic images of scenes from new viewpoints, where both quality and speed are important for applications.

Novel View Synthesis

Learning 3D Object Shape and Layout without 3D Supervision

no code implementations CVPR 2022 Georgia Gkioxari, Nikhila Ravi, Justin Johnson

A 3D scene consists of a set of objects, each with a shape and a layout giving their position in space.

Object

RedCaps: web-curated image-text data created by the people, for the people

1 code implementation 22 Nov 2021 Karan Desai, Gaurav Kaul, Zubin Aysola, Justin Johnson

We introduce RedCaps -- a large-scale dataset of 12M image-text pairs collected from Reddit.

PixelSynth: Generating a 3D-Consistent Experience from a Single Image

1 code implementation ICCV 2021 Chris Rockwell, David F. Fouhey, Justin Johnson

Recent advancements in differentiable rendering and 3D reasoning have driven exciting results in novel view synthesis from a single image.

Novel View Synthesis

Inverting and Understanding Object Detectors

1 code implementation 26 Jun 2021 Ang Cao, Justin Johnson

As a core problem in computer vision, the performance of object detection has improved drastically in the past few years.

Object object-detection +1

Bootstrap Your Own Correspondences

no code implementations ICCV 2021 Mohamed El Banani, Justin Johnson

Our approach combines classic ideas from point cloud registration with more recent representation learning approaches.

Point Cloud Registration Representation Learning

Rethinking "Batch" in BatchNorm

1 code implementation 17 May 2021 Yuxin Wu, Justin Johnson

BatchNorm is a critical building block in modern convolutional neural networks.

UnsupervisedR&R: Unsupervised Point Cloud Registration via Differentiable Rendering

1 code implementation CVPR 2021 Mohamed El Banani, Luya Gao, Justin Johnson

Aligning partial views of a scene into a single whole is essential to understanding one's environment and is a key component of numerous robotics tasks such as SLAM and SfM.

Point Cloud Registration

Accelerating 3D Deep Learning with PyTorch3D

3 code implementations 16 Jul 2020 Nikhila Ravi, Jeremy Reizenstein, David Novotny, Taylor Gordon, Wan-Yen Lo, Justin Johnson, Georgia Gkioxari

We address these challenges by introducing PyTorch3D, a library of modular, efficient, and differentiable operators for 3D deep learning.

Autonomous Vehicles

VirTex: Learning Visual Representations from Textual Annotations

3 code implementations CVPR 2021 Karan Desai, Justin Johnson

The de-facto approach to many vision tasks is to start from pretrained visual representations, typically learned via supervised training on ImageNet.

Ranked #1 on Object Detection on COCO test-dev (Hardware Burden metric)

General Classification Image Captioning +5

SynSin: End-to-end View Synthesis from a Single Image

3 code implementations CVPR 2020 Olivia Wiles, Georgia Gkioxari, Richard Szeliski, Justin Johnson

Single image view synthesis allows for the generation of new views of a scene given a single input image.

Novel View Synthesis

Temporal Reasoning via Audio Question Answering

1 code implementation 21 Nov 2019 Haytham M. Fayek, Justin Johnson

In this paper, we use the task of Audio Question Answering (AQA) to study the temporal reasoning abilities of machine learning models.

Audio Question Answering Question Answering +3

PHYRE: A New Benchmark for Physical Reasoning

2 code implementations NeurIPS 2019 Anton Bakhtin, Laurens van der Maaten, Justin Johnson, Laura Gustafson, Ross Girshick

The benchmark is designed to encourage the development of learning algorithms that are sample-efficient and generalize well across puzzles.

Visual Reasoning

Mesh R-CNN

6 code implementations ICCV 2019 Georgia Gkioxari, Jitendra Malik, Justin Johnson

We propose a system that detects objects in real-world images and produces a triangle mesh giving the full 3D shape of each detected object.

3D Shape Modeling

On Network Design Spaces for Visual Recognition

4 code implementations ICCV 2019 Ilija Radosavovic, Justin Johnson, Saining Xie, Wan-Yen Lo, Piotr Dollár

Compared to current methodologies of comparing point and curve estimates of model families, distribution estimates paint a more complete picture of the entire design landscape.

Neural Architecture Search

HiDDeN: Hiding Data With Deep Networks

6 code implementations ECCV 2018 Jiren Zhu, Russell Kaplan, Justin Johnson, Li Fei-Fei

We show that these encodings are competitive with existing data hiding algorithms, and further that they can be made robust to noise: our models learn to reconstruct hidden information in an encoded image despite the presence of Gaussian blurring, pixel-wise dropout, cropping, and JPEG compression.

Image Generation from Scene Graphs

4 code implementations CVPR 2018 Justin Johnson, Agrim Gupta, Li Fei-Fei

To overcome this limitation we propose a method for generating images from scene graphs, enabling explicitly reasoning about objects and their relationships.

Image Generation from Scene Graphs Layout-to-Image Generation

DDRprog: A CLEVR Differentiable Dynamic Reasoning Programmer

no code implementations ICLR 2018 Joseph Suarez, Justin Johnson, Fei-Fei Li

We present a novel Dynamic Differentiable Reasoning (DDR) framework for jointly learning branching programs and the functions composing them; this resolves a significant nondifferentiability inhibiting recent dynamic architectures.

Question Answering Visual Question Answering

Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks

7 code implementations CVPR 2018 Agrim Gupta, Justin Johnson, Li Fei-Fei, Silvio Savarese, Alexandre Alahi

Understanding human motion behavior is critical for autonomous moving platforms (like self-driving cars and social robots) if they are to navigate human-centric environments.

Collision Avoidance Motion Forecasting +4

Inferring and Executing Programs for Visual Reasoning

5 code implementations ICCV 2017 Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick

Existing methods for visual reasoning attempt to directly map inputs to outputs using black-box architectures without explicitly modeling the underlying reasoning processes.

Visual Question Answering (VQA) Visual Reasoning

CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

5 code implementations CVPR 2017 Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick

When building artificial intelligence systems that can reason and answer questions about visual data, we need diagnostic tests to analyze our progress and discover shortcomings.

Question Answering Visual Question Answering +1

A Hierarchical Approach for Generating Descriptive Image Paragraphs

3 code implementations CVPR 2017 Jonathan Krause, Justin Johnson, Ranjay Krishna, Li Fei-Fei

Recent progress on image captioning has made it possible to generate novel sentences describing images in natural language, but compressing an image into a single sentence can describe visual content in only coarse detail.

Dense Captioning Descriptive +3

DenseCap: Fully Convolutional Localization Networks for Dense Captioning

1 code implementation CVPR 2016 Justin Johnson, Andrej Karpathy, Li Fei-Fei

We introduce the dense captioning task, which requires a computer vision system to both localize and describe salient regions in images in natural language.

Dense Captioning Image Captioning +4

Love Thy Neighbors: Image Annotation by Exploiting Image Metadata

no code implementations ICCV 2015 Justin Johnson, Lamberto Ballan, Fei-Fei Li

Some images that are difficult to recognize on their own may become more clear in the context of a neighborhood of related images with similar social-network metadata.

Visualizing and Understanding Recurrent Networks

3 code implementations 5 Jun 2015 Andrej Karpathy, Justin Johnson, Li Fei-Fei

Recurrent Neural Networks (RNNs), and specifically a variant with Long Short-Term Memory (LSTM), are enjoying renewed interest as a result of successful applications in a wide range of machine learning problems that involve sequential data.

Image Retrieval Using Scene Graphs

no code implementations CVPR 2015 Justin Johnson, Ranjay Krishna, Michael Stark, Li-Jia Li, David Shamma, Michael Bernstein, Li Fei-Fei

We introduce a novel dataset of 5,000 human-generated scene graphs grounded to images and use this dataset to evaluate our method for image retrieval.

Image Retrieval Object Localization +1
