Search Results for author: Philipp Krähenbühl

Found 47 papers, 39 papers with code

Image and Video Tokenization with Binary Spherical Quantization

1 code implementation 11 Jun 2024 Yue Zhao, Yuanjun Xiong, Philipp Krähenbühl

The resulting BSQ-ViT achieves state-of-the-art visual reconstruction quality on image and video reconstruction benchmarks with 2.4$\times$ throughput compared to the best prior methods.

Decoder Image Generation +3

Language-Image Models with 3D Understanding

no code implementations 6 May 2024 Jang Hyun Cho, Boris Ivanovic, Yulong Cao, Edward Schmerling, Yue Wang, Xinshuo Weng, Boyi Li, Yurong You, Philipp Krähenbühl, Yan Wang, Marco Pavone

Our experiments on outdoor benchmarks demonstrate that Cube-LLM significantly outperforms existing baselines by 21.3 points of AP-BEV on the Talk2Car dataset for 3D grounded reasoning and 17.7 points on the DriveLM dataset for complex reasoning about driving scenarios, respectively.

Question Answering Visual Question Answering

Predicting a Protein's Stability under a Million Mutations

1 code implementation NeurIPS 2023 Jeffrey Ouyang-Zhang, Daniel J. Diaz, Adam R. Klivans, Philipp Krähenbühl

We build Mutate Everything on top of ESM2 and AlphaFold, neither of which were trained to predict thermodynamic stability.

PartDistillation: Learning Parts From Instance Segmentation

1 code implementation CVPR 2023 Jang Hyun Cho, Philipp Krähenbühl, Vignesh Ramanathan

PartDistillation transfers the part information of an instance segmentation model into a part segmentation model through self-supervised self-training on a large dataset.

Instance Segmentation Object +3

NMS Strikes Back

1 code implementation 12 Dec 2022 Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krähenbühl

Our detector that trains Deformable-DETR with traditional IoU-based label assignment achieved 50.2 COCO mAP within 12 epochs (1x schedule) with ResNet50 backbone, outperforming all existing traditional or transformer-based detectors in this setting.

Attribute object-detection +1

Cross-view Transformers for real-time Map-view Semantic Segmentation

2 code implementations CVPR 2022 Brady Zhou, Philipp Krähenbühl

The architecture consists of a convolutional image encoder for each view and cross-view transformer layers to infer a map-view semantic segmentation.

Bird's-Eye View Semantic Segmentation Segmentation

Global Tracking Transformers

1 code implementation CVPR 2022 Xingyi Zhou, Tianwei Yin, Vladlen Koltun, Philipp Krähenbühl

The transformer encodes object features from all frames, and uses trajectory queries to group them into trajectories.

Ranked #14 on Multi-Object Tracking on SportsMOT (using extra training data)

Multi-Object Tracking Object

Learning from All Vehicles

1 code implementation CVPR 2022 Dian Chen, Philipp Krähenbühl

In this paper, we present a system to train driving policies from experiences collected not just from the ego-vehicle, but all vehicles that it observes.

Autonomous Driving CARLA longest6

Detecting Twenty-thousand Classes using Image-level Supervision

1 code implementation 7 Jan 2022 Xingyi Zhou, Rohit Girdhar, Armand Joulin, Philipp Krähenbühl, Ishan Misra

For the first time, we train a detector with all the twenty-one-thousand classes of the ImageNet dataset and show that it generalizes to new datasets without finetuning.

Image Classification Open Vocabulary Object Detection

Multimodal Virtual Point 3D Detection

1 code implementation NeurIPS 2021 Tianwei Yin, Xingyi Zhou, Philipp Krähenbühl

For autonomous driving, this means that large objects close to the sensors are easily visible, but far-away or small objects comprise only one measurement or two.

3D Object Detection Autonomous Driving

Towards Long-Form Video Understanding

2 code implementations CVPR 2021 Chao-yuan Wu, Philipp Krähenbühl

Our world offers a never-ending stream of visual stimuli, yet today's vision systems only accurately recognize patterns within a few seconds.

Action Recognition Video Recognition +1

Learning to drive from a world on rails

1 code implementation ICCV 2021 Dian Chen, Vladlen Koltun, Philipp Krähenbühl

This assumption greatly simplifies the learning problem, factorizing the dynamics into a nonreactive world model and a low-dimensional and compact forward model of the ego-vehicle.

Autonomous Driving CARLA longest6 +1

Domain Adaptation Through Task Distillation

1 code implementation 27 Aug 2020 Brady Zhou, Nimit Kalra, Philipp Krähenbühl

We use these recognition datasets to link up a source and target domain to transfer models between them in a task distillation framework.

Autonomous Driving Domain Adaptation

Tracking Objects as Points

7 code implementations ECCV 2020 Xingyi Zhou, Vladlen Koltun, Philipp Krähenbühl

Nowadays, tracking is dominated by pipelines that perform object detection followed by temporal association, also known as tracking-by-detection.

Multi-Object Tracking Multiple Object Tracking +2

A Multigrid Method for Efficiently Training Video Models

3 code implementations CVPR 2020 Chao-yuan Wu, Ross Girshick, Kaiming He, Christoph Feichtenhofer, Philipp Krähenbühl

We empirically demonstrate a general and robust grid schedule that yields a significant out-of-the-box training speedup without a loss in accuracy for different models (I3D, non-local, SlowFast), datasets (Kinetics, Something-Something, Charades), and training settings (with and without pre-training, 128 GPUs or 1 GPU).

Action Detection Action Recognition +2

Does computer vision matter for action?

no code implementations 30 May 2019 Brady Zhou, Philipp Krähenbühl, Vladlen Koltun

Thus the central question of our work: Does computer vision matter for action?

Monocular Plan View Networks for Autonomous Driving

no code implementations 16 May 2019 Dequan Wang, Coline Devin, Qi-Zhi Cai, Philipp Krähenbühl, Trevor Darrell

Convolutions on monocular dash cam videos capture spatial invariances in the image plane but do not explicitly reason about distances and depth.

3D Object Detection Autonomous Driving +1

Don't let your Discriminator be fooled

no code implementations ICLR 2019 Brady Zhou, Philipp Krähenbühl

We experimentally show that any GAN objective, including Wasserstein GANs, benefits from adversarial robustness both quantitatively and qualitatively.

Adversarial Robustness

Joint Monocular 3D Vehicle Detection and Tracking

1 code implementation ICCV 2019 Hou-Ning Hu, Qi-Zhi Cai, Dequan Wang, Ji Lin, Min Sun, Philipp Krähenbühl, Trevor Darrell, Fisher Yu

The framework can not only associate detections of vehicles in motion over time, but also estimate their complete 3D bounding box information from a sequence of 2D images captured on a moving platform.

3D Object Detection 3D Pose Estimation +4

Video Compression through Image Interpolation

1 code implementation ECCV 2018 Chao-yuan Wu, Nayan Singhal, Philipp Krähenbühl

An ever increasing amount of our digital communication, media consumption, and content creation revolves around videos.

Video Compression

Generative Visual Manipulation on the Natural Image Manifold

1 code implementation 12 Sep 2016 Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, Alexei A. Efros

Realistic image manipulation is challenging because it requires modifying the image appearance in a user-controlled way, while preserving the realism of the result.

Image Manipulation

Adversarial Feature Learning

10 code implementations 31 May 2016 Jeff Donahue, Philipp Krähenbühl, Trevor Darrell

The ability of the Generative Adversarial Networks (GANs) framework to learn generative models mapping from simple latent distributions to arbitrarily complex data distributions has been demonstrated empirically, with compelling results showing that the latent space of such generators captures semantic variation in the data distribution.

Learning Dense Correspondence via 3D-guided Cycle Consistency

no code implementations CVPR 2016 Tinghui Zhou, Philipp Krähenbühl, Mathieu Aubry, Qi-Xing Huang, Alexei A. Efros

We use ground-truth synthetic-to-synthetic correspondences, provided by the rendering engine, to train a ConvNet to predict synthetic-to-real, real-to-real and real-to-synthetic correspondences that are cycle-consistent with the ground-truth.

Data-dependent Initializations of Convolutional Neural Networks

2 code implementations 21 Nov 2015 Philipp Krähenbühl, Carl Doersch, Jeff Donahue, Trevor Darrell

Convolutional Neural Networks spread through computer vision like a wildfire, impacting almost all visual tasks imaginable.

Image Classification object-detection +2

Learning Data-driven Reflectance Priors for Intrinsic Image Decomposition

no code implementations ICCV 2015 Tinghui Zhou, Philipp Krähenbühl, Alexei A. Efros

We propose a data-driven approach for intrinsic image decomposition, which is the process of inferring the confounding factors of reflectance and shading in an image.

Image Relighting Intrinsic Image Decomposition

Constrained Convolutional Neural Networks for Weakly Supervised Segmentation

1 code implementation ICCV 2015 Deepak Pathak, Philipp Krähenbühl, Trevor Darrell

We propose Constrained CNN (CCNN), a method which uses a novel loss function to optimize for any set of linear constraints on the output space (i.e., predicted label distribution) of a CNN.

Image Segmentation Semantic Segmentation +2

Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials

3 code implementations 20 Oct 2012 Philipp Krähenbühl, Vladlen Koltun

In this paper, we consider fully connected CRF models defined on the complete set of pixels in an image.

Image Segmentation Segmentation +1
