1 code implementation • 7 Mar 2023 • Nick Bührer, Zhejun Zhang, Alexander Liniger, Fisher Yu, Luc van Gool
To this end, we propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
1 code implementation • 7 Mar 2023 • Zhejun Zhang, Alexander Liniger, Dengxin Dai, Fisher Yu, Luc van Gool
We present TrafficBots, a multi-agent policy built upon motion prediction and end-to-end driving, and based on TrafficBots we obtain a world model tailored for the planning module of autonomous vehicles.
no code implementations • 10 Feb 2023 • Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, Fantine Huot, Jasmijn Bastings, Mark Patrick Collier, Alexey Gritsenko, Vighnesh Birodkar, Cristina Vasconcelos, Yi Tay, Thomas Mensink, Alexander Kolesnikov, Filip Pavetić, Dustin Tran, Thomas Kipf, Mario Lučić, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, Neil Houlsby
The scaling of Transformers has driven breakthrough capabilities for language models.
Ranked #1 on Zero-Shot Transfer Image Classification on ObjectNet
no code implementations • 1 Feb 2023 • Weirong Chen, Suryansh Kumar, Fisher Yu
This work introduces an effective and practical solution to the dense two-view structure from motion (SfM) problem.
no code implementations • 26 Jan 2023 • Haotong Qin, Mingyuan Zhang, Yifu Ding, Aoyu Li, Zhongang Cai, Ziwei Liu, Fisher Yu, Xianglong Liu
To close this gap, we present BiBench, a rigorously designed benchmark with in-depth analysis for network binarization.
no code implementations • 2 Dec 2022 • Tobias Fischer, Yung-Hsu Yang, Suryansh Kumar, Min Sun, Fisher Yu
To track the 3D locations and trajectories of the other traffic participants at any given time, modern autonomous vehicles are equipped with multiple cameras that cover the vehicle's full surroundings.
no code implementations • 27 Nov 2022 • Changyong Shu, Jiajun Deng, Fisher Yu, Yifan Liu
Recent works found that encodings based on samples of the 3D viewing rays can significantly improve the quality of multi-camera 3D object detection.
1 code implementation • 10 Nov 2022 • Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, Fisher Yu, Dacheng Tao, Andreas Geiger
We present a unified formulation and model for three motion and 3D perception tasks: optical flow, rectified stereo matching and unrectified stereo depth estimation from posed images.
Ranked #1 on Optical Flow Estimation on Sintel-clean
no code implementations • 8 Nov 2022 • Qi Fan, Mattia Segu, Yu-Wing Tai, Fisher Yu, Chi-Keung Tang, Bernt Schiele, Dengxin Dai
Thus, we propose to perturb the channel statistics of source domain features to synthesize various latent styles, so that the trained deep model can perceive diverse potential domains and generalize well even without observations of target domain data in training.
no code implementations • 26 Oct 2022 • Jiawei Fu, Yunlong Song, Yan Wu, Fisher Yu, Davide Scaramuzza
The resulting policy directly infers control commands with feature representations learned from raw images, forgoing the need for globally-consistent state estimation, trajectory planning, and handcrafted control design.
no code implementations • 13 Oct 2022 • Menelaos Kanakis, Thomas E. Huang, David Bruggemann, Fisher Yu, Luc van Gool
In this paper, we find that jointly training a dense prediction (target) task with a self-supervised (auxiliary) task can consistently improve the performance of the target task, while eliminating the need for labeling auxiliary tasks.
Ranked #64 on Semantic Segmentation on NYU Depth v2
2 code implementations • 12 Oct 2022 • Tobias Fischer, Jiangmiao Pang, Thomas E. Huang, Linlu Qiu, Haofeng Chen, Trevor Darrell, Fisher Yu
In this paper, we present Quasi-Dense Similarity Learning, which densely samples hundreds of object regions on a pair of images for contrastive learning.
Ranked #4 on Multiple Object Tracking on BDD100K val
no code implementations • 10 Oct 2022 • Yihang She, Goutam Bhat, Martin Danelljan, Fisher Yu
These approaches, however, suffer from "catastrophic forgetting" due to finetuning of the base detector, leading to sub-optimal performance on the base classes.
no code implementations • 17 Sep 2022 • Soomin Lee, Le Chen, Jiahao Wang, Alexander Liniger, Suryansh Kumar, Fisher Yu
In this paper, we tackle the problem of active robotic 3D reconstruction of an object.
1 code implementation • 6 Sep 2022 • Gurkirt Singh, Vasileios Choutas, Suman Saha, Fisher Yu, Luc van Gool
Current methods for spatiotemporal action tube detection often extend a bounding box proposal at a given keyframe into a 3D temporal cuboid and pool features from nearby frames.
1 code implementation • 28 Jul 2022 • Lei Ke, Henghui Ding, Martin Danelljan, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu
While Video Instance Segmentation (VIS) has seen rapid progress, current approaches struggle to predict high-quality masks with accurate boundary details.
Ranked #1 on Video Instance Segmentation on HQ-YTVIS
1 code implementation • 26 Jul 2022 • Siyuan Li, Martin Danelljan, Henghui Ding, Thomas E. Huang, Fisher Yu
Our experiments show that TETA evaluates trackers more comprehensively, and TETer achieves significant improvements on the challenging large-scale datasets BDD100K and TAO compared to the state-of-the-art.
Ranked #6 on Multiple Object Tracking on BDD100K val
no code implementations • CVPR 2022 • Tao Sun, Mattia Segu, Janis Postels, Yuxuan Wang, Luc van Gool, Bernt Schiele, Federico Tombari, Fisher Yu
Adapting to a continuously evolving environment is a safety-critical challenge inevitably faced by all autonomous driving systems.
1 code implementation • 7 Apr 2022 • Erik Sandström, Martin R. Oswald, Suryansh Kumar, Silvan Weder, Fisher Yu, Cristian Sminchisescu, Luc van Gool
Multi-sensor depth fusion is able to substantially improve the robustness and accuracy of 3D reconstruction methods, but existing techniques are not robust enough to handle sensors which operate with diverse value ranges as well as noise and outlier statistics.
1 code implementation • CVPR 2022 • Martin Hahner, Christos Sakaridis, Mario Bijelic, Felix Heide, Fisher Yu, Dengxin Dai, Luc van Gool
Due to the difficulty of collecting and annotating training data in this setting, we propose a physically based method to simulate the effect of snowfall on real clear-weather LiDAR point clouds.
Ranked #1 on 3D Object Detection on Heavy Snowfall
1 code implementation • CVPR 2022 • Christoph Mayer, Martin Danelljan, Goutam Bhat, Matthieu Paul, Danda Pani Paudel, Fisher Yu, Luc van Gool
Optimization based tracking methods have been widely successful by integrating a target model prediction module, providing effective global reasoning by minimizing an objective function.
Ranked #1 on Visual Object Tracking on LaSOT (IS metric)
no code implementations • CVPR 2022 • Muhammad Zaigham Zaheer, Arif Mahmood, Muhammad Haris Khan, Mattia Segu, Fisher Yu, Seung-Ik Lee
Video anomaly detection is well investigated in weakly-supervised and one-class classification (OCC) settings.
1 code implementation • CVPR 2022 • Prune Truong, Martin Danelljan, Fisher Yu, Luc van Gool
We propose Probabilistic Warp Consistency, a weakly-supervised learning objective for semantic matching.
2 code implementations • CVPR 2022 • Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, Luc van Gool
In this work, we propose RePaint: A Denoising Diffusion Probabilistic Model (DDPM) based inpainting approach that is applicable to even extreme masks.
no code implementations • 19 Dec 2021 • Yan Wu, Jiahao Wang, Yan Zhang, Siwei Zhang, Otmar Hilliges, Fisher Yu, Siyu Tang
Given an initial pose and the generated whole-body grasping pose as the start and end of the motion respectively, we design a novel contact-aware generative motion infilling module to generate a diverse set of grasp-oriented motions.
1 code implementation • CVPR 2022 • Lei Ke, Martin Danelljan, Xia Li, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu
Instead of operating on regular dense tensors, our Mask Transfiner decomposes and represents the image regions as a quadtree.
Ranked #1 on Instance Segmentation on BDD100K val
no code implementations • 5 Nov 2021 • Andreas Lugmayr, Martin Danelljan, Fisher Yu, Luc van Gool, Radu Timofte
Super-resolution is an ill-posed problem, where a ground-truth high-resolution image represents only one possibility in the space of plausible solutions.
no code implementations • 1 Nov 2021 • Yung-Hsu Yang, Thomas E. Huang, Min Sun, Samuel Rota Bulò, Peter Kontschieder, Fisher Yu
Our experiments show consistent and significant improvements on challenging semantic segmentation benchmarks, including Cityscapes, BDD100K, and Mapillary Vistas, at negligible computational and parameter overhead.
1 code implementation • 10 Sep 2021 • Rui Gong, Martin Danelljan, Dengxin Dai, Danda Pani Paudel, Ajad Chhatkuli, Fisher Yu, Luc van Gool
In many real-world settings, the target domain task requires a different taxonomy than the one imposed by the source domain.
2 code implementations • ICCV 2021 • Zhejun Zhang, Alexander Liniger, Dengxin Dai, Fisher Yu, Luc van Gool
Our end-to-end agent achieves a 78% success rate while generalizing to a new town and new weather on the NoCrash-dense benchmark and state-of-the-art performance on the challenging public routes of the CARLA LeaderBoard.
2 code implementations • ICCV 2021 • Goutam Bhat, Martin Danelljan, Fisher Yu, Luc van Gool, Radu Timofte
The deep reparametrization allows us to directly model the image formation process in the latent space, and to integrate learned image priors into the prediction.
Ranked #4 on Burst Image Super-Resolution on BurstSR
1 code implementation • 1 Jul 2021 • Janis Postels, Mattia Segu, Tao Sun, Luca Sieber, Luc van Gool, Fisher Yu, Federico Tombari
We find that, while DUMs scale to realistic vision tasks and perform well on OOD detection, the practicality of current methods is undermined by poor calibration under distributional shifts.
1 code implementation • NeurIPS 2021 • Lei Ke, Xia Li, Martin Danelljan, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu
We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information for online multiple object tracking and segmentation.
Ranked #1 on Video Instance Segmentation on BDD100K val
1 code implementation • ICCV 2021 • Xin Wang, Thomas E. Huang, Benlin Liu, Fisher Yu, Xiaolong Wang, Joseph E. Gonzalez, Trevor Darrell
Building reliable object detectors that are robust to domain shifts, such as various changes in context, viewpoint, and object appearances, is critical for real-world applications.
1 code implementation • ICCV 2021 • Prune Truong, Martin Danelljan, Fisher Yu, Luc van Gool
From our observations and empirical results, we design a general unsupervised objective employing two of the derived constraints.
1 code implementation • 12 Mar 2021 • Hou-Ning Hu, Yung-Hsu Yang, Tobias Fischer, Trevor Darrell, Fisher Yu, Min Sun
Experiments on our proposed simulation data and real-world benchmarks, including KITTI, nuScenes, and Waymo datasets, show that our tracking framework offers robust object association and tracking on urban-driving scenarios.
Ranked #6 on Multiple Object Tracking on KITTI Tracking test
5 code implementations • ICCV 2021 • Wenguan Wang, Tianfei Zhou, Fisher Yu, Jifeng Dai, Ender Konukoglu, Luc van Gool
Inspired by the recent advance in unsupervised contrastive representation learning, we propose a pixel-wise contrastive framework for semantic segmentation in the fully supervised setting.
1 code implementation • 14 Jan 2021 • Jinkun Cao, Xin Wang, Trevor Darrell, Fisher Yu
To decide the action at each step, we seek the action sequence that can lead to safe future states based on the prediction module outputs by repeatedly sampling likely action sequences.
2 code implementations • CVPR 2021 • Jiangmiao Pang, Linlu Qiu, Xia Li, Haofeng Chen, Qi Li, Trevor Darrell, Fisher Yu
Compared to methods with similar detectors, it improves MOTA by almost 10 points and significantly decreases the number of ID switches on the BDD100K and Waymo datasets.
Ranked #1 on One-Shot Object Detection on PASCAL VOC 2012 val
5 code implementations • ICML 2020 • Xin Wang, Thomas E. Huang, Trevor Darrell, Joseph E. Gonzalez, Fisher Yu
Such a simple approach outperforms the meta-learning methods by roughly 2 to 20 points on current benchmarks and sometimes even doubles the accuracy of the prior methods.
Ranked #14 on Few-Shot Object Detection on MS-COCO (30-shot)
1 code implementation • 11 Jun 2019 • Xin Wang, Fisher Yu, Trevor Darrell, Joseph E. Gonzalez
In this work, we propose a task-aware feature generation (TFG) framework for compositional learning, which generates features of novel visual concepts by transferring knowledge from previously seen concepts.
1 code implementation • CVPR 2019 • Xin Wang, Fisher Yu, Ruth Wang, Trevor Darrell, Joseph E. Gonzalez
We show that TAFE-Net is highly effective in generalizing to new tasks or concepts and evaluate the TAFE-Net on a range of benchmarks in zero-shot and few-shot learning.
2 code implementations • CVPR 2019 • Zhichao Yin, Trevor Darrell, Fisher Yu
Explicit representations of the global match distributions of pixel-wise correspondences between pairs of images are desirable for uncertainty estimation and downstream applications.
Ranked #10 on Optical Flow Estimation on KITTI 2015 (train)
4 code implementations • ICCV 2019 • Bingyi Kang, Zhuang Liu, Xin Wang, Fisher Yu, Jiashi Feng, Trevor Darrell
The feature learner extracts meta features that are generalizable to detect novel object classes, using training data from base classes with sufficient samples.
Ranked #18 on Few-Shot Object Detection on MS-COCO (30-shot)
1 code implementation • ICCV 2019 • Hang Gao, Huazhe Xu, Qi-Zhi Cai, Ruth Wang, Fisher Yu, Trevor Darrell
A dynamic scene has two types of elements: those that move fluidly and can be predicted from previous frames, and those which are disoccluded (exposed) and cannot be extrapolated.
1 code implementation • ICCV 2019 • Hou-Ning Hu, Qi-Zhi Cai, Dequan Wang, Ji Lin, Min Sun, Philipp Krähenbühl, Trevor Darrell, Fisher Yu
The framework can not only associate detections of vehicles in motion over time, but also estimate their complete 3D bounding box information from a sequence of 2D images captured on a moving platform.
Ranked #11 on Multiple Object Tracking on KITTI Tracking test
no code implementations • 13 Nov 2018 • Dequan Wang, Coline Devin, Qi-Zhi Cai, Fisher Yu, Trevor Darrell
While learning visuomotor skills in an end-to-end manner is appealing, deep neural networks are often uninterpretable and fail in surprising ways.
no code implementations • ECCV 2018 • Chaowei Xiao, Ruizhi Deng, Bo Li, Fisher Yu, Mingyan Liu, Dawn Song
In this paper, we aim to characterize adversarial examples based on spatial context information in semantic segmentation.
no code implementations • 5 Jun 2018 • Xin Wang, Fisher Yu, Lisa Dunlap, Yi-An Ma, Ruth Wang, Azalia Mirhoseini, Trevor Darrell, Joseph E. Gonzalez
Larger networks generally have greater representational power at the cost of increased computational complexity.
no code implementations • CVPR 2018 • Huiwen Chang, Jingwan Lu, Fisher Yu, Adam Finkelstein
This paper introduces an automatic method for editing a portrait photo so that the subject appears to be wearing makeup in the style of another person in a reference photo.
3 code implementations • CVPR 2020 • Fisher Yu, Haofeng Chen, Xin Wang, Wenqi Xian, Yingying Chen, Fangchen Liu, Vashisht Madhavan, Trevor Darrell
Datasets drive vision progress, yet existing driving datasets are impoverished in terms of visual content and supported tasks to study multitask learning for autonomous driving.
no code implementations • ICLR 2018 • Yang Gao, Huazhe Xu, Ji Lin, Fisher Yu, Sergey Levine, Trevor Darrell
We propose a unified reinforcement learning algorithm, Normalized Actor-Critic (NAC), that effectively normalizes the Q-function, reducing the Q-values of actions unseen in the demonstration data.
2 code implementations • ECCV 2018 • Xin Wang, Fisher Yu, Zi-Yi Dou, Trevor Darrell, Joseph E. Gonzalez
While deeper convolutional networks are needed to achieve maximum accuracy in visual perception tasks, for many inputs shallower networks are sufficient.
6 code implementations • CVPR 2018 • Fisher Yu, Dequan Wang, Evan Shelhamer, Trevor Darrell
We augment standard architectures with deeper aggregation to better fuse information across layers.
no code implementations • 16 Jun 2017 • Jerry Liu, Fisher Yu, Thomas Funkhouser
This paper proposes the idea of using a generative adversarial network (GAN) to assist a novice user in designing real-world shapes with a simple interface.
2 code implementations • CVPR 2018 • Wenqi Xian, Patsorn Sangkloy, Varun Agrawal, Amit Raj, Jingwan Lu, Chen Fang, Fisher Yu, James Hays
In this paper, we investigate deep image synthesis guided by sketch, color, and texture.
Ranked #2 on Image Reconstruction on Edge-to-Shoes
no code implementations • 3 Jun 2017 • Xin Wang, Yujia Luo, Daniel Crankshaw, Alexey Tumanov, Fisher Yu, Joseph E. Gonzalez
Advances in deep learning have led to substantial increases in prediction accuracy but have been accompanied by increases in the cost of rendering predictions.
3 code implementations • CVPR 2017 • Fisher Yu, Vladlen Koltun, Thomas Funkhouser
Convolutional networks for image classification progressively reduce resolution until the image is represented by tiny feature maps in which the spatial structure of the scene is no longer discernible.
4 code implementations • 8 Dec 2016 • Judy Hoffman, Dequan Wang, Fisher Yu, Trevor Darrell
In this paper, we introduce the first domain adaptive semantic segmentation method, proposing an unsupervised adversarial approach to pixel prediction problems.
Ranked #2 on Image-to-Image Translation on SYNTHIA Fall-to-Winter
2 code implementations • CVPR 2017 • Huazhe Xu, Yang Gao, Fisher Yu, Trevor Darrell
Robust perception-action models should be learned from training data with diverse visual appearances and realistic behaviors, yet current approaches to deep visuomotor policy learning have been generally limited to in-situ models learned from a single vehicle or a simulation environment.
1 code implementation • CVPR 2017 • Patsorn Sangkloy, Jingwan Lu, Chen Fang, Fisher Yu, James Hays
In this paper, we propose a deep adversarial image synthesis architecture that is conditioned on sketched boundaries and sparse color strokes to generate realistic cars, bedrooms, or faces.
3 code implementations • CVPR 2017 • Shuran Song, Fisher Yu, Andy Zeng, Angel X. Chang, Manolis Savva, Thomas Funkhouser
This paper focuses on semantic scene completion, a task for producing a complete 3D voxel representation of volumetric occupancy and semantic labels for a scene from a single-view depth map observation.
Ranked #10 on 3D Semantic Scene Completion on SemanticKITTI
13 code implementations • 9 Dec 2015 • Angel X. Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qi-Xing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, Jianxiong Xiao, Li Yi, Fisher Yu
We present ShapeNet: a richly-annotated, large-scale repository of shapes represented by 3D CAD models of objects.
8 code implementations • 23 Nov 2015 • Fisher Yu, Vladlen Koltun
State-of-the-art models for semantic segmentation are based on adaptations of convolutional networks that had originally been designed for image classification.
Ranked #10 on Semantic Segmentation on CamVid
4 code implementations • 10 Jun 2015 • Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, Jianxiong Xiao
While there has been remarkable progress in the performance of visual recognition algorithms, the state-of-the-art models tend to be exceptionally data-hungry.
no code implementations • CVPR 2015 • Fisher Yu, Jianxiong Xiao, Thomas Funkhouser
This paper describes an automatic algorithm for global alignment of LiDAR data collected with Google Street View cars in urban environments.
no code implementations • CVPR 2015 • Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, Jianxiong Xiao
Our model, 3D ShapeNets, learns the distribution of complex 3D shapes across different object categories and arbitrary poses from raw CAD data, and discovers hierarchical compositional part representations automatically.
Ranked #30 on 3D Point Cloud Classification on ModelNet40 (Mean Accuracy metric)
no code implementations • CVPR 2014 • Fisher Yu, David Gallup
We have discovered that 3D reconstruction can be achieved from a single still photographic capture due to accidental motions of the photographer, even while attempting to hold the camera still.