Search Results for author: Luc van Gool

Found 552 papers, 282 papers with code

Fixing Localization Errors to Improve Image Classification

1 code implementation ECCV 2020 Guolei Sun, Salman Khan, Wen Li, Hisham Cholakkal, Fahad Shahbaz Khan, Luc van Gool

This way, in an effort to fix localization errors, our loss provides an extra supervisory signal that helps the model to better discriminate between similar classes.

Classification General Classification +3

Modeling the Effects of Windshield Refraction for Camera Calibration

no code implementations ECCV 2020 Frank Verbiest, Marc Proesmans, Luc van Gool

Instead of using a generalized camera approach, we propose a novel approach to jointly optimize a traditional camera model, and a mathematical representation of the windshield’s surface.

Autonomous Driving Camera Calibration

Any Image Restoration with Efficient Automatic Degradation Adaptation

no code implementations 18 Jul 2024 Bin Ren, Eduard Zamfir, Yawei Li, Zongwei Wu, Danda Pani Paudel, Radu Timofte, Nicu Sebe, Luc van Gool

With the emergence of mobile devices, there is a growing demand for an efficient model to restore any degraded image for better perceptual quality.

Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning

no code implementations 8 Jul 2024 Bin Ren, Guofeng Mei, Danda Pani Paudel, Weijie Wang, Yawei Li, Mengyuan Liu, Rita Cucchiara, Luc van Gool, Nicu Sebe

To answer this question, we first empirically validate that integrating MAE-based point cloud pre-training with the standard contrastive learning paradigm, even with meticulous design, can lead to a decrease in performance.

Contrastive Learning Data Augmentation +2

Stereo Risk: A Continuous Modeling Approach to Stereo Matching

no code implementations 3 Jul 2024 Ce Liu, Suryansh Kumar, Shuhang Gu, Radu Timofte, Yao Yao, Luc van Gool

Stereo Risk departs from the conventional discretization approach by formulating the scene disparity as an optimal solution to a continuous risk minimization problem, hence the name "stereo risk".

Disparity Estimation Stereo Matching

TrafficBots V1.5: Traffic Simulation via Conditional VAEs and Transformers with Relative Pose Encoding

1 code implementation 16 Jun 2024 Zhejun Zhang, Christos Sakaridis, Luc van Gool

In this technical report we present TrafficBots V1.5, a baseline method for the closed-loop simulation of traffic agents.

Matching Anything by Segmenting Anything

1 code implementation CVPR 2024 Siyuan Li, Lei Ke, Martin Danelljan, Luigi Piccinelli, Mattia Segu, Luc van Gool, Fisher Yu

The robust association of the same objects across video frames in complex scenes is crucial for many applications, especially Multiple Object Tracking (MOT).

Domain Generalization Multiple Object Tracking +2

Sharing Key Semantics in Transformer Makes Efficient Image Restoration

no code implementations 30 May 2024 Bin Ren, Yawei Li, Jingyun Liang, Rakesh Ranjan, Mengyuan Liu, Rita Cucchiara, Luc van Gool, Ming-Hsuan Yang, Nicu Sebe

Additionally, for IR, it is commonly noted that small segments of a degraded image, particularly those closely aligned semantically, provide particularly relevant information to aid in the restoration process, as they contribute essential contextual cues crucial for accurate reconstruction.

Image Restoration

Towards a Generalist and Blind RGB-X Tracker

1 code implementation 28 May 2024 Yuedong Tan, Zongwei Wu, Yuqian Fu, Zhuyun Zhou, Guolei Sun, Chao Ma, Danda Pani Paudel, Luc van Gool, Radu Timofte

With the emergence of a single large model capable of successfully solving a multitude of tasks in NLP, there has been growing research interest in achieving similar goals in computer vision.

Inductive Bias Multi-Label Classification +1

Splat-SLAM: Globally Optimized RGB-only SLAM with 3D Gaussians

1 code implementation 26 May 2024 Erik Sandström, Keisuke Tateno, Michael Oechsle, Michael Niemeyer, Luc van Gool, Martin R. Oswald, Federico Tombari

In response, we propose the first RGB-only SLAM system with a dense 3D Gaussian map representation that utilizes all benefits of globally optimized tracking by adapting dynamically to keyframe pose and depth updates by actively deforming the 3D Gaussian map.

3D Reconstruction Simultaneous Localization and Mapping

Radar Fields: Frequency-Space Neural Scene Representations for FMCW Radar

no code implementations 7 May 2024 David Borts, Erich Liang, Tim Brödermann, Andrea Ramazzina, Stefanie Walz, Edoardo Palladin, Jipeng Sun, David Bruggemann, Christos Sakaridis, Luc van Gool, Mario Bijelic, Felix Heide

Neural fields have been broadly investigated as scene representations for the reproduction and novel generation of diverse outdoor scenes, including those autonomous vehicles and robots must handle.

Autonomous Vehicles

Investigating the Effectiveness of Cross-Attention to Unlock Zero-Shot Editing of Text-to-Video Diffusion Models

no code implementations 8 Apr 2024 Saman Motamed, Wouter Van Gansbeke, Luc van Gool

With recent advances in image and video diffusion models for content creation, a plethora of techniques have been proposed for customizing their generated content.

Video Editing

Self-Explainable Affordance Learning with Embodied Caption

no code implementations 8 Apr 2024 Zhipeng Zhang, Zhimin Wei, Guolei Sun, Peng Wang, Luc van Gool

In the field of visual affordance learning, previous methods mainly used abundant images or videos that delineate human behavior patterns to identify action possibility regions for object manipulation, with a variety of applications in robotic tasks.

Empowering Image Recovery: A Multi-Attention Approach

no code implementations 6 Apr 2024 Juan Wen, Yawei Li, Chao Zhang, Weiyan Hou, Radu Timofte, Luc van Gool

Integration of attention mechanisms across feature and positional dimensions further enhances the recovery of fine details.

Image Restoration

Language-Guided Instance-Aware Domain-Adaptive Panoptic Segmentation

no code implementations 4 Apr 2024 Elham Amin Mansour, Ozan Unal, Suman Saha, Benjamin Bejar, Luc van Gool

A key challenge in panoptic UDA is reducing the domain gap between a labeled source and an unlabeled target domain while harmonizing the subtasks of semantic and instance segmentation to limit catastrophic interference.

Autonomous Driving Instance Segmentation +3

HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud

1 code implementation CVPR 2024 Wencan Cheng, Hao Tang, Luc van Gool, Jong Hwan Ko

Extracting keypoint locations from input hand frames, known as 3D hand pose estimation, is a critical task in various human-computer interaction applications.

3D Hand Pose Estimation

I-Design: Personalized LLM Interior Designer

no code implementations 3 Apr 2024 Ata Çelen, Guo Han, Konrad Schindler, Luc van Gool, Iro Armeni, Anton Obukhov, Xi Wang

Interior design allows us to be who we are and live how we want - each design is as unique as our distinct personality.

Language Modelling Large Language Model +2

A Unified and Interpretable Emotion Representation and Expression Generation

no code implementations CVPR 2024 Reni Paskaleva, Mykyta Holubakha, Andela Ilic, Saman Motamed, Luc van Gool, Danda Paudel

However, emotions are often compound, e.g. happily surprised, and can be mapped to the action units (AUs) used for expressing emotions, and trivially to the canonical ones.

GlORIE-SLAM: Globally Optimized RGB-only Implicit Encoding Point Cloud SLAM

1 code implementation 28 Mar 2024 Ganlin Zhang, Erik Sandström, Youmin Zhang, Manthan Patel, Luc van Gool, Martin R. Oswald

To alleviate this issue, with the aid of a monocular depth estimator, we introduce a novel DSPO layer for bundle adjustment which optimizes the pose and depth of keyframes along with the scale of the monocular depth.

Simultaneous Localization and Mapping

Towards Online Real-Time Memory-based Video Inpainting Transformers

no code implementations 24 Mar 2024 Guillaume Thiry, Hao Tang, Radu Timofte, Luc van Gool

Video inpainting tasks have seen significant improvements in recent years with the rise of deep neural networks and, in particular, vision transformers.

Video Inpainting

FocusCLIP: Multimodal Subject-Level Guidance for Zero-Shot Transfer in Human-Centric Tasks

no code implementations 11 Mar 2024 Muhammad Saif Ullah Khan, Muhammad Ferjad Naeem, Federico Tombari, Luc van Gool, Didier Stricker, Muhammad Zeshan Afzal

We propose FocusCLIP, integrating subject-level guidance--a specialized mechanism for target-specific supervision--into the CLIP framework for improved zero-shot transfer on human-centric tasks.

Activity Recognition Age Classification +1

Rethinking Few-shot 3D Point Cloud Semantic Segmentation

1 code implementation CVPR 2024 Zhaochong An, Guolei Sun, Yun Liu, Fayao Liu, Zongwei Wu, Dan Wang, Luc van Gool, Serge Belongie

The former arises from non-uniform point sampling, allowing models to distinguish the density disparities between foreground and background for easier segmentation.

Few-shot 3D Point Cloud Semantic Segmentation Segmentation +1

Loopy-SLAM: Dense Neural SLAM with Loop Closures

no code implementations CVPR 2024 Lorenzo Liso, Erik Sandström, Vladimir Yugay, Luc van Gool, Martin R. Oswald

Neural RGBD SLAM techniques have shown promise in dense Simultaneous Localization And Mapping (SLAM), yet face challenges such as error accumulation during camera tracking resulting in distorted maps.

Simultaneous Localization and Mapping

Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector

1 code implementation 5 Feb 2024 Yuqian Fu, Yu Wang, Yixuan Pan, Lian Huai, Xingyu Qiu, Zeyu Shangguan, Tong Liu, Yanwei Fu, Luc van Gool, Xingqun Jiang

This paper studies the challenging cross-domain few-shot object detection (CD-FSOD), aiming to develop an accurate object detector for novel domains with minimal labeled examples.

Cross-Domain Few-Shot Cross-Domain Few-Shot Object Detection +3

Key-Graph Transformer for Image Restoration

no code implementations 4 Feb 2024 Bin Ren, Yawei Li, Jingyun Liang, Rakesh Ranjan, Mengyuan Liu, Rita Cucchiara, Luc van Gool, Nicu Sebe

While it is crucial to capture global information for effective image restoration (IR), integrating such cues into transformer-based methods becomes computationally expensive, especially with high input resolution.

Graph Attention Image Restoration

Image Fusion via Vision-Language Model

1 code implementation 3 Feb 2024 Zixiang Zhao, Lilun Deng, Haowen Bai, Yukun Cui, Zhipeng Zhang, Yulun Zhang, Haotong Qin, Dongdong Chen, Jiangshe Zhang, Peng Wang, Luc van Gool

Therefore, we introduce a novel fusion paradigm named image Fusion via vIsion-Language Model (FILM), for the first time, utilizing explicit textual information from source images to guide the fusion process.

Decoder Language Modelling

Vanishing-Point-Guided Video Semantic Segmentation of Driving Scenes

1 code implementation CVPR 2024 Diandian Guo, Deng-Ping Fan, Tongyu Lu, Christos Sakaridis, Luc van Gool

The estimation of implicit cross-frame correspondences and the high computational cost have long been major challenges in video semantic segmentation (VSS) for driving scenes.

Motion Estimation Segmentation +2

MUSES: The Multi-Sensor Semantic Perception Dataset for Driving under Uncertainty

1 code implementation 23 Jan 2024 Tim Brödermann, David Bruggemann, Christos Sakaridis, Kevin Ta, Odysseas Liagouris, Jason Corkill, Luc van Gool

Achieving level-5 driving automation in autonomous vehicles necessitates a robust semantic visual perception system capable of parsing data from different sensors across diverse conditions.

Autonomous Vehicles Panoptic Segmentation

Learning to Prompt with Text Only Supervision for Vision-Language Models

1 code implementation 4 Jan 2024 Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Muzammal Naseer, Luc van Gool, Federico Tombari

While effective, most of these works require labeled data which is not practical, and often struggle to generalize towards new datasets due to over-fitting on the source data.

Prompt Engineering

Real-World Mobile Image Denoising Dataset with Efficient Baselines

1 code implementation CVPR 2024 Roman Flepp, Andrey Ignatov, Radu Timofte, Luc van Gool

Despite the latest advancements in camera hardware, the mobile camera sensor area cannot be increased significantly due to physical constraints, leading to a pixel size of 0.6–2.0 μm, which results in strong image noise even in moderate lighting conditions.

Image Denoising

Residual Learning for Image Point Descriptors

no code implementations 24 Dec 2023 Rashik Shrestha, Ajad Chhatkuli, Menelaos Kanakis, Luc van Gool

Such an approach of optimization allows us to discard learning knowledge already present in non-differentiable functions such as the hand-crafted descriptors and only learn the residual knowledge in the main network branch.

Camera Localization Ensemble Learning

Ternary-type Opacity and Hybrid Odometry for RGB-only NeRF-SLAM

no code implementations 20 Dec 2023 Junru Lin, Asen Nachkov, Songyou Peng, Luc van Gool, Danda Pani Paudel

To foster this line of research, we also propose a simple yet novel visual odometry scheme that uses a hybrid combination of volumetric and warping-based image renderings.

Visual Odometry

Zero-Shot Point Cloud Registration

no code implementations 5 Dec 2023 Weijie Wang, Guofeng Mei, Bin Ren, Xiaoshui Huang, Fabio Poiesi, Luc van Gool, Nicu Sebe, Bruno Lepri

The cornerstone of ZeroReg is the novel transfer of image features from keypoints to the point cloud, enriched by aggregating information from 3D geometric neighborhoods.

Decoder Point Cloud Registration

DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control

no code implementations 5 Dec 2023 Yuru Jia, Lukas Hoyer, Shengyu Huang, Tianfu Wang, Luc van Gool, Konrad Schindler, Anton Obukhov

Large, pretrained latent diffusion models (LDMs) have demonstrated an extraordinary ability to generate creative content, specialize to user data through few-shot fine-tuning, and condition their output on other modalities, such as semantic maps.

Autonomous Driving Domain Generalization +1

PALM: Predicting Actions through Language Models

no code implementations 29 Nov 2023 Sanghwan Kim, Daoji Huang, Yongqin Xian, Otmar Hilliges, Luc van Gool, Xi Wang

Traditional methods heavily rely on representation learning that is trained on a large amount of video data.

Action Anticipation Action Recognition +4

SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance

1 code implementation 27 Nov 2023 Lukas Hoyer, David Joseph Tan, Muhammad Ferjad Naeem, Luc van Gool, Federico Tombari

In SemiVL, we propose to integrate rich priors from VLM pre-training into semi-supervised semantic segmentation to learn better semantic decision boundaries.

Decoder Segmentation +1

2D Feature Distillation for Weakly- and Semi-Supervised 3D Semantic Segmentation

no code implementations 27 Nov 2023 Ozan Unal, Dengxin Dai, Lukas Hoyer, Yigit Baran Can, Luc van Gool

As 3D perception problems grow in popularity and the need for large-scale labeled datasets for LiDAR semantic segmentation increases, new methods arise that aim to reduce the necessity for dense annotations by employing weakly-supervised training.

2D Semantic Segmentation 3D Semantic Segmentation +3

Lego: Learning to Disentangle and Invert Concepts Beyond Object Appearance in Text-to-Image Diffusion Models

no code implementations 23 Nov 2023 Saman Motamed, Danda Pani Paudel, Luc van Gool

To enable customized content creation based on a few example images of a concept, methods such as Textual Inversion and DreamBooth invert the desired concept and enable synthesizing it in new scenes.

Language Modelling Large Language Model +3

3D Compression Using Neural Fields

no code implementations 21 Nov 2023 Janis Postels, Yannick Strümpler, Klara Reichard, Luc van Gool, Federico Tombari

Neural Fields (NFs) have gained momentum as a tool for compressing various data modalities - e.g. images and videos.

Attribute

Deep Equilibrium Diffusion Restoration with Parallel Sampling

1 code implementation CVPR 2024 JieZhang Cao, Yue Shi, Kai Zhang, Yulun Zhang, Radu Timofte, Luc van Gool

Due to the inherent property of diffusion models, most existing methods need long serial sampling chains to restore HQ images step-by-step, resulting in expensive sampling time and high computation costs.

Image Restoration

Model-aware 3D Eye Gaze from Weak and Few-shot Supervisions

1 code implementation 20 Nov 2023 Nikola Popovic, Dimitrios Christodoulou, Danda Pani Paudel, Xi Wang, Luc van Gool

In this work, we propose to predict 3D eye gaze from weak supervision of eye semantic segmentation masks and direct supervision of a few 3D gaze vectors.

Semantic Segmentation

MoVideo: Motion-Aware Video Generation with Diffusion Models

no code implementations 19 Nov 2023 Jingyun Liang, Yuchen Fan, Kai Zhang, Radu Timofte, Luc van Gool, Rakesh Ranjan

While recent years have witnessed great progress on using diffusion models for video generation, most of them are simple extensions of image generation frameworks, which fail to explicitly consider one of the key differences between videos and images, i.e., motion.

Image Generation Image to Video Generation +1

Contrastive Learning for Multi-Object Tracking with Transformers

no code implementations 14 Nov 2023 Pierre-François De Plaen, Nicola Marinello, Marc Proesmans, Tinne Tuytelaars, Luc van Gool

The DEtection TRansformer (DETR) opened new possibilities for object detection by modeling it as a translation task: converting image features into object-level representations.

Contrastive Learning Multi-Object Tracking +4

Learning Robust Multi-Scale Representation for Neural Radiance Fields from Unposed Images

no code implementations 8 Nov 2023 Nishant Jain, Suryansh Kumar, Luc van Gool

The key ideas presented in this paper are (i) Recovering accurate camera parameters via a robust pipeline from unposed day-to-day images is equally crucial in neural novel view synthesis problem; (ii) It is rather more practical to model object's content at different resolutions since dramatic camera motion is highly likely in day-to-day unposed images.

Camera Pose Estimation Depth Estimation +4

Towards High-quality HDR Deghosting with Conditional Diffusion Models

no code implementations 2 Nov 2023 Qingsen Yan, Tao Hu, Yuan Sun, Hao Tang, Yu Zhu, Wei Dong, Luc van Gool, Yanning Zhang

To address this challenge, we formulate the HDR deghosting problem as an image generation that leverages LDR features as the diffusion model's condition, consisting of the feature condition generator and the noise predictor.

Denoising Image Generation

SILC: Improving Vision Language Pretraining with Self-Distillation

no code implementations 20 Oct 2023 Muhammad Ferjad Naeem, Yongqin Xian, Xiaohua Zhai, Lukas Hoyer, Luc van Gool, Federico Tombari

However, the contrastive objective used by these models only focuses on image-text alignment and does not incentivise image feature learning for dense prediction tasks.

Classification Contrastive Learning +7

Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding

2 code implementations NeurIPS 2023 Zhejun Zhang, Alexander Liniger, Christos Sakaridis, Fisher Yu, Luc van Gool

The real-world deployment of an autonomous driving system requires its components to run on-board and in real-time, including the motion prediction module that predicts the future trajectories of surrounding traffic participants.

Autonomous Driving motion prediction

Discwise Active Learning for LiDAR Semantic Segmentation

no code implementations 23 Sep 2023 Ozan Unal, Dengxin Dai, Ali Tamer Unal, Luc van Gool

Finally we propose a semi-supervised learning approach to utilize all frames within our dataset and improve performance.

Active Learning LIDAR Semantic Segmentation +1

Deformable Neural Radiance Fields using RGB and Event Cameras

no code implementations ICCV 2023 Qi Ma, Danda Pani Paudel, Ajad Chhatkuli, Luc van Gool

In this work, we develop a novel method to model the deformable neural radiance fields using RGB and event cameras.

Breathing New Life into 3D Assets with Generative Repainting

2 code implementations 15 Sep 2023 Tianfu Wang, Menelaos Kanakis, Konrad Schindler, Luc van Gool, Anton Obukhov

Diffusion-based text-to-image models ignited immense attention from the vision community, artists, and content creators.

Temporal-aware Hierarchical Mask Classification for Video Semantic Segmentation

1 code implementation 14 Sep 2023 Zhaochong An, Guolei Sun, Zongwei Wu, Hao Tang, Luc van Gool

Modern approaches have proved the huge potential of addressing semantic segmentation as a mask classification task which is widely used in instance-level segmentation.

Classification Decoder +3

Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding

no code implementations 8 Sep 2023 Ozan Unal, Christos Sakaridis, Suman Saha, Luc van Gool

3D visual grounding is the task of localizing the object in a 3D scene which is referred by a description in natural language.

3D Instance Segmentation 3D visual grounding +3

Neural Gradient Regularizer

1 code implementation 31 Aug 2023 Shuang Xu, Yifan Wang, Zixiang Zhao, Jiangjun Peng, Xiangyong Cao, Deyu Meng, Yulun Zhang, Radu Timofte, Luc van Gool

NGR is applicable to various image types and different image processing tasks, functioning in a zero-shot learning fashion, making it a versatile and plug-and-play regularizer.

Zero-Shot Learning

Introducing Language Guidance in Prompt-based Continual Learning

1 code implementation ICCV 2023 Muhammad Gul Zain Ali Khan, Muhammad Ferjad Naeem, Luc van Gool, Didier Stricker, Federico Tombari, Muhammad Zeshan Afzal

While the model faces a disjoint set of classes in each task in this setting, we argue that these classes can be encoded to the same embedding space of a pre-trained language encoder.

Continual Learning

DiffI2I: Efficient Diffusion Model for Image-to-Image Translation

no code implementations 26 Aug 2023 Bin Xia, Yulun Zhang, Shiyin Wang, Yitong Wang, Xinglong Wu, Yapeng Tian, Wenming Yang, Radu Timofte, Luc van Gool

Compared to traditional DMs, the compact IPR enables DiffI2I to obtain more accurate outcomes and employ a lighter denoising network and fewer iterations.

Denoising Image-to-Image Translation +2

DREAMWALKER: Mental Planning for Continuous Vision-Language Navigation

no code implementations ICCV 2023 Hanqing Wang, Wei Liang, Luc van Gool, Wenguan Wang

VLN-CE is a recently released embodied task, where AI agents need to navigate a freely traversable environment to reach a distant target location, given language instructions.

Decision Making Navigate +1

When Super-Resolution Meets Camouflaged Object Detection: A Comparison Study

no code implementations 8 Aug 2023 Juan Wen, Shupeng Cheng, Peng Xu, BoWen Zhou, Radu Timofte, Weiyan Hou, Luc van Gool

Super Resolution (SR) and Camouflaged Object Detection (COD) are two hot topics in computer vision with various joint applications.

Object object-detection +2

How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges

1 code implementation 27 Jul 2023 Haotong Qin, Ge-Peng Ji, Salman Khan, Deng-Ping Fan, Fahad Shahbaz Khan, Luc van Gool

Google's Bard has emerged as a formidable competitor to OpenAI's ChatGPT in the field of conversational AI.

Prior Based Online Lane Graph Extraction from Single Onboard Camera Image

no code implementations 25 Jul 2023 Yigit Baran Can, Alexander Liniger, Danda Pani Paudel, Luc van Gool

Thus, online estimation of the lane graph is crucial for widespread and reliable autonomous navigation.

Autonomous Navigation

Edge Guided GANs with Multi-Scale Contrastive Learning for Semantic Image Synthesis

1 code implementation 22 Jul 2023 Hao Tang, Guolei Sun, Nicu Sebe, Luc van Gool

To tackle 2), we design an effective module to selectively highlight class-dependent feature maps according to the original semantic layout to preserve the semantic information.

Contrastive Learning Image Generation

Improving Online Lane Graph Extraction by Object-Lane Clustering

no code implementations ICCV 2023 Yigit Baran Can, Alexander Liniger, Danda Pani Paudel, Luc van Gool

In this work, we propose an architecture and loss formulation to improve the accuracy of local lane graph estimates by using 3D object detection outputs.

3D Object Detection Autonomous Driving +4

AutoDecoding Latent 3D Diffusion Models

1 code implementation NeurIPS 2023 Evangelos Ntavelis, Aliaksandr Siarohin, Kyle Olszewski, Chaoyang Wang, Luc van Gool, Sergey Tulyakov

We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.

Prompting Diffusion Representations for Cross-Domain Semantic Segmentation

no code implementations 5 Jul 2023 Rui Gong, Martin Danelljan, Han Sun, Julio Delgado Mangas, Luc van Gool

Intrigued by this result, we set out to explore how well diffusion-pretrained representations generalize to new domains, a crucial ability for any representation.

Domain Generalization Image Generation +2

Palm: Predicting Actions through Language Models @ Ego4D Long-Term Action Anticipation Challenge 2023

1 code implementation 28 Jun 2023 Daoji Huang, Otmar Hilliges, Luc van Gool, Xi Wang

We present Palm, a solution to the Long-Term Action Anticipation (LTA) task utilizing vision-language and large language models.

Action Anticipation Image Captioning +3

UncLe-SLAM: Uncertainty Learning for Dense Neural SLAM

1 code implementation 19 Jun 2023 Erik Sandström, Kevin Ta, Luc van Gool, Martin R. Oswald

We present an uncertainty learning framework for dense neural simultaneous localization and mapping (SLAM).

Simultaneous Localization and Mapping

SF-FSDA: Source-Free Few-Shot Domain Adaptive Object Detection with Efficient Labeled Data Factory

no code implementations 7 Jun 2023 Han Sun, Rui Gong, Konrad Schindler, Luc van Gool

Domain adaptive object detection aims to leverage the knowledge learned from a labeled source domain to improve the performance on an unlabeled target domain.

Object object-detection +2

Condition-Invariant Semantic Segmentation

1 code implementation 27 May 2023 Christos Sakaridis, David Bruggemann, Fisher Yu, Luc van Gool

Motivated by these findings, we propose to leverage stylization in performing feature-level adaptation by aligning the internal network features extracted by the encoder of the network from the original and the stylized view of each input image with a novel feature invariance loss.

Segmentation Semantic Segmentation +1

Equivariant Multi-Modality Image Fusion

4 code implementations CVPR 2024 Zixiang Zhao, Haowen Bai, Jiangshe Zhang, Yulun Zhang, Kai Zhang, Shuang Xu, Dongdong Chen, Radu Timofte, Luc van Gool

These components enable the net training to follow the principles of the natural sensing-imaging process while satisfying the equivariant imaging prior.

Self-Supervised Learning

Denoising Diffusion Models for Plug-and-Play Image Restoration

2 code implementations 15 May 2023 Yuanzhi Zhu, Kai Zhang, Jingyun Liang, JieZhang Cao, Bihan Wen, Radu Timofte, Luc van Gool

Although diffusion models have shown impressive performance for high-quality image synthesis, their potential to serve as a generative denoiser prior to the plug-and-play IR methods remains to be further explored.

Deblurring Denoising +4

StyleGenes: Discrete and Efficient Latent Distributions for GANs

no code implementations 30 Apr 2023 Evangelos Ntavelis, Mohamad Shahbazi, Iason Kastanis, Radu Timofte, Martin Danelljan, Luc van Gool

Thus, by independently sampling a variant for each gene and combining them into the final latent vector, our approach can represent a vast number of unique latent samples from a compact set of learnable parameters.

Disentanglement Diversity

EDAPS: Enhanced Domain-Adaptive Panoptic Segmentation

1 code implementation ICCV 2023 Suman Saha, Lukas Hoyer, Anton Obukhov, Dengxin Dai, Luc van Gool

EDAPS significantly improves the state-of-the-art performance for panoptic segmentation UDA by a large margin of 20% on SYNTHIA-to-Cityscapes and even 72% on the more challenging SYNTHIA-to-Mapillary Vistas.

Domain Adaptation Instance Segmentation +2

Neural Implicit Dense Semantic SLAM

no code implementations 27 Apr 2023 Yasaman Haghighi, Suryansh Kumar, Jean-Philippe Thiran, Luc van Gool

Visual Simultaneous Localization and Mapping (vSLAM) is a widely used technique in robotics and computer vision that enables a robot to create a map of an unfamiliar environment using a camera sensor while simultaneously tracking its position over time.

Scene Understanding Semantic Segmentation +1

Domain Adaptive and Generalizable Network Architectures and Training Strategies for Semantic Image Segmentation

3 code implementations 26 Apr 2023 Lukas Hoyer, Dengxin Dai, Luc van Gool

As previous UDA&DG semantic segmentation methods are mostly based on outdated networks, we benchmark more recent architectures, reveal the potential of Transformers, and design the DAFormer network tailored for UDA&DG.

Domain Generalization Image Segmentation +2

Indiscernible Object Counting in Underwater Scenes

1 code implementation CVPR 2023 Guolei Sun, Zhaochong An, Yun Liu, Ce Liu, Christos Sakaridis, Deng-Ping Fan, Luc van Gool

We further advance the frontier of this field by systematically studying a new challenge named indiscernible object counting (IOC), the goal of which is to count objects that are blended with respect to their surroundings.

Benchmarking Object +2

Advances in Deep Concealed Scene Understanding

1 code implementation 21 Apr 2023 Deng-Ping Fan, Ge-Peng Ji, Peng Xu, Ming-Ming Cheng, Christos Sakaridis, Luc van Gool

Concealed scene understanding (CSU) is a hot computer vision topic aiming to perceive objects exhibiting camouflage.

Scene Understanding Semantic Segmentation

Quantum Annealing for Single Image Super-Resolution

no code implementations 18 Apr 2023 Han Yao Choong, Suryansh Kumar, Luc van Gool

As a result, in this work, we take the privilege to perform an early exploration of applying a quantum computing algorithm to this important image enhancement problem, i.e., SISR.

Combinatorial Optimization Image Enhancement +1

SAM Struggles in Concealed Scenes -- Empirical Study on "Segment Anything"

no code implementations 12 Apr 2023 Ge-Peng Ji, Deng-Ping Fan, Peng Xu, Ming-Ming Cheng, BoWen Zhou, Luc van Gool

Segmenting anything is a ground-breaking step toward artificial general intelligence, and the Segment Anything Model (SAM) greatly fosters the foundation models for computer vision.

CamDiff: Camouflage Image Augmentation via Diffusion Model

1 code implementation 11 Apr 2023 Xue-Jing Luo, Shuo Wang, Zongwei Wu, Christos Sakaridis, Yun Cheng, Deng-Ping Fan, Luc van Gool

Specifically, we leverage the latent diffusion model to synthesize salient objects in camouflaged scenes, while using the zero-shot image classification ability of the Contrastive Language-Image Pre-training (CLIP) model to prevent synthesis failures and ensure the synthesized object aligns with the input prompt.

Image Augmentation Image Classification +3

Point-SLAM: Dense Neural Point Cloud-based SLAM

2 code implementations ICCV 2023 Erik Sandström, Yue Li, Luc van Gool, Martin R. Oswald

We propose a dense neural simultaneous localization and mapping (SLAM) approach for monocular RGBD input which anchors the features of a neural scene representation in a point cloud that is iteratively generated in an input-dependent data-driven manner.

Simultaneous Localization and Mapping

Online Lane Graph Extraction from Onboard Video

no code implementations 3 Apr 2023 Yigit Baran Can, Alexander Liniger, Danda Pani Paudel, Luc van Gool

One of the most common and useful representations of such an understanding is in the form of BEV lane graphs.

Autonomous Driving Navigate

Single Image Depth Prediction Made Better: A Multivariate Gaussian Take

no code implementations CVPR 2023 Ce Liu, Suryansh Kumar, Shuhang Gu, Radu Timofte, Luc van Gool

Accordingly, we introduce an approach that performs continuous modeling of per-pixel depth, where we can predict and reason about the per-pixel depth and its distribution.

Depth Estimation Depth Prediction

Enhanced Stable View Synthesis

no code implementations CVPR 2023 Nishant Jain, Suryansh Kumar, Luc van Gool

Extensive evaluation of our approach on popular benchmark datasets, such as Tanks and Temples, shows substantial improvement in view synthesis results compared to the prior art.

3D Reconstruction Novel View Synthesis

NeRF-GAN Distillation for Efficient 3D-Aware Generation with Convolutions

1 code implementation22 Mar 2023 Mohamad Shahbazi, Evangelos Ntavelis, Alessio Tonioni, Edo Collins, Danda Pani Paudel, Martin Danelljan, Luc van Gool

Pose-conditioned convolutional generative models struggle with high-quality 3D-consistent image generation from single-view datasets, due to their lack of sufficient 3D priors.

Image Generation Inductive Bias

DiffIR: Efficient Diffusion Model for Image Restoration

1 code implementation ICCV 2023 Bin Xia, Yulun Zhang, Shiyin Wang, Yitong Wang, Xinglong Wu, Yapeng Tian, Wenming Yang, Luc van Gool

The diffusion model (DM) has achieved SOTA performance by modeling the image synthesis process as a sequential application of a denoising network.

Denoising Image Generation +1

Graph Transformer GANs for Graph-Constrained House Generation

no code implementations CVPR 2023 Hao Tang, Zhenyu Zhang, Humphrey Shi, Bo Li, Ling Shao, Nicu Sebe, Radu Timofte, Luc van Gool

We present a novel graph Transformer generative adversarial network (GTGAN) to learn effective graph node relations in an end-to-end fashion for the challenging graph-constrained house generation task.

Generative Adversarial Network House Generation +1

DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion

4 code implementations ICCV 2023 Zixiang Zhao, Haowen Bai, Yuanzhi Zhu, Jiangshe Zhang, Shuang Xu, Yulun Zhang, Kai Zhang, Deyu Meng, Radu Timofte, Luc van Gool

To leverage strong generative priors and address challenges such as unstable training and lack of interpretability for GAN-based generative methods, we propose a novel fusion algorithm based on the denoising diffusion probabilistic model (DDPM).

Denoising

TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction

3 code implementations7 Mar 2023 Zhejun Zhang, Alexander Liniger, Dengxin Dai, Fisher Yu, Luc van Gool

We present TrafficBots, a multi-agent policy built upon motion prediction and end-to-end driving, and based on TrafficBots we obtain a world model tailored for the planning module of autonomous vehicles.

Autonomous Driving Model-based Reinforcement Learning +1

A Multiplicative Value Function for Safe and Efficient Reinforcement Learning

1 code implementation7 Mar 2023 Nick Bührer, Zhejun Zhang, Alexander Liniger, Fisher Yu, Luc van Gool

To this end, we propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.

Navigate reinforcement-learning +3

Efficient and Explicit Modelling of Image Hierarchies for Image Restoration

1 code implementation CVPR 2023 Yawei Li, Yuchen Fan, Xiaoyu Xiang, Denis Demandolx, Rakesh Ranjan, Radu Timofte, Luc van Gool

The aim of this paper is to propose a mechanism to efficiently and explicitly model image hierarchies in the global, regional, and local range for image restoration.

Image Deblurring Image Defocus Deblurring +1

VA-DepthNet: A Variational Approach to Single Image Depth Prediction

2 code implementations13 Feb 2023 Ce Liu, Suryansh Kumar, Shuhang Gu, Radu Timofte, Luc van Gool

While state-of-the-art deep neural network methods for SIDP learn the scene depth from images in a supervised setting, they often overlook the invaluable invariances and priors in the rigid scene space, such as the regularity of the scene.

Depth Prediction Monocular Depth Estimation

No One Left Behind: Real-World Federated Class-Incremental Learning

2 code implementations2 Feb 2023 Jiahua Dong, Hongliu Li, Yang Cong, Gan Sun, Yulun Zhang, Luc van Gool

These issues cause the global model to undergo catastrophic forgetting on old categories when local clients receive new categories consecutively under limited memory for storing old categories.

Class Incremental Learning Federated Learning +1

Continuous Pseudo-Label Rectified Domain Adaptive Semantic Segmentation With Implicit Neural Representations

no code implementations CVPR 2023 Rui Gong, Qin Wang, Martin Danelljan, Dengxin Dai, Luc van Gool

Unsupervised domain adaptation (UDA) for semantic segmentation aims at improving the model performance on the unlabeled target domain by leveraging a labeled source domain.

Pseudo Label Semantic Segmentation +1

Self-Supervised Burst Super-Resolution

no code implementations ICCV 2023 Goutam Bhat, Michaël Gharbi, Jiawen Chen, Luc van Gool, Zhihao Xia

Extensive experiments on real and synthetic data show that, despite only using noisy bursts during training, models trained with our self-supervised strategy match, and sometimes surpass, the quality of fully-supervised baselines trained with synthetic data or weakly-paired ground-truth.

Super-Resolution

Beyond SOT: Tracking Multiple Generic Objects at Once

1 code implementation22 Dec 2022 Christoph Mayer, Martin Danelljan, Ming-Hsuan Yang, Vittorio Ferrari, Luc van Gool, Alina Kuznetsova

Our approach achieves a 4x faster run-time in case of 10 concurrent objects compared to tracking each object independently and outperforms existing single object trackers on our new benchmark.

Attribute Object +1

One-Shot Domain Adaptive and Generalizable Semantic Segmentation with Class-Aware Cross-Domain Transformers

no code implementations14 Dec 2022 Rui Gong, Qin Wang, Dengxin Dai, Luc van Gool

Thus, we aim to relieve this need for a large amount of real data, and explore the one-shot unsupervised sim-to-real domain adaptation (OSUDA) and generalization (OSDG) problem, where only one real-world data sample is available.

Autonomous Driving Domain Adaptation +1

MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation

1 code implementation CVPR 2023 Lukas Hoyer, Dengxin Dai, Haoran Wang, Luc van Gool

MIC significantly improves the state-of-the-art performance across the different recognition tasks for synthetic-to-real, day-to-nighttime, and clear-to-adverse-weather UDA.

Image Classification object-detection +4

Surface Normal Clustering for Implicit Representation of Manhattan Scenes

1 code implementation ICCV 2023 Nikola Popovic, Danda Pani Paudel, Luc van Gool

In this work, we aim to leverage the geometric prior of Manhattan scenes to improve the implicit neural radiance field representations.

Clustering Novel View Synthesis

CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion

4 code implementations CVPR 2023 Zixiang Zhao, Haowen Bai, Jiangshe Zhang, Yulun Zhang, Shuang Xu, Zudi Lin, Radu Timofte, Luc van Gool

We then introduce a dual-branch Transformer-CNN feature extractor with Lite Transformer (LT) blocks leveraging long-range attention to handle low-frequency global features and Invertible Neural Networks (INN) blocks focusing on extracting high-frequency local information.

object-detection Object Detection +1

Advancing Learned Video Compression with In-loop Frame Prediction

1 code implementation13 Nov 2022 Ren Yang, Radu Timofte, Luc van Gool

In this paper, we propose an Advanced Learned Video Compression (ALVC) approach with the in-loop frame prediction module, which is able to effectively predict the target frame from the previously compressed frames, without consuming any bit-rate.

MS-SSIM SSIM +1

MicroISP: Processing 32MP Photos on Mobile Devices with Deep Learning

no code implementations8 Nov 2022 Andrey Ignatov, Anastasia Sycheva, Radu Timofte, Yu Tseng, Yu-Syuan Xu, Po-Hsiang Yu, Cheng-Ming Chiang, Hsien-Kai Kuo, Min-Hung Chen, Chia-Ming Cheng, Luc van Gool

While neural networks-based photo processing solutions can provide a better image quality compared to the traditional ISP systems, their application to mobile devices is still very limited due to their very high computational complexity.

PyNet-V2 Mobile: Efficient On-Device Photo Processing With Neural Networks

1 code implementation8 Nov 2022 Andrey Ignatov, Grigory Malivenko, Radu Timofte, Yu Tseng, Yu-Syuan Xu, Po-Hsiang Yu, Cheng-Ming Chiang, Hsien-Kai Kuo, Min-Hung Chen, Chia-Ming Cheng, Luc van Gool

The increased importance of mobile photography created a need for fast and performant RAW image processing pipelines capable of producing good visual results in spite of the mobile camera sensor limitations.

Towards Versatile Embodied Navigation

1 code implementation30 Oct 2022 Hanqing Wang, Wei Liang, Luc van Gool, Wenguan Wang

With the emergence of varied visual navigation tasks (e.g., image-/object-/audio-goal and vision-language navigation) that specify the target in different ways, the community has made appealing advances in training specialized agents capable of handling individual navigation tasks well.

Decision Making Vision-Language Navigation +1

TripletTrack: 3D Object Tracking using Triplet Embeddings and LSTM

no code implementations28 Oct 2022 Nicola Marinello, Marc Proesmans, Luc van Gool

We start from an off-the-shelf 3D object detector, and apply a tracking mechanism where objects are matched by an affinity score computed on local object feature embeddings and motion descriptors.

3D Object Tracking Autonomous Driving +2

Masked Vision-Language Transformer in Fashion

1 code implementation27 Oct 2022 Ge-Peng Ji, Mingcheng Zhuge, Dehong Gao, Deng-Ping Fan, Christos Sakaridis, Luc van Gool

We present a masked vision-language transformer (MVLT) for fashion-specific multi-modal representation.

Image Reconstruction Retrieval

Learning Attention Propagation for Compositional Zero-Shot Learning

no code implementations20 Oct 2022 Muhammad Gul Zain Ali Khan, Muhammad Ferjad Naeem, Luc van Gool, Alain Pagani, Didier Stricker, Muhammad Zeshan Afzal

CAPE learns to identify this structure and propagates knowledge between them to learn class embedding for all seen and unseen compositions.

Compositional Zero-Shot Learning

Multi-View Photometric Stereo Revisited

no code implementations14 Oct 2022 Berk Kaya, Suryansh Kumar, Carlos Oliveira, Vittorio Ferrari, Luc van Gool

The proposed approach in this paper exploits the benefit of uncertainty modeling in a deep neural network for a reliable fusion of photometric stereo (PS) and multi-view stereo (MVS) network predictions.

3D Shape Representation

Composite Learning for Robust and Effective Dense Predictions

no code implementations13 Oct 2022 Menelaos Kanakis, Thomas E. Huang, David Bruggemann, Fisher Yu, Luc van Gool

In this paper, we find that jointly training a dense prediction (target) task with a self-supervised (auxiliary) task can consistently improve the performance of the target task, while eliminating the need for labeling auxiliary tasks.

Boundary Detection Monocular Depth Estimation +3

SiNeRF: Sinusoidal Neural Radiance Fields for Joint Pose Estimation and Scene Reconstruction

1 code implementation10 Oct 2022 Yitong Xia, Hao Tang, Radu Timofte, Luc van Gool

NeRFmm is the Neural Radiance Fields (NeRF) variant that deals with joint optimization tasks, i.e., reconstructing real-world scenes and registering camera parameters simultaneously.

Image Generation Pose Estimation

Robustifying the Multi-Scale Representation of Neural Radiance Fields

no code implementations9 Oct 2022 Nishant Jain, Suryansh Kumar, Luc van Gool

Although recently proposed Mip-NeRF could handle multi-scale imaging problems with NeRF, it cannot handle camera pose estimation error.

Camera Pose Estimation Graph Neural Network +1

Basic Binary Convolution Unit for Binarized Image Restoration Network

2 code implementations2 Oct 2022 Bin Xia, Yulun Zhang, Yitong Wang, Yapeng Tian, Wenming Yang, Radu Timofte, Luc van Gool

In this study, we reconsider components in binary convolution, such as residual connection, BatchNorm, activation function, and structure, for IR tasks.

Binarization Image Restoration +1

TT-NF: Tensor Train Neural Fields

1 code implementation30 Sep 2022 Anton Obukhov, Mikhail Usvyatsov, Christos Sakaridis, Konrad Schindler, Luc van Gool

Learning neural fields has been an active topic in deep learning research, focusing, among other issues, on finding more compact and easy-to-fit representations.

Denoising Low-rank compression

Physical Adversarial Attack meets Computer Vision: A Decade Survey

1 code implementation30 Sep 2022 Hui Wei, Hao Tang, Xuemei Jia, Zhixiang Wang, Hanxun Yu, Zhubo Li, Shin'ichi Satoh, Luc van Gool, Zheng Wang

Building upon this foundation, we uncover the pervasive role of artifacts carrying adversarial perturbations in the physical world.

Adversarial Attack Medical Diagnosis

Exploiting Instance-based Mixed Sampling via Auxiliary Source Domain Supervision for Domain-adaptive Action Detection

1 code implementation28 Sep 2022 Yifan Lu, Gurkirt Singh, Suman Saha, Luc van Gool

We propose a novel domain adaptive action detection approach and a new adaptation protocol that leverages the recent advancements in image-level unsupervised domain adaptation (UDA) techniques and handles vagaries of instance-level video data.

Action Detection Pseudo Label +2

I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification

no code implementations21 Sep 2022 Muhammad Ferjad Naeem, Yongqin Xian, Luc van Gool, Federico Tombari

In order to distill discriminative visual words from noisy documents, we introduce a new cross-modal attention module that learns fine-grained interactions between image patches and document words.

Generalized Zero-Shot Learning Image Classification +2

Spatio-Temporal Action Detection Under Large Motion

no code implementations6 Sep 2022 Gurkirt Singh, Vasileios Choutas, Suman Saha, Fisher Yu, Luc van Gool

Current methods for spatiotemporal action tube detection often extend a bounding box proposal at a given keyframe into a 3D temporal cuboid and pool features from nearby frames.

Action Detection

ManiFlow: Implicitly Representing Manifolds with Normalizing Flows

no code implementations18 Aug 2022 Janis Postels, Martin Danelljan, Luc van Gool, Federico Tombari

In contrast to prior work, we approach this problem by generating samples from the original data distribution given full knowledge about the perturbed distribution and the noise model.

Surface Reconstruction

AVisT: A Benchmark for Visual Object Tracking in Adverse Visibility

1 code implementation14 Aug 2022 Mubashir Noman, Wafa Al Ghallabi, Daniya Najiha, Christoph Mayer, Akshay Dudhane, Martin Danelljan, Hisham Cholakkal, Salman Khan, Luc van Gool, Fahad Shahbaz Khan

While greatly benefiting tracking research, existing benchmarks do not pose the same difficulty as before, with recent trackers achieving higher performance mainly due to (i) the introduction of more sophisticated transformer-based methods and (ii) the lack of diverse scenarios with adverse visibility, such as severe weather conditions, camouflage, and imaging effects.

Visual Object Tracking Visual Tracking

Towards Interpretable Video Super-Resolution via Alternating Optimization

1 code implementation21 Jul 2022 JieZhang Cao, Jingyun Liang, Kai Zhang, Wenguan Wang, Qin Wang, Yulun Zhang, Hao Tang, Luc van Gool

These issues can be alleviated by a cascade of three separate sub-tasks, including video deblurring, frame interpolation, and super-resolution, which, however, would fail to capture the spatial and temporal correlations among video sequences.

Deblurring Space-time Video Super-resolution +2

Refign: Align and Refine for Adaptation of Semantic Segmentation to Adverse Conditions

1 code implementation14 Jul 2022 David Bruggemann, Christos Sakaridis, Prune Truong, Luc van Gool

Due to the scarcity of dense pixel-level semantic annotations for images recorded in adverse visual conditions, there has been a keen interest in unsupervised domain adaptation (UDA) for the semantic segmentation of such images.

Semantic Segmentation Unsupervised Domain Adaptation

Organic Priors in Non-Rigid Structure from Motion

no code implementations13 Jul 2022 Suryansh Kumar, Luc van Gool

Besides that, the paper provides insights into the NRSfM factorization -- both in terms of shape and motion -- and is the first approach to show the benefit of single rotation averaging for NRSfM.

L2E: Lasers to Events for 6-DoF Extrinsic Calibration of Lidars and Event Cameras

1 code implementation3 Jul 2022 Kevin Ta, David Bruggemann, Tim Brödermann, Christos Sakaridis, Luc van Gool

As neuromorphic technology is maturing, its application to robotics and autonomous vehicle systems has become an area of active research.

Autonomous Driving Camera Calibration