We introduce dense prediction transformers, an architecture that leverages vision transformers in place of convolutional networks as a backbone for dense prediction tasks.
In particular, we propose a robust training objective that is invariant to changes in depth range and scale, advocate the use of principled multi-objective learning to combine data from different sources, and highlight the importance of pretraining encoders on auxiliary tasks.
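A training objective that is invariant to depth range and scale can be obtained by aligning the prediction to the ground truth with a closed-form scale and shift before measuring the residual. The sketch below is illustrative, assuming a least-squares alignment and a mean-absolute residual; it is not the exact loss of any specific paper.

```python
import numpy as np

def scale_shift_invariant_loss(pred, target):
    """Least-squares-align pred to target with a scale s and shift t,
    then measure the remaining mean absolute residual. Illustrative
    sketch of a range/scale-invariant depth objective (assumed form)."""
    pred, target = pred.ravel(), target.ravel()
    # Solve min_{s,t} sum (s*pred + t - target)^2 via least squares.
    A = np.stack([pred, np.ones_like(pred)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, target, rcond=None)
    return np.mean(np.abs(s * pred + t - target))

gt = np.random.default_rng(0).random((64, 64))
# A prediction off by an arbitrary scale and shift incurs (near) zero loss:
loss_aligned = scale_shift_invariant_loss(3.0 * gt + 0.5, gt)
```

Because the alignment is recomputed per sample, ground-truth depth from sources with incompatible scales (e.g. stereo vs. laser) can be mixed in one training set, which is what makes combining data from different sources feasible.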
Since the output of event cameras is fundamentally different from that of conventional cameras, it is commonly accepted that they require the development of specialized algorithms to accommodate the particular nature of events.
In this work we propose to learn to reconstruct intensity images from event streams directly from data instead of relying on any hand-crafted priors.
The Fields of Experts (FoE) image prior model, a filter-based higher-order Markov random field (MRF) model, has been shown to be effective for many image restoration problems.
It is now well known that Markov random fields (MRFs) are particularly effective for modeling image priors in low-level vision.
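A filter-based higher-order MRF prior such as FoE assigns an image an energy of the form E(x) = Σ_k α_k Σ_p ρ((J_k * x)_p), where the J_k are linear filters and ρ is a heavy-tailed expert potential. The sketch below assumes a Student-t-style potential ρ(z) = log(1 + ½z²) and a hand-picked derivative filter bank; in the actual model the filters and weights are learned.

```python
import numpy as np

def foe_energy(image, filters, alpha):
    """FoE-style energy: sum over filters k and pixel positions p of
    alpha_k * rho((J_k * x)_p), with rho(z) = log(1 + 0.5 * z**2).
    Filter bank and weights here are illustrative, not learned."""
    H, W = image.shape
    energy = 0.0
    for J, a in zip(filters, alpha):
        fh, fw = J.shape
        # Valid-mode 2D correlation of the image with filter J.
        resp = np.zeros((H - fh + 1, W - fw + 1))
        for i in range(fh):
            for j in range(fw):
                resp += J[i, j] * image[i:i + H - fh + 1, j:j + W - fw + 1]
        energy += a * np.log1p(0.5 * resp**2).sum()
    return energy

# Toy filter bank: horizontal and vertical first derivatives.
filters = [np.array([[1.0, -1.0]]), np.array([[1.0], [-1.0]])]
alpha = [1.0, 1.0]
flat = np.full((8, 8), 0.5)
noisy = flat + 0.2 * np.random.default_rng(0).standard_normal((8, 8))
```

Lower energy means higher prior probability, so a smooth image scores better than the same image with noise added, which is exactly the behavior a restoration method exploits.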
Inpainting-based image compression approaches, in particular linear and non-linear diffusion models, are an active research topic in lossy image compression.
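In such a codec, the encoder stores only a sparse subset of pixels and the decoder fills in the rest by diffusion. The decoder side can be sketched with homogeneous (linear) diffusion, iterating the discrete Laplacian to steady state while clamping the stored pixels; the mask density, step size, and iteration count below are ad hoc choices for illustration.

```python
import numpy as np

def diffusion_inpaint(image, mask, iters=2000, tau=0.2):
    """Homogeneous diffusion inpainting: pixels where mask is True are
    stored exactly; the rest are filled by explicit-Euler diffusion
    steps (stable for tau <= 0.25) with replicated boundaries.
    A minimal decoder sketch for an inpainting-based codec."""
    u = np.where(mask, image, image[mask].mean())
    for _ in range(iters):
        p = np.pad(u, 1, mode="edge")
        lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * u
        u = u + tau * lap        # diffuse the unknown regions
        u[mask] = image[mask]    # re-impose the stored pixels
    return u

# "Compress" a smooth ramp by storing ~10% of its pixels, then decode:
x, y = np.meshgrid(np.linspace(0, 1, 32), np.linspace(0, 1, 32))
img = 0.5 * (x + y)
mask = np.random.default_rng(0).random(img.shape) < 0.1
rec = diffusion_inpaint(img, mask)
```

Smooth images reconstruct well from very few stored pixels, which is why diffusion-based codecs can be competitive at low bit rates.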
Numerical experiments show that our trained models clearly outperform existing analysis operator learning approaches and are on par with state-of-the-art image denoising algorithms.