To enable the device model to deal with changing environments, we propose a new learning paradigm of Cloud-Device Collaborative Continual Adaptation, which encourages collaboration between cloud and device and improves the generalization of the device model.
In order to find them, we further propose a LiDAR-guided sampling strategy that leverages the statistical distribution of LiDAR points to determine the heights of local slices.
In this paper, we propose a unified framework named BEV-LGKD to transfer knowledge in a teacher-student manner.
Vision-centric Bird's-Eye-View (BEV) perception has shown promising potential and has attracted increasing attention in autonomous driving.
Limited-Angle Computed Tomography (LACT) is a non-destructive evaluation technique used in a variety of applications ranging from security to medicine.
Deep model-based architectures (DMBAs) are widely used in imaging inverse problems to integrate physical measurement models and learned image priors.
Our numerical results on in-vivo MRI data show that SelfDEQ leads to state-of-the-art performance using only undersampled and noisy training data.
However, estimation of accurate CSMs is a challenging problem when measurements are highly undersampled.
Our core idea is to leverage a style-based generator to enable high-fidelity and robust face swapping, so that the generator's strengths can be exploited to optimize identity similarity.
Three-dimensional fluorescence microscopy often suffers from anisotropy, where the resolution along the axial direction is lower than that within the lateral imaging plane.
Neuromorphic spike data, an emerging modality with high temporal resolution, has shown promising potential in real-world applications due to its inherent advantage in overcoming high-velocity motion blur.
In this paper, we propose a novel Uncertainty-Guided Depth Fusion (UGDF) framework to fuse the predictions of monocular and stereo depth estimation networks for spike cameras.
While the empirical performance and theoretical properties of DMBAs have been widely investigated, the existing work in the area has primarily focused on their performance when the desired image prior is known exactly.
Our method significantly reduces the computational cost and achieves even better performance, paving the way for applying neural video delivery techniques to practical applications.
However, the dependence of the computational/memory complexity of the measurement models in PnP/RED on the total number of measurements makes DEQ impractical for many imaging applications.
Instead of explicitly disentangling global or component-wise modeling, the cross-attention mechanism can attend to the right local styles in the reference glyphs and aggregate the reference styles into a fine-grained style representation for the given content glyphs.
To solve this problem, we propose an end-to-end cross-domain detection Transformer based on the mean teacher framework, MTTrans, which can fully exploit unlabeled target domain data in object detection training and transfer knowledge between domains via pseudo labels.
Particularly, seamless blending is achieved with the help of a Semantic-Guided Color Reference Creation procedure and a Blending UNet.
We propose a new plug-and-play priors (PnP) based MR image reconstruction method that systematically enforces data consistency while also exploiting deep-learning priors.
Once the incremental capacity falls below the threshold, the patch can exit at that layer.
Regularization by denoising (RED) is a widely-used framework for solving inverse problems by leveraging image denoisers as image priors.
However, existing methods mostly train the DNNs on uniformly sampled LR-HR patch pairs, which makes them fail to fully exploit informative patches within the image.
LEARN-IMG performs motion correction on mGRE images and relies on the subsequent analysis for the estimation of $R_2^\ast$ maps, while LEARN-BIO directly performs motion- and $B_0$-inhomogeneity-corrected $R_2^\ast$ estimation.
Internet video delivery has grown explosively over the past few years.
Deep neural networks for medical image reconstruction are traditionally trained using high-quality ground-truth images as training targets.
The plug-and-play priors (PnP) and regularization by denoising (RED) methods have become widely used for solving inverse problems by leveraging pre-trained deep denoisers as image priors.
We propose Coordinate-based Internal Learning (CoIL) as a new deep-learning (DL) methodology for the continuous representation of measurements.
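The idea of a continuous, coordinate-based representation of measurements can be illustrated with a minimal sketch: encode each measurement coordinate with Fourier features and fit a model that can then be queried at arbitrary, unseen coordinates. This is not the CoIL implementation; the 1-D toy signal, the linear least-squares fit (standing in for a trained network), and all parameters below are illustrative assumptions.

```python
import numpy as np

def fourier_features(coords, num_freqs=6):
    # Encode scalar coordinates as [sin(2^k * pi * c), cos(2^k * pi * c)],
    # a common positional encoding for coordinate-based models.
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi
    angles = coords[:, None] * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

# Toy "measurements": samples of a smooth signal at scattered coordinates.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 1.0, 200)
values = np.sin(2 * np.pi * coords) + 0.5 * np.cos(4 * np.pi * coords)

# Fit a linear model on the encoded coordinates (least squares stands in
# for training a small MLP on coordinate-measurement pairs).
Phi = fourier_features(coords)
w, *_ = np.linalg.lstsq(Phi, values, rcond=None)

# Query the continuous representation at new, unseen coordinates.
query = np.linspace(0.0, 1.0, 50)
pred = fourier_features(query) @ w
true = np.sin(2 * np.pi * query) + 0.5 * np.cos(4 * np.pi * query)
```

Because the learned representation is a function of the coordinate itself, it can be evaluated anywhere in the measurement domain, which is the property such methods exploit.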
Deep unfolding networks have recently gained popularity in the context of solving imaging inverse problems.
Graph Reasoning has shown great potential recently in modeling long-range dependencies, which are crucial for various computer vision tasks.
Cal-RED extends the traditional RED methodology to imaging problems that require the calibration of the measurement operator.
Deep learning-based image denoising approaches have been extensively studied in recent years, achieving leading performance on many public benchmark datasets.
Regularization by denoising (RED) is a recently developed framework for solving inverse problems by integrating advanced denoisers as image priors.
One of the key limitations of conventional deep learning-based image reconstruction is the need for registered pairs of training images containing a set of high-quality ground-truth images.
Plug-and-play priors (PnP) is a methodology for regularized image reconstruction that specifies the prior through an image denoiser.
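The core PnP recipe — alternate a data-fidelity gradient step with an application of a denoiser in place of a proximal operator — can be sketched in a few lines. The sketch below is generic PnP proximal-gradient, not any specific paper's method; the moving-average "denoiser" (standing in for a pre-trained deep denoiser), the toy 1-D blur operator, and the step sizes are illustrative assumptions.

```python
import numpy as np

def denoise(x, strength=0.05):
    # Placeholder denoiser: mild moving-average smoothing. In practice this
    # would be a pre-trained deep denoiser applied as the image prior.
    kernel = np.ones(3) / 3.0
    smoothed = np.convolve(np.pad(x, 1, mode="edge"), kernel, mode="valid")
    return (1.0 - strength) * x + strength * smoothed

def pnp_ista(y, A, gamma=0.5, iters=100):
    # PnP proximal-gradient: gradient step on 0.5 * ||A x - y||^2,
    # then the denoiser replaces the proximal map of an explicit prior.
    x = A.T @ y
    for _ in range(iters):
        grad = A.T @ (A @ x - y)
        x = denoise(x - gamma * grad)
    return x

# Toy 1-D deblurring problem: A applies a mild blur to a box signal.
rng = np.random.default_rng(1)
n = 32
A = np.eye(n)
for i in range(n - 1):
    A[i, i + 1] = 0.5
x_true = np.zeros(n)
x_true[8:12] = 1.0
y = A @ x_true + 0.01 * rng.standard_normal(n)
x_hat = pnp_ista(y, A)
```

The key design point is that the prior never appears as an explicit function: it enters only through the denoiser call, which is what lets pre-trained denoisers be "plugged in."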
To further promote research on ship detection, we introduce a new fine-grained ship detection dataset named FGSD.
We introduce a new algorithm for regularized reconstruction of multispectral (MS) images from noisy linear measurements.
Extracting entities from images is a crucial part of many OCR applications, such as entity recognition for cards, invoices, and receipts.
Most existing text reading benchmarks make it difficult to evaluate the performance of more advanced deep learning models in large vocabularies due to the limited amount of training data.
Most previous image matting methods require a roughly-specified trimap as input and estimate fractional alpha values for all pixels in the unknown region of the trimap.
Specifically, we propose an end-to-end trainable style retention network (SRNet) that consists of three modules: a text conversion module, a background inpainting module, and a fusion module.
In this paper, we present a new data-driven method for robust skin detection from a single human portrait image.
In this work, we develop a new block coordinate RED algorithm that decomposes a large-scale estimation problem into a sequence of updates over a small subset of the unknown variables.
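The block-coordinate idea — apply the RED update rule, but only to a small subset of the unknowns per iteration — can be sketched as follows. This is a simplified illustration, not the paper's algorithm: the smoothing "denoiser," the contiguous random block selection, the toy least-squares problem, and all parameters are assumptions for the sake of a runnable example.

```python
import numpy as np

def denoise(x):
    # Placeholder denoiser standing in for a learned image prior.
    kernel = np.ones(3) / 3.0
    return np.convolve(np.pad(x, 1, mode="edge"), kernel, mode="valid")

def block_red(y, A, tau=0.1, gamma=0.2, block_size=8, sweeps=300, seed=0):
    # RED gradient: A^T (A x - y) + tau * (x - D(x)); each sweep updates
    # only a random block of coordinates instead of the full vector.
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    x = np.zeros(n)
    for _ in range(sweeps):
        start = rng.integers(0, n - block_size + 1)
        idx = np.arange(start, start + block_size)
        grad = A.T @ (A @ x - y)       # data-fidelity gradient
        red = tau * (x - denoise(x))   # RED regularization term
        x[idx] -= gamma * (grad[idx] + red[idx])
    return x

# Toy compressive recovery: tall random A, piecewise-constant ground truth.
rng = np.random.default_rng(2)
A = rng.standard_normal((48, 32)) / np.sqrt(48)
x_true = np.zeros(32)
x_true[10:16] = 1.0
y = A @ x_true
x_hat = block_red(y, A)
```

In a full-scale implementation the benefit is that each update touches only a block of variables (and, ideally, only the part of the measurement model involving that block), which is what makes the decomposition attractive for large-scale problems; this sketch recomputes the full gradient purely for brevity.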
Compared with image inpainting, performing this task on video presents new challenges, such as how to preserve temporal consistency and spatial details, and how to handle arbitrary input video sizes and lengths quickly and efficiently.
In this paper, we present new data pre-processing and augmentation techniques for DNN-based raw image denoising.
However, text in the wild is usually perspectively distorted or curved, which cannot be easily handled by existing approaches.
Reading text from images remains challenging due to multi-orientation, perspective distortion and especially the curved nature of irregular text.
In the past decade, sparsity-driven regularization has led to significant improvements in image reconstruction.