In light of the vulnerability of deep learning models to adversarial samples and the ensuing security issues, a range of methods aimed at enhancing model robustness against various adversarial attacks, with Adversarial Training (AT) as a prominent representative, have developed rapidly.
Super-resolution of images captured in ultra-dark environments is a practical yet challenging problem that has received little attention.
In this work, we aim to unlock the potential of the enhancer + detector pipeline.
To overcome these limitations, we propose a Macro-Micro-Hierarchical transformer, which consists of macro attention to capture long-range dependencies, micro attention to extract local features, and a hierarchical structure for coarse-to-fine correction.
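The macro/micro split above can be illustrated with a minimal sketch, assuming (hypothetically) that macro attention is plain global self-attention over all tokens and micro attention is windowed self-attention, with their outputs summed; the actual architecture and fusion rule may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

def macro_micro(tokens, window=4):
    """Toy macro (global) + micro (windowed local) attention, summed."""
    n, d = tokens.shape
    macro = attention(tokens, tokens, tokens)      # long-range dependencies
    micro = np.zeros_like(tokens)
    for s in range(0, n, window):                  # local windows
        w = tokens[s:s + window]
        micro[s:s + window] = attention(w, w, w)
    return macro + micro
```

The hierarchical part would apply such blocks at successively finer resolutions for coarse-to-fine correction.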
Therefore, we propose a residual feature transference module (RFTM) to learn a mapping between deep representations of the heavily degraded patches of DFUI and underwater images, and use this mapping as a heavily degraded prior (HDP) for underwater detection.
We first conduct systematic analyses of the components of image fusion, investigating their correlation with segmentation robustness under adversarial perturbations.
In this study, we propose a generic low-light vision solution by introducing a generative block to convert data from the RAW to the RGB domain.
Multi-modality image fusion and segmentation play a vital role in autonomous driving and robotic operation.
Underwater images suffer from light refraction and absorption, which impairs visibility and interferes with subsequent applications.
Multi-spectral image stitching leverages the complementarity between infrared and visible images to generate a robust and reliable wide field-of-view (FOV) scene.
The complexity of learning problems, such as Generative Adversarial Networks (GANs) and their variants, multi-task and meta-learning, hyper-parameter learning, and a variety of real-world vision applications, demands a deeper understanding of their underlying coupling mechanisms.
Based on these observations, we propose a bilevel optimization formulation for jointly learning underwater object detection and image enhancement, and then unroll it into a dual perception network (DPNet) for the two tasks.
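The bilevel idea can be sketched on a toy problem, assuming (hypothetically) scalar "enhancement" weights w at the lower level and "detection" weights theta at the upper level, with the upper level differentiated through one unrolled inner gradient step; the paper's actual losses and unrolling scheme are of course far richer.

```python
def unrolled_bilevel(theta=0.0, w=0.0, steps=200, outer_lr=0.1, inner_lr=0.5):
    """Toy bilevel loop with a single unrolled inner step.
    Inner (enhancement) loss:  g(w; theta) = (w - theta)^2
    Outer (detection)   loss:  F(theta)    = (w*(theta) - 1)^2 + 0.1 * theta^2
    """
    for _ in range(steps):
        # one unrolled inner gradient step on g
        w = w - inner_lr * 2.0 * (w - theta)
        # gradient of F through the unrolled step: dw/dtheta = 2 * inner_lr
        dw_dtheta = 2.0 * inner_lr
        grad = 2.0 * (w - 1.0) * dw_dtheta + 0.2 * theta
        theta = theta - outer_lr * grad
    return theta, w
```

For this quadratic toy the scheme converges to theta = 10/11, the minimizer of the composed outer objective.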
A bilevel learning framework is constructed to endow the encoder with scene-irrelevant generality across diverse scenes (i.e., freezing the encoder in the adaptation and testing phases).
Extensive experiments on synthetic benchmarks and real-world images demonstrate that the proposed RDMC delivers strong performance in depicting rain streaks and outperforms the state-of-the-art methods.
In light of the significant progress made in the development and application of semantic segmentation tasks, there has been increasing attention towards improving the robustness of segmentation models against natural degradation factors (e.g., rain streaks) or artificial attack factors (e.g., adversarial attacks).
Qualitative and quantitative experimental results on different categories of image fusion problems and related downstream tasks (e.g., visual enhancement and semantic understanding) substantiate the flexibility and effectiveness of our TIM.
In recent years, deep learning-based methods have achieved remarkable progress in multi-exposure image fusion.
3D object detection plays a crucial role in numerous intelligent vision systems.
Their common characteristic of seeking complementary cues from different source images motivates us to explore the collaborative relationship between Fusion and Salient object detection tasks on infrared and visible images via an Interactively Reinforced multi-task paradigm for the first time, termed IRFS.
Low-light situations severely restrict the pursuit of aesthetic quality in consumer photography.
Recently, multi-modality scene perception tasks, e.g., image fusion and scene understanding, have attracted widespread attention for intelligent vision systems.
Owing to differences in viewing range, resolution, and relative position, the multi-modality sensing module composed of infrared and visible cameras needs to be registered to achieve more accurate scene perception.
A multi-level hybrid loss is first designed to guide the network to learn pixel-level, region-level, and object-level features.
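A minimal sketch of such a multi-level combination, assuming (purely for illustration) pixel-level MSE, a region-level term over 2x2 block averages, and an object-level term on the global foreground ratio; the paper's actual loss terms and weights are not specified here.

```python
import numpy as np

def hybrid_loss(pred, target, w_pixel=1.0, w_region=1.0, w_object=1.0):
    """Toy multi-level hybrid loss on H x W maps (H, W even).
    Weights are illustrative, not the paper's."""
    # pixel level: mean squared error
    pixel = np.mean((pred - target) ** 2)
    # region level: mean absolute difference of 2x2 block averages
    ph = pred.reshape(pred.shape[0] // 2, 2, pred.shape[1] // 2, 2).mean(axis=(1, 3))
    th = target.reshape(target.shape[0] // 2, 2, target.shape[1] // 2, 2).mean(axis=(1, 3))
    region = np.mean(np.abs(ph - th))
    # object level: difference in global foreground ratio
    obj = abs(pred.mean() - target.mean())
    return w_pixel * pixel + w_region * region + w_object * obj
```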
Improving the visual quality of the given degraded observation by correcting exposure level is a fundamental task in the computer vision community.
To address these challenges, in this letter, we develop a semantic-level fusion network to fully exploit semantic guidance, freeing the fusion process from hand-crafted fusion rules.
Infrared and visible image fusion aims to provide an informative image by combining complementary information from different sensors.
To address the above limitations, we develop an efficient and compact enhancement network in collaboration with a high-level semantic-aware pretrained model, aiming to exploit its hierarchical feature representation as an auxiliary for the low-level underwater image enhancement.
With the proliferation of versatile Internet of Things (IoT) services, smart IoT devices are increasingly deployed at the edge of wireless networks to perform collaborative machine learning tasks using locally collected data, giving rise to the edge learning paradigm.
To cope with data issues and Byzantine attacks, global data samples are introduced in CB-DSL and shared among IoT workers, which not only alleviates local data heterogeneity effectively but also makes full use of the exploration-exploitation mechanism of swarm intelligence.
As a highly ill-posed issue, single image super-resolution (SISR) has been widely investigated in recent years.
Moreover, to better fuse the registered infrared images and visible images, we present a feature Interaction Fusion Module (IFM) to adaptively select more meaningful features for fusion in the Dual-path Interaction Fusion Network (DIFN).
In past years, minimax-type single-level optimization formulations and their variations have been widely utilized to formulate Generative Adversarial Networks (GANs).
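The canonical single-level minimax formulation referred to here is the original GAN objective, in which a generator $G$ and a discriminator $D$ play a two-player game:

```latex
\min_G \max_D \;
\mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
+ \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

The "variations" mentioned above typically replace the log terms with alternative divergences or add regularizers while keeping this single-level minimax structure.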
Existing BDE methods have no unified solution for various BDE situations, and directly learn a mapping for each pixel from the LBD image to the desired value in the HBD image, which may change the given high-order bits and lead to a huge deviation from the ground truth.
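The point about preserving high-order bits can be made concrete with a small sketch, assuming an 8-to-16-bit expansion: the known byte is kept verbatim in the top 8 bits and only the unknown low bits are filled in (bit replication as a simple default, or a hypothetical model's prediction).

```python
def expand_bit_depth(pixel8, predicted_low=None):
    """Expand an 8-bit value to 16 bits while preserving the given
    high-order bits: the original byte stays in the top 8 bits, and only
    the low 8 bits are synthesized (replicated, or predicted and masked)."""
    low = pixel8 if predicted_low is None else (predicted_low & 0xFF)
    return (pixel8 << 8) | low
```

By construction, shifting the result back down by 8 bits always recovers the original LBD value, which is exactly the guarantee the criticized per-pixel regression methods lack.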
Existing low-light image enhancement techniques mostly struggle to balance visual quality and computational efficiency, and commonly fail in unknown complex scenarios.
This study addresses the issue of fusing infrared and visible images, which differ significantly in appearance, for object detection.
Deformable image registration plays a critical role in various tasks of medical image analysis.
It is challenging to restore low-resolution (LR) images to super-resolution (SR) images with correct and clear details.
To tackle camouflaged object detection (COD), we are inspired by human attention coupled with the coarse-to-fine detection strategy, and thereby propose an iterative refinement framework, coined SegMaR, which integrates Segment, Magnify and Reiterate in a multi-stage detection fashion.
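The Segment-Magnify-Reiterate loop can be sketched as follows, assuming (hypothetically) that the detector returns a binary mask, that magnification is nearest-neighbour upsampling of the mask's bounding box, and that the loop simply re-runs the detector on the magnified crop; the real SegMaR pipeline involves learned sampling and discriminative masks.

```python
import numpy as np

def segment_magnify_reiterate(image, detector, stages=3, zoom=2):
    """Toy Segment-Magnify-Reiterate loop: run a detector, crop the
    bounding box of its mask, magnify the crop, and detect again."""
    crop = image
    for _ in range(stages):
        mask = detector(crop)                       # Segment
        ys, xs = np.nonzero(mask)
        if ys.size == 0:
            break
        y0, y1 = ys.min(), ys.max() + 1
        x0, x1 = xs.min(), xs.max() + 1
        crop = crop[y0:y1, x0:x1]                   # crop to the object
        crop = np.kron(crop, np.ones((zoom, zoom))) # Magnify, then Reiterate
    return crop
```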
Further, by sharing an encoder for these two components, we obtain a more lightweight version (SLiteCSDNet for short).
To partially address the above issues, we establish Retinex-inspired Unrolling with Architecture Search (RUAS), a general learning framework, which can not only address the low-light enhancement task but also has the flexibility to handle other more challenging downstream vision applications.
In this paper, we develop a model-guided triple-level optimization framework to deduce the network architecture with a cooperative optimization and auto-searching mechanism, named Triple-level Model Inferred Cooperating Searching (TMICS), for dealing with various video rain circumstances.
As a promising distributed learning technology, analog aggregation based federated learning over the air (FLOA) provides high communication efficiency and privacy provisioning under the edge computing paradigm.
Based on this dataset, we propose a semi-supervised underwater semantic segmentation network focusing on the boundaries (US-Net: Underwater Segmentation Network).
Generating high-quality stitched images with natural structures is a challenging task in computer vision.
To address these challenges, we introduce a dataset, Detecting Underwater Objects (DUO), and a corresponding benchmark, based on the collection and re-annotation of all relevant datasets.
Federated learning (FL) is an attractive paradigm for making use of rich distributed data while protecting data privacy.
For distributed learning among collaborative users, this paper develops and analyzes a communication-efficient scheme for federated learning (FL) over the air, which incorporates 1-bit compressive sensing (CS) into analog aggregation transmissions.
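As a crude stand-in for the 1-bit idea, one can sketch sign-based gradient aggregation, assuming (hypothetically) that each worker transmits only the sign of its local update and the server averages and rescales; the actual scheme additionally exploits CS sparsity and the superposition property of the analog channel.

```python
import numpy as np

def one_bit_aggregate(local_grads, scale):
    """Each worker sends only the sign of its local gradient (1 bit per
    coordinate); the server averages the 1-bit messages and rescales."""
    signs = [np.sign(g) for g in local_grads]
    return scale * np.mean(signs, axis=0)
```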
Specifically, by introducing a general energy minimization model and formulating its descent direction from different viewpoints (i.e., in a generative manner, based on the discriminative metric and with optimality-based correction), we construct three propagative modules to effectively solve the optimization models with flexible combinations.
Low-light image enhancement plays a very important role in the low-level vision field.
We design a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation in order to integrate advantages and avoid limitations of these two categories of approaches.
We also propose a novel Poisson-blending Generative Adversarial Network (Poisson GAN) and an efficient object detection network (AquaNet) to address two common issues within related datasets: the class-imbalance problem and the problem of massive small objects, respectively.
Compressed Sensing Magnetic Resonance Imaging (CS-MRI) significantly accelerates MR data acquisition at a sampling rate much lower than the Nyquist criterion.
Properly modeling latent image distributions plays an important role in a variety of image-related vision problems.
Then, an attention mechanism is proposed to model the relations between image regions and blocks and to generate valuable position features, which are further utilized to enhance the region representation and to model a more reliable relationship between the visual image and the textual sentence.
Extensive experimental results on underexposed image correction demonstrate that our proposed method performs favorably against the state-of-the-art methods on both subjective and objective assessments.
To this end, we first leverage a stand-alone module to transform the input data from the 2D image plane to 3D point-cloud space for a better input representation, then we perform 3D detection using a PointNet backbone to obtain objects' 3D locations, dimensions, and orientations.
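The 2D-to-3D transformation can be illustrated with standard pinhole back-projection, assuming (for illustration) a per-pixel depth map and known camera intrinsics; whether the paper's module obtains depth this way is an assumption of this sketch.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map (H x W, metres) into 3D camera-frame
    points with the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]          # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```

The resulting N x 3 point set is exactly the kind of input a PointNet-style backbone consumes.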
Underwater image enhancement is an important low-level vision task with many applications, and numerous algorithms have been proposed in recent years.
Different from these existing network-based iterations, which often lack theoretical investigation, we provide a strict convergence analysis for PODM in the challenging nonconvex and nonsmooth scenarios.
Magnetic Resonance Imaging (MRI) is one of the most dynamic and safe imaging techniques available for clinical applications.
Moreover, there is a lack of rigorous analysis of the convergence behaviors of these reimplemented iterations, and thus the significance of such methods remains unclear.
Blind image deblurring plays a very important role in many vision and multimedia applications.
Operator splitting methods have been successfully used in computational sciences, statistics, learning and vision areas to reduce complex problems into a series of simpler subproblems.
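A classic instance of splitting a complex problem into simpler subproblems is forward-backward (proximal-gradient) splitting for the LASSO, sketched below; this is a textbook example chosen for illustration, not a method claimed by the work above.

```python
import numpy as np

def ista(A, b, lam, steps=500):
    """Forward-backward splitting for min_x 0.5*||Ax - b||^2 + lam*||x||_1:
    a gradient step on the smooth quadratic term (forward), then the
    soft-threshold prox of the l1 term (backward)."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        g = A.T @ (A @ x - b)              # forward (gradient) step
        z = x - g / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # backward (prox)
    return x
```

Each subproblem (a gradient step, a closed-form shrinkage) is trivial on its own, which is precisely the appeal of splitting schemes.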
Cascaded regression is prevailing in face alignment thanks to its accuracy and robustness, but typically demands manually annotated examples having low discrepancy between shape-indexed features and shape updates.
Deep learning models have gained great success in many real-world applications.
However, they may fail when their assumptions are not valid on specific images.
We develop dual deep networks with memorable gated recurrent units (GRUs), and sequentially feed these two types of features into the dual networks, respectively.
In particular, the hierarchical structure of the ontology has not been sufficiently utilized in clustering genes, even though functionally related genes are consistently associated with phenotypes on the same path in the phenotype ontology.
Detecting elliptical objects from an image is a central task in robot navigation and industrial diagnosis where the detection time is always a critical issue.
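A cheap building block behind many fast ellipse detectors is an algebraic least-squares conic fit, sketched below; this is a generic illustration, not the specific detector proposed above, and it omits the eigen-constraint that guarantees the fitted conic is an ellipse.

```python
import numpy as np

def fit_conic(xs, ys):
    """Least-squares fit of a*x^2 + b*x*y + c*y^2 + d*x + e*y = 1 to
    point coordinates; solved in closed form, hence very fast."""
    D = np.column_stack([xs**2, xs * ys, ys**2, xs, ys])
    coef, *_ = np.linalg.lstsq(D, np.ones_like(xs), rcond=None)
    return coef  # (a, b, c, d, e)
```

For points on a circle of radius 2, the fit recovers a = c = 0.25 and b = d = e = 0, i.e., x^2 + y^2 = 4.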