KTM models the contextual correlation knowledge of two middle-level features of different scales based on the self-attention mechanism, and transfers the knowledge to the raw features to generate more discriminative features.
Our related-work survey covers several recently proposed methods and compares the advantages and disadvantages of the resulting deep learning frameworks, rather than machine learning frameworks.
In this work, we propose a deep modal shared information learning module based on the covariance matrix to capture the shared information between modalities.
To address this issue, we propose a slice grouped domain attention (SGDA) module to enhance the generalization capability of the pulmonary nodule detection networks.
Incipient fault detection in power distribution systems is crucial to improve the reliability of the grid.
Then, semantic kernels are used to activate salient object locations in two groups of high-level features through dynamic convolution operations in DSMM.
Since Rendle and Krichene argued that commonly used sampling-based evaluation metrics are "inconsistent" with respect to the global metrics (even in expectation), there have been a few studies on sampling-based recommender system evaluation.
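The gap between a sampled metric and its global counterpart can be illustrated with a minimal sketch. The scores, the sample size, and the helper names (`rank_of`, `recall_at_k`, `sampled_rank`) are hypothetical and are not taken from the cited work; the sketch only shows that ranking the target against a random subset of negatives can never give a worse rank than ranking it against all of them, so a sampled Recall@K tends to be more optimistic than the global one.

```python
import random

def rank_of(target_score, other_scores):
    """1-based rank of the target among the competing scores (higher is better)."""
    return 1 + sum(1 for s in other_scores if s > target_score)

def recall_at_k(rank, k):
    """Recall@K for a single held-out item: 1 if it is ranked in the top K."""
    return 1.0 if rank <= k else 0.0

def sampled_rank(target_score, negative_scores, n_samples, rng):
    """Rank of the target against a uniform random subset of the negatives."""
    sampled = rng.sample(negative_scores, n_samples)
    return rank_of(target_score, sampled)

# Hypothetical scores: 1000 negative items, 50 of which outscore the target.
negatives = list(range(1000))
target = 949.5

global_rank = rank_of(target, negatives)   # computed against all negatives
rng = random.Random(0)                     # fixed seed for repeatability
trials = [sampled_rank(target, negatives, 100, rng) for _ in range(200)]
sampled_recall = sum(recall_at_k(r, 10) for r in trials) / len(trials)

print("global rank:", global_rank)
print("global Recall@10:", recall_at_k(global_rank, 10))
print("mean sampled Recall@10:", sampled_recall)
```

Here the global Recall@10 is zero because 50 items outrank the target, while most of the 100-item samples contain fewer than ten of those items.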
In financial fraud detection, the modus operandi of criminals can be identified by analyzing user profiles and behaviors such as transactions and loans.
The highlight of LASNet is that we fully consider the characteristics of cross-modal features at different levels, and accordingly propose three specific modules for better segmentation.
Ranked #21 on Thermal Image Segmentation on MFN Dataset
In this paper, we propose task-specific meta distillation that simultaneously learns two models in meta-learning: a large teacher model and a small student model.
In this work, we propose PolarBEV for vision-based uneven BEV representation learning.
A major concern is that residents do not report problems at the same rates, with heterogeneous reporting delays directly translating to downstream disparities in how quickly incidents can be addressed.
As the key component of ACCoNet, ACCoM activates the salient regions of output features of the encoder and transmits them to the decoder.
In this brief, an improved event-triggered update mechanism (ETM) for the linear quadratic regulator is proposed to solve the lateral motion control problem of intelligent vehicles under bounded disturbances.
Most of the current prediction methods combining saliency detection and FoV information neither take into account that the distortion of projected 360-degree videos can invalidate the weight sharing of traditional convolutional networks, nor do they adequately consider the difficulty of obtaining complete multi-user FoV information, which degrades the prediction performance.
Then, following the coarse-to-fine strategy, we generate an initial coarse saliency map from high-level semantic features in a Correlation Module (CorrM).
Considering uncertainties and voltage fluctuation issues introduced by RERs, in this paper, we propose a deep reinforcement learning (DRL)-based strategy leveraging spatial-temporal (ST) graphical information of power systems, to dynamically search for the optimal operation, i.e., optimal power flow (OPF), of power systems with a high uptake of RERs.
In this paper, we propose a novel Multi-Content Complementation Network (MCCNet) to explore the complementarity of multiple content for RSI-SOD.
Recently, a wide range of recommendation algorithms inspired by deep learning techniques have emerged as the performance leaders on several standard recommendation benchmarks.
3D convolutional neural networks have achieved promising results for video tasks in computer vision, including video saliency prediction that is explored in this paper.
Through the derivation and analysis of the closed-form solutions for two basic regression and matrix factorization approaches, we found these two approaches are indeed inherently related but also diverge in how they "scale-down" the singular values of the original user-item interaction matrix.
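A minimal sketch of what "scaling down" singular values can look like, under stated assumptions. For ridge regression of the interaction matrix X onto itself, the closed-form solution has the form V diag(σ²/(σ²+λ)) Vᵀ, so each singular direction is shrunk smoothly. For the low-rank side, the sketch uses plain rank-k truncation (keep the top k directions, drop the rest) as a stand-in; the exact factorization formula analyzed in the work is not given in this excerpt. The function names and example values are hypothetical.

```python
def ridge_scaling(singular_values, lam):
    """Shrinkage factor sigma^2 / (sigma^2 + lam) applied to each singular
    direction by the ridge-regression closed form."""
    return [s * s / (s * s + lam) for s in singular_values]

def truncation_scaling(singular_values, k):
    """Hard thresholding: factor 1 for the k largest singular values and
    0 for the rest (assumed stand-in for a low-rank factorization)."""
    order = sorted(range(len(singular_values)),
                   key=lambda i: singular_values[i], reverse=True)
    keep = set(order[:k])
    return [1.0 if i in keep else 0.0 for i in range(len(singular_values))]

# Hypothetical singular values of a user-item interaction matrix.
sigmas = [4.0, 2.0, 1.0, 0.5]
print(ridge_scaling(sigmas, lam=1.0))    # smooth shrinkage, all directions kept
print(truncation_scaling(sigmas, k=2))   # top-2 kept, others removed
```

The contrast is that the regression form keeps every direction but dampens the weak ones, while truncation keeps strong directions unchanged and discards the rest.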
We study how to support elasticity, that is, the ability to dynamically adjust the parallelism (i.e., the number of GPUs), for deep neural network (DNN) training in a GPU cluster.
The proposed approaches either are rather uninformative (linking sampling to metric evaluation) or can only work on simple metrics, such as Recall/Precision (Krichene and Rendle 2020; Li et al. 2020).
In this paper, we propose a cross-modal self-attention (CMSA) module to utilize fine details of individual words and the input image or video, which effectively captures the long-range dependencies between linguistic and visual features.
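The long-range dependency idea can be sketched with generic scaled dot-product self-attention over a joint sequence of word and visual features. This is an illustrative simplification, not the CMSA module itself: it omits learned query, key, and value projections, and the feature vectors and the name `self_attention` are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(features):
    """Scaled dot-product self-attention without learned projections
    (an assumption made to keep the sketch short)."""
    d = len(features[0])
    out = []
    for q in features:
        # Every position attends to every other, regardless of modality.
        weights = softmax([dot(q, k) / math.sqrt(d) for k in features])
        out.append([sum(w * v[j] for w, v in zip(weights, features))
                    for j in range(d)])
    return out

# Hypothetical 2-D features: one word embedding and two visual locations.
word_features = [[1.0, 0.0]]
visual_features = [[0.0, 1.0], [0.5, 0.5]]
attended = self_attention(word_features + visual_features)
for row in attended:
    print(row)
```

Because the words and visual locations share one sequence, each output row mixes information from both modalities in a single attention step.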
Ranked #5 on Referring Expression Segmentation on J-HMDB (Precision@0.9 metric)
In this paper, we focus on Personal Fixations-based Object Segmentation (PFOS) to address issues in previous studies, such as the lack of an appropriate dataset and the ambiguity in fixations-based interaction.
To this end, we propose a Decentralized Reinforcement Learning at the Edge for traffic light control in the IoV (DRLE).
Using the proposed deep RL scheme, each MU in the system is able to make decisions without a priori statistical knowledge of dynamics.
Facing the trend of merging wireless communications and multi-access edge computing (MEC), this article studies computation offloading in the beyond fifth-generation networks.
In this paper, we propose a novel Cross-Modal Weighting (CMW) strategy to encourage comprehensive interactions between RGB and depth channels for RGB-D SOD.
Ranked #9 on RGB-D Salient Object Detection on NJU2K
In a gravitational collapse scenario, they also depend on the details of the collapsing classical matter which determines the background spacetime.
General Relativity and Quantum Cosmology High Energy Physics - Theory
The outbreak of COVID-19 caused by SARS-CoV-2 has rapidly spread worldwide and has caused over 1,400,000 infections and 80,000 deaths.
Given an input image and a referring expression in the form of a natural language sentence, the goal is to segment the object of interest in the image referred to by the linguistic query.
In this study, we demonstrate, for the first time to the best of our knowledge, robust and dynamically polarization-controlled tunable high-Q PIT in a designed nanostructured metasurface, whose sharp resonance is guaranteed by design and protected against large geometrical imperfections.
Optics Applied Physics
In this paper, we investigate the problem of age of information (AoI)-aware radio resource management for expected long-term performance optimization in a Manhattan grid vehicle-to-vehicle network.
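For readers unfamiliar with the metric, the sketch below shows one common discrete-time convention for age of information: the age grows by one slot whenever no fresh update is delivered and resets to one when a delivery succeeds. The reset value, the initial age, and the function names are assumptions for illustration and are not taken from the work itself.

```python
def age_trajectory(delivered, initial_age=1):
    """Age of information per time slot.

    delivered: iterable of booleans, True when a fresh update arrives
    in that slot. Assumed convention: reset to 1 on delivery,
    otherwise increase by 1.
    """
    age = initial_age
    ages = []
    for ok in delivered:
        age = 1 if ok else age + 1
        ages.append(age)
    return ages

def average_age(ages):
    """Time-average age, a typical long-term performance measure."""
    return sum(ages) / len(ages)

# Hypothetical delivery pattern over six slots.
pattern = [False, False, True, False, True, False]
ages = age_trajectory(pattern)
print(ages)
print(average_age(ages))
```

A resource-management policy that is AoI-aware would aim to keep this long-term average low, for example by scheduling transmissions for links whose age has grown large.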
We demonstrate that on a fixed network architecture, modifying the loss function can significantly improve (or degrade) the results, hence emphasizing the importance of the choice of the loss function when designing a model.
This module controls the information flow of features at different levels.
Ranked #5 on Referring Video Object Segmentation on Refer-YouTube-VOS (using extra training data)
Based on the extended fingerprint database, the accuracy of the indoor localization system can be improved with reduced human effort.
Networking and Internet Architecture
Models based on deep convolutional neural networks (CNN) have significantly improved the performance of semantic segmentation.
In this work, we address the face parsing task with a Fully-Convolutional continuous CRF Neural Network (FC-CNN) architecture.
In this paper, we demonstrate that the computational modelling of visual attention, through the use of a saccadic model, can be efficiently adapted to emulate the gaze behavior of a specific group of observers.
In order to detect these spoofed speech signals as a countermeasure, we propose a score level fusion approach with several different i-vector subsystems.