We also propose a distinctive patch convolution for feature representation learning that reduces computation time.
Important findings on the use of spatial and spectral information in the autoencoder framework are discussed.
Semi-supervised learning (SSL), which aims at leveraging a few labeled images and a large number of unlabeled images for network training, is beneficial for relieving the burden of data annotation in medical image segmentation.
Second, an adaptive DropBlock (AdapDrop) is proposed as a regularization method employed in the generator and discriminator to alleviate the mode collapse issue.
Synthetic aperture radar (SAR) image change detection is a vital yet challenging task in the field of remote sensing image analysis.
Outside-knowledge visual question answering (OK-VQA) requires the agent to comprehend the image, make use of relevant knowledge from the entire web, and digest all the information to answer the question.
Taking meta features as a reference, we propose compositional operations that eliminate irrelevant components of local convolutional features through an addressing process and then reformulate the convolutional feature maps as a composition of related meta features.
Most previous works address the problem by first fusing the image and the question in a multimodal space, an approach that is inflexible for further fusion with a vast amount of external knowledge.
Physical modeling methods can offer the potential for extrapolation beyond observational conditions, while data-driven methods are flexible in adapting to data and are capable of detecting unexpected patterns.
Moreover, a correlation layer is designed to further explore the correlation between multitemporal images.
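The design of this correlation layer is not specified. One common way to compare two feature maps is a FlowNet-style correlation (Dosovitskiy et al., 2015), sketched below under that assumption. For each pixel it averages the channel-wise dot product between the first map and a displaced position in the second map. The name `correlation` and the `[channels][height][width]` nested-list layout are hypothetical choices.

```python
from typing import List

Feat = List[List[List[float]]]  # [channels][height][width]


def correlation(f1: Feat, f2: Feat, max_disp: int) -> List[List[List[float]]]:
    """FlowNet-style correlation volume between two feature maps.

    Returns (2*max_disp+1)**2 planes of size H x W; the plane for offset
    (dy, dx) holds the channel-averaged dot product between f1 at (y, x)
    and f2 at (y+dy, x+dx). Out-of-bounds positions in f2 contribute 0.
    """
    c, h, w = len(f1), len(f1[0]), len(f1[0][0])
    volume = []
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            plane = [[0.0] * w for _ in range(h)]
            for y in range(h):
                for x in range(w):
                    y2, x2 = y + dy, x + dx
                    if 0 <= y2 < h and 0 <= x2 < w:
                        s = sum(f1[k][y][x] * f2[k][y2][x2] for k in range(c))
                        plane[y][x] = s / c
            volume.append(plane)
    return volume
```

For co-registered multitemporal images, `max_disp=0` reduces the layer to a per-pixel similarity between the two dates. Larger displacements tolerate small misregistration at a higher cost.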
Multi-hop machine reading comprehension is a challenging task in natural language processing that requires stronger reasoning ability and explainability.
Based on this, we propose a general-purpose deep clustering framework which radically integrates representation learning and clustering into a single pipeline for the first time.
In summary, the EOQ framework is specifically designed to reduce the high cost of convolution and BN in network training, demonstrating broad prospects for online training on resource-limited devices.
Skin blemishes and diseases have attracted increasing research interest in recent decades due to their increasing frequency and the severity of the related diseases.
In addition, we further propose a multi-region convolution module, which emphasizes the central region of each patch.
In this paper, we propose a deep model that temporally grounds the range in which the small intestine is captured in a capsule endoscopy video lasting tens of hours.
In this paper, we propose an efficient linear self-attention fusion model for the joint classification of hyperspectral image (HSI) and LiDAR data.
As ground objects become increasingly complex, classification results obtained from single-source remote sensing data can hardly meet application requirements.
Convolutional neural networks (CNNs) have made great progress in synthetic aperture radar (SAR) image change detection.
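The exact attention formulation of the fusion model is not given. A widely used way to make self-attention linear in the number of tokens is kernelized linear attention (Katharopoulos et al., 2020), sketched below as an assumption. It uses the feature map phi(x) = elu(x) + 1 in place of the softmax. Function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def elu_plus_one(row: List[float]) -> List[float]:
    """Feature map phi(x) = elu(x) + 1, which keeps every entry positive."""
    return [x + 1.0 if x > 0 else math.exp(x) for x in row]


def linear_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Non-causal kernelized linear attention.

    out_i = phi(q_i)^T S / (phi(q_i)^T z), with S = sum_j phi(k_j) v_j^T and
    z = sum_j phi(k_j). S and z are built once, so the cost grows linearly in
    the number of tokens instead of quadratically as with softmax attention.
    """
    phi_k = [elu_plus_one(row) for row in k]
    d_k, d_v, n = len(phi_k[0]), len(v[0]), len(v)
    s = [[sum(phi_k[j][a] * v[j][b] for j in range(n)) for b in range(d_v)]
         for a in range(d_k)]
    z = [sum(phi_k[j][a] for j in range(n)) for a in range(d_k)]
    out = []
    for q_i in q:
        pq = elu_plus_one(q_i)
        denom = sum(pq[a] * z[a] for a in range(d_k))
        out.append([sum(pq[a] * s[a][b] for a in range(d_k)) / denom
                    for b in range(d_v)])
    return out
```

For HSI and LiDAR fusion, one hypothetical arrangement is to take `q` from the tokens of one modality and `k`, `v` from the other. How the proposed model actually pairs the modalities is not described here.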
Then, a new style discriminator is designed to improve the translation performance.
However, the function of calcium signaling in epithelial cells is not well understood.
Charge transport in disordered organic semiconductors occurs by hopping of charge carriers between localized sites that are randomly distributed in a strongly energy dependent density of states.
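The sentence does not name a hopping rate. A common modeling choice, assumed here, is the Miller–Abrahams rate with site energies drawn from a Gaussian density of states. Tunneling decays exponentially with hop distance, and jumps upward in energy pay an additional Boltzmann penalty. The default attempt frequency, localization length, and temperature are illustrative values, not parameters from the source.

```python
import math
import random

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def miller_abrahams_rate(r_ij: float, e_i: float, e_j: float,
                         nu0: float = 1e12, loc_len: float = 1e-9,
                         temperature: float = 300.0) -> float:
    """Hopping rate (1/s) from site i to site j; energies in eV, distance in m.

    Tunneling decays as exp(-2 r / a); upward jumps in energy are further
    suppressed by exp(-dE / kT), downward jumps are not.
    """
    tunneling = math.exp(-2.0 * r_ij / loc_len)
    de = e_j - e_i
    boltzmann = math.exp(-de / (K_B_EV * temperature)) if de > 0 else 1.0
    return nu0 * tunneling * boltzmann


def sample_gaussian_dos(n_sites: int, sigma: float, rng: random.Random) -> list:
    """Draw site energies (eV) from a Gaussian density of states of width sigma."""
    return [rng.gauss(0.0, sigma) for _ in range(n_sites)]
```

Because downhill hops carry no Boltzmann factor, the ratio of forward to backward rates between two sites is exp(-(E_j - E_i)/kT), so the rates satisfy detailed balance. This property is what makes them usable in kinetic Monte Carlo simulations of the disordered energy landscape.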
The fault surface morphology is the direct result of the microscopic processes near the crack tip or on the frictional interface.
However, it is still at an early stage of development, and many experiments have to be performed in simulation.
Learning depth and ego-motion from unlabeled videos via self-supervision from epipolar projection can improve the robustness and accuracy of the 3D perception and localization of vision-based robots.
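The supervision signal in this family of methods typically comes from view synthesis. Each target pixel is back-projected with its predicted depth, moved by the predicted relative pose, and projected into a neighboring frame. A photometric loss then compares the two images at the matched locations. The sketch assumes a pinhole camera without lens distortion, and the helper name `reproject` is hypothetical. The source's specific epipolar formulation may differ.

```python
from typing import List, Tuple


def reproject(u: float, v: float, depth: float,
              fx: float, fy: float, cx: float, cy: float,
              rot: List[List[float]], trans: List[float]) -> Tuple[float, float]:
    """Map pixel (u, v) with known depth into a second view (pinhole model)."""
    # back-project to a 3-D point in the target camera frame: depth * K^-1 [u, v, 1]
    p = [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]
    # rigid motion into the source camera frame: R p + t
    q = [sum(rot[i][k] * p[k] for k in range(3)) + trans[i] for i in range(3)]
    # perspective projection back to pixel coordinates
    return fx * q[0] / q[2] + cx, fy * q[1] / q[2] + cy
```

With an identity rotation and a sideways translation `t_x`, a pixel shifts by `fx * t_x / depth`. Errors in predicted depth or pose therefore show up directly as photometric mismatch, which is what drives learning without labels.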
Because the proposed generalization problem has not yet been widely studied, we carefully define an evaluation protocol, with which we illustrate the effectiveness of MEIP on two proof-of-concept domains and one challenging task: learning to fold from demonstrations.
In this paper, we investigate how the recently introduced pre-trained language model BERT can be adapted for Chinese biomedical corpora and propose a novel conceptualized representation learning approach.
In this review, we give an overview of some new developments and challenges in applying machine learning to medical image analysis, with a special focus on deep learning in photoacoustic imaging.
We demonstrate the power of this perspective to develop cognitive AI systems with humanlike common sense by showing how to observe and apply FPICU with little training data to solve a wide range of challenging tasks, including tool use, planning, utility inference, and social learning.
In this paper, we propose an actor-critic method, the Attention-based Twin Delayed Deep Deterministic policy gradient (ATD3) algorithm, to approximate a driver's action based on observations and to measure the driver's attention allocation over consecutive time steps in a car-following model.
Short texts, such as chat messages, SMS messages, and product reviews, are becoming increasingly popular on the web.
In this work, we develop a novel PACT system that provides real-time imaging using a 120-element ultrasound array with only a single data acquisition (DAQ) channel.
"Thinking in pictures," i.e., spatial-temporal reasoning, which is effortless and instantaneous for humans, is believed to be a significant ability for performing logical induction and a crucial factor in the intellectual history of technology development.
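The attention component of ATD3 is not described here, so the sketch below covers only the critic target of the underlying TD3 algorithm (Fujimoto et al., 2018). That target combines target-policy smoothing with clipped double-Q learning. A scalar state and action (for instance, a single acceleration command) is a simplifying assumption. The default hyperparameters follow common TD3 settings but are illustrative.

```python
import random
from typing import Callable


def td3_target(reward: float, next_state: float, done: bool,
               actor_target: Callable[[float], float],
               q1_target: Callable[[float, float], float],
               q2_target: Callable[[float, float], float],
               rng: random.Random, gamma: float = 0.99,
               noise_std: float = 0.2, noise_clip: float = 0.5,
               a_low: float = -1.0, a_high: float = 1.0) -> float:
    """TD3 critic target for one transition with a scalar state and action."""
    # target policy smoothing: add clipped Gaussian noise to the target action
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    a_next = max(a_low, min(a_high, actor_target(next_state) + noise))
    # clipped double-Q learning: use the smaller of the two target critics
    q_next = min(q1_target(next_state, a_next), q2_target(next_state, a_next))
    # no bootstrapping past a terminal transition
    return reward + gamma * (0.0 if done else 1.0) * q_next
```

Both critics regress toward this shared target. The "delayed" part of TD3, updating the actor and target networks less often than the critics, happens in the training loop outside this function.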
As a result, the proposed construction unifies and extends known theoretical results for many of the existing graph scattering architectures.
The Euclidean scattering transform was introduced nearly a decade ago to improve the mathematical understanding of convolutional neural networks.
Furthermore, ConvNets inspired recent advances in geometric deep learning, which aim to generalize these networks to graph data by applying notions from graph signal processing to learn deep graph filter cascades.
In this work, we propose a new dataset, built in the context of Raven's Progressive Matrices (RPM) and aimed at lifting machine intelligence by associating vision with structural, relational, and analogical reasoning in a hierarchical representation.
In order to address these limitations, we present tree-structured ConvLSTM models for tree-structured image analysis tasks which can be trained end-to-end.
The hierarchical attention components of the residual attention subnet force our network to focus on the key components of the X-ray images and to generate the final predictions together with the associated visual support, similar to the way clinicians assess images.
We explore the generalization of scattering transforms from traditional (e.g., image or audio) signals to graph data, analogous to the generalization of ConvNets in geometric deep learning, and the utility of extracted graph features in graph data analysis.
In this paper, we propose to leverage intra-class variance in the metric learning of a triplet network to improve fine-grained recognition performance.
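One way to carry scattering transforms over to graphs is to replace Euclidean wavelets with diffusion wavelets built from a lazy random walk P = (I + A D^-1)/2, with Psi_0 = I - P and Psi_j = P^(2^(j-1)) - P^(2^j). The sketch below computes first-order scattering moments sum_v |(Psi_j x)(v)|^q, which do not depend on node ordering. Whether this matches the exact construction in the source is an assumption. The helper names are hypothetical, and the graph is assumed to have no isolated nodes.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lazy_random_walk(adj: Matrix) -> Matrix:
    """P = (I + A D^-1) / 2 for a graph without isolated nodes."""
    n = len(adj)
    deg = [sum(adj[i][j] for i in range(n)) for j in range(n)]  # column sums
    return [[0.5 * ((1.0 if i == j else 0.0) + adj[i][j] / deg[j]) for j in range(n)]
            for i in range(n)]


def diffusion_wavelets(p: Matrix, J: int) -> List[Matrix]:
    """Psi_0 = I - P and Psi_j = P^(2^(j-1)) - P^(2^j) for j = 1..J."""
    n = len(p)
    powers = {1: p}  # dyadic powers P^(2^j), built by repeated squaring
    cur = p
    for j in range(1, J + 1):
        cur = matmul(cur, cur)
        powers[2 ** j] = cur
    wavelets = [[[(1.0 if i == k else 0.0) - p[i][k] for k in range(n)] for i in range(n)]]
    for j in range(1, J + 1):
        a, b = powers[2 ** (j - 1)], powers[2 ** j]
        wavelets.append([[a[i][k] - b[i][k] for k in range(n)] for i in range(n)])
    return wavelets


def first_order_moments(adj: Matrix, x: List[float], J: int, q: int = 1) -> List[float]:
    """Scattering moments sum_v |(Psi_j x)(v)|^q for j = 0..J."""
    feats = []
    for psi in diffusion_wavelets(lazy_random_walk(adj), J):
        y = [sum(psi[i][k] * x[k] for k in range(len(x))) for i in range(len(x))]
        feats.append(sum(abs(val) ** q for val in y))
    return feats
```

A constant signal on a regular graph is left unchanged by P, so every wavelet response, and hence every moment, is zero. Nonzero moments therefore measure how the signal varies across the graph at dyadic diffusion scales.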
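For reference, the sketch below implements the standard triplet loss that a triplet network minimizes. The loss pulls an anchor embedding toward a same-class positive and pushes it away from a different-class negative by at least a margin. How intra-class variance enters the proposed method is not described here, so it is not reproduced. The function names and the default margin are illustrative.

```python
from typing import Sequence


def squared_l2(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.2) -> float:
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    return max(0.0, squared_l2(anchor, positive)
               - squared_l2(anchor, negative) + margin)
```

Triplets that already satisfy the margin contribute zero loss. In practice, training focuses on hard or semi-hard triplets, which matter particularly in fine-grained recognition, where classes differ only subtly.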
Object detection aims to identify instances of semantic objects of a certain class in images or videos.