A progressive texture blending module is designed to blend the encoded two-stream features in a multi-scale and progressive manner.
Deep learning provides a promising way to extract effective representations from raw data in an end-to-end fashion and has proven its effectiveness in various domains such as computer vision, natural language processing, etc.
Through key-value matching based on relevancy evaluation, the proposed MatchVIE can bypass the recognition of various semantics and simply focus on the strong relevancy between entities.
Secondly, on top of the proposed graph transformer, we introduce a two-stream encoder that separately extracts representations from temporal neighborhoods associated with the two interaction nodes and then utilizes a co-attentional transformer to model inter-dependencies at a semantic level.
For building a robust point detector, a fully convolutional network with a feature fusion module is adopted, which can distinguish close points compared to traditional methods.
Due to the lack of natural scene and haze prior information, it is greatly challenging to completely remove the haze from a single image without distorting its visual content.
In this paper, we present the Block-wise Abstract Syntax Tree Splitting method (BASTS for short), which fully utilizes the rich tree-form syntax structure in ASTs, for improving code summarization.
The most difficult part of the design is to choose an appropriate strategy to generate the fused image for the specific task at hand.
Rapid, accurate and robust detection of looming objects in cluttered moving backgrounds is a significant and challenging problem for robotic visual systems to perform collision detection and avoidance tasks.
Robust and accurate detection of small moving targets in cluttered moving backgrounds is a significant and challenging problem for robotic visual systems to perform search and tracking tasks.
1 code implementation • 25 Feb 2021 • Yuanhan Zhang, Zhenfei Yin, Jing Shao, Ziwei Liu, Shuo Yang, Yuanjun Xiong, Wei Xia, Yan Xu, Man Luo, Jian Liu, Jianshu Li, Zhijun Chen, Mingyu Guo, Hui Li, Junfu Liu, Pengfei Gao, Tianqi Hong, Hao Han, Shijie Liu, Xinhua Chen, Di Qiu, Cheng Zhen, Dashuang Liang, Yufeng Jin, Zhanlong Hao
It is the largest face anti-spoofing dataset in terms of the number of data samples and subjects.
We leverage Graph Neural Network and multi-task learning to design M$^3$Rec in order to model the complex information in the heterogeneous sequential recommendation scenario of Tencent Games.
In this work, we propose a Self Sparse Generative Adversarial Network (Self-Sparse GAN) that reduces the parameter space and alleviates the zero gradient problem.
no code implementations • 25 Jan 2021 • Ze-Bin Wu, Daniel Putzky, Asish K. Kundu, Hui Li, Shize Yang, Zengyi Du, Sang Hyun Joo, Jinho Lee, Yimei Zhu, Gennady Logvenov, Bernhard Keimer, Kazuhiro Fujita, Tonica Valla, Ivan Bozovic, Ilya K. Drozdov
This indicates that the pseudogap and superconductivity are of different origins.
Superconductivity Materials Science
Recent advances in machine learning, wireless communication, and mobile hardware technologies promisingly enable federated learning (FL) over massive mobile edge devices, which opens new horizons for numerous intelligent mobile applications.
In MDP, we first propose a novel real-time model extraction status assessment scheme called Monitor to evaluate the situation of the model.
Fashion products typically feature in compositions of a variety of styles at different clothing parts.
In singular models, the optimal set of parameters forms an analytic set with singularities and classical statistical inference cannot be applied to such models.
In our proposed fusion strategy, spatial attention models and channel attention models are developed that describe the importance of each spatial position and of each channel with deep features.
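The attention-based fusion described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the function names `fuse` and `_share`, the use of the L1 norm across channels as the spatial importance, the per-channel mean absolute activation as the channel importance, and the equal averaging of the two fused results.

```python
def _share(a, b):
    """Weight of source 1 given non-negative importance scores a and b."""
    total = a + b
    return 0.5 if total == 0 else a / total


def fuse(f1, f2):
    """Fuse two deep feature maps shaped [C][H][W] (nested lists).

    Hypothetical sketch: spatial weights come from the L1 norm across
    channels at each position, channel weights from the mean absolute
    activation of each channel; the two fused results are averaged.
    """
    C, H, W = len(f1), len(f1[0]), len(f1[0][0])

    def l1_map(f):
        # Importance of each spatial position.
        return [[sum(abs(f[c][i][j]) for c in range(C)) for j in range(W)]
                for i in range(H)]

    def channel_scores(f):
        # Importance of each channel (global average pooling).
        return [sum(abs(v) for row in f[c] for v in row) / (H * W)
                for c in range(C)]

    s1, s2 = l1_map(f1), l1_map(f2)
    g1, g2 = channel_scores(f1), channel_scores(f2)

    fused = []
    for c in range(C):
        wc = _share(g1[c], g2[c])
        plane = []
        for i in range(H):
            row = []
            for j in range(W):
                ws = _share(s1[i][j], s2[i][j])
                a, b = f1[c][i][j], f2[c][i][j]
                spatial = ws * a + (1 - ws) * b
                channel = wc * a + (1 - wc) * b
                row.append(0.5 * (spatial + channel))
            plane.append(row)
        fused.append(plane)
    return fused
```

A position or channel where one source is far more active than the other ends up dominated by that source, which is the intuition behind weighting by importance.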
This approach, called Quantum Clustering (QC), deals with unlabeled data processing and constructs a potential function to find the centroids of clusters and the outliers.
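As a rough illustration of the potential function mentioned above, the sketch below evaluates it at every data point. It assumes the commonly used Gaussian Parzen-window wave function, ψ(x) = Σᵢ exp(−|x − xᵢ|² / 2σ²), with the potential shifted so that its minimum is zero; the abstract itself does not state the formula, and the function name `qc_potential` and the parameter `sigma` are hypothetical.

```python
import math


def qc_potential(points, sigma):
    """Quantum-clustering potential V evaluated at each data point.

    Assumed form: V(x) = E - d/2 + (1 / (2 sigma^2 psi(x)))
                         * sum_i |x - x_i|^2 exp(-|x - x_i|^2 / (2 sigma^2)),
    where d is the dimension and E is chosen so that min V = 0.
    """
    two_s2 = 2.0 * sigma ** 2
    dim = len(points[0])
    raw = []
    for x in points:
        psi = 0.0       # Parzen-window "wave function" at x
        weighted = 0.0  # sum of squared distance times Gaussian weight
        for xi in points:
            d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
            w = math.exp(-d2 / two_s2)
            psi += w
            weighted += d2 * w
        raw.append(-dim / 2.0 + weighted / (two_s2 * psi))
    shift = -min(raw)  # plays the role of E
    return [v + shift for v in raw]
```

Points with a low potential are candidates for cluster centres. Note that a fully isolated point also sits at a local minimum of its own, so low potential alone does not separate centroids from outliers; the size of the basin around the minimum has to be considered as well.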
On the other hand, the 2D attention-based license plate recognizer with an Xception-based CNN encoder is capable of recognizing license plates with different patterns under various scenarios accurately and robustly.
In this case, we need to use commonsense knowledge to identify the objects in the image.
Most of the state-of-the-art (SoTA) VQA methods fail to answer these questions because of i) poor text reading ability; ii) a lack of text-visual reasoning capacity; and iii) the adoption of a discriminative answering mechanism instead of a generative one, which makes it hard to cover both OCR tokens and general text tokens in the final answer.
We finally estimate the reduced reproductive number and the population spared from infections due to restricting SA at 40,964, 180,336, and 174,494 in China, the United States, and Europe, respectively.
A hallmark of an AI agent is to mimic human beings to understand and interact with others.
To improve the visual quality of underwater images, we proposed a novel enhancement model, which is a trainable end-to-end neural model.
A self-coding deep neural network is designed to identify the structural modal parameters from the vibration data of structures.
We develop a recovery framework for automatic crack segmentation of compressed crack images based on this new CS method and demonstrate the remarkable performance of the method, which takes advantage of the strong capability of generative models to capture the features required in the crack segmentation task even when the backgrounds of the generated images are not well reconstructed.
We find that, in line with the few other species for which data are available, the embryonic mortality of zebrafish has a prominent peak shortly after fertilization.
Our ablation studies show that Dynamic Experience Replay is a crucial ingredient that either largely shortens the training time in these challenging environments or solves the tasks that the vanilla Ape-X DDPG cannot solve.
Public intelligent services enabled by machine learning algorithms are vulnerable to model extraction attacks that can steal confidential information of the learning models through public queries.
For three source images, a joint region segmentation method based on segmentation of two images is used to obtain the final segmentation result.
To tackle this problem, we propose to iteratively guess pseudo labels for the unlabeled image samples, which are later used to update the re-identification model together with the labelled samples.
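The iterative pseudo-labelling loop described above can be sketched as follows. A nearest-centroid classifier stands in for the re-identification model purely for illustration; the helper names `centroids`, `predict`, and `self_train`, and the fixed number of rounds, are assumptions rather than anything stated in the source.

```python
def centroids(samples):
    """Mean feature vector per label from (features, label) pairs."""
    sums, counts = {}, {}
    for feats, label in samples:
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, v in enumerate(feats):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(model, feats):
    """Label of the closest centroid (squared Euclidean distance)."""
    def dist2(center):
        return sum((a - b) ** 2 for a, b in zip(feats, center))
    return min(model, key=lambda lab: dist2(model[lab]))


def self_train(labelled, unlabelled, rounds=3):
    """Alternate between guessing pseudo labels and refitting the model."""
    model = centroids(labelled)
    for _ in range(rounds):
        # Guess a pseudo label for every unlabelled sample ...
        pseudo = [(x, predict(model, x)) for x in unlabelled]
        # ... then update the model on labelled and pseudo-labelled data.
        model = centroids(labelled + pseudo)
    return model
```

For example, `self_train([([0.0], "a"), ([10.0], "b")], [[1.0], [2.0], [8.0], [9.0]])` pulls each centroid toward the unlabelled samples nearest to it. In practice, a confidence threshold on the pseudo labels is a common safeguard against reinforcing early mistakes.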
A robust single-shot 3D shape reconstruction technique integrating the fringe projection profilometry (FPP) technique with the deep convolutional neural networks (CNNs) is proposed in this letter.
Considering the prior human knowledge that these structures are in conformity to regular spatial layouts in terms of components, a learning-based topology-aware 3D reconstruction method which can obtain high-level structural graph layouts and low-level 3D shapes from images is proposed in this paper.
As the number of required samples has recently been proven to be lower bounded by a particular threshold that presets the tradeoff between accuracy and efficiency, the result quality of these traditional solutions is hard to improve further without sacrificing efficiency.
In this work, we propose a simple yet strong approach for scene text recognition.
Deep part-based methods in recent literature have revealed the great potential of learning local part-level representations for pedestrian images in the task of person re-identification.
Although various transfer learning methods have shown promising performance in this context, our proposed novel method RecSys-DAN focuses on alleviating the cross-domain and within-domain data sparsity and data imbalance and learns transferable latent representations for users, items and their interactions.
We present MMKG, a collection of three knowledge graphs that contain both numerical features and (links to) images for all entities as well as entity alignments between pairs of KGs.
Complex motion patterns of natural systems, such as fish schools, bird flocks, and cell groups, have attracted great attention from scientists for years.
The prior knowledge, i.e., the basis matrix and the CS-sampled signals, is used as the input and the target of the network; the basis coefficient matrix is embedded as the parameters of a certain layer; the objective function of conventional compressive sensing is set as the loss function of the network.
There is great interest, as well as many challenges, in applying reinforcement learning (RL) to recommendation systems.
In contrast to struggling on multimodal feature fusion, in this paper, we propose to unify all the input information by natural language so as to convert VQA into a machine reading comprehension problem.
We develop a novel image fusion framework based on MDLatLRR, which is used to decompose source images into detail parts (salient features) and base parts.
Recognizing irregular text in natural scene images is challenging due to the large variance in text appearance, such as curvature, orientation and distortion.
Ranked #13 on Scene Text Recognition on ICDAR2015
Feature extraction and processing tasks play a key role in Image Fusion, and the fusion performance is directly affected by the different features and processing methods undertaken.
A universal rule-based self-learning approach using deep reinforcement learning (DRL) is proposed for the first time to solve nonlinear ordinary differential equations and partial differential equations.
Multi-focus noisy image fusion represents an important task in the field of image fusion which generates a single, clear and focused image from all source images.
In this paper, we propose a novel multi-focus image fusion method based on dictionary learning and LRR to get a better performance in both global and local structure.
In order to promote the study on this problem while implementing the concurrent real-world image denoising datasets, we construct a new benchmark dataset which contains comprehensive real-world noisy images of different natural scenes.
no code implementations • 10 Feb 2018 • Tao Tan, Zhang Li, Haixia Liu, Ping Liu, Wenfang Tang, Hui Li, Yue Sun, Yusheng Yan, Keyu Li, Tao Xu, Shanshan Wan, Ke Lou, Jun Xu, Huiming Ying, Quchang Ouyang, Yuling Tang, Zheyu Hu, Qiang Li
To help doctors to be more selective on biopsies and provide a second opinion on diagnosis, in this work, we propose a computer-aided diagnosis (CAD) system for lung diseases including cancers and tuberculosis (TB).
This is exploited in sentiment analysis where machine learning models are used to predict the review score from the text of the review.
In contrast to existing approaches which take license plate detection and recognition as two separate tasks and settle them step by step, our method jointly solves these two tasks by a single network.
Compared with other CMOEAs, the proposed PPS method can more efficiently get across infeasible regions and converge to the feasible and non-dominated regions by applying push and pull search strategies at different stages.
Bilinear models are among the most basic models for this task; they are comparatively efficient to train and use, and they can provide good prediction performance.
In this work, we jointly address the problem of text detection and recognition in natural scene images based on convolutional recurrent neural networks.
The focus in this paper is Bayesian system identification based on noisy incomplete modal data where we can impose spatially-sparse stiffness changes when updating a structural model.
Multi-objective evolutionary algorithms (MOEAs) have progressed significantly in recent decades, but most of them are designed to solve unconstrained multi-objective optimization problems.
In this paper, we propose an improved method that eliminates the system calibration and determination in Zhang's method and does not need the low-frequency fringe pattern.
We propose a method for enhanced high dynamic range 3D shape measurement based on a generalized phase-shifting algorithm, which combines the complementary technique of inverted and regular fringe patterns with the generalized phase-shifting algorithm.
Inspired by the success of deep neural networks (DNNs) in various vision applications, here we leverage DNNs to learn high-level features in a cascade framework, which leads to improved performance on both detection and recognition.
We define the task of salient structure (SS) detection to unify the saliency-related tasks like fixation prediction, salient object detection, and other detection of structures of interest.
The application of compressive sensing (CS) to structural health monitoring is an emerging research topic.
Our method is also much faster and more scalable than WLDA based on standard interior-point SDP solvers.
The LCD similarity measure can be kernelized under KCRC, which theoretically links CRC and LCD under the kernel method.
Image enhancement is an important image processing technique that processes images so that they are suitable for a specific application, e.g., image editing.
Each of the clusters is characterized by a few well-developed ERs that are partially or fully co-aligned in magnetic axis orientation.
Solar and Stellar Astrophysics