To associate the predictions of disentangled decoders, we first generate a unified representation for HOI triplets with a base decoder, and then use it as the input feature of each disentangled decoder.
In this paper, we build a novel framework named GaTector to tackle the gaze object prediction problem in a unified way.
We next combine the target pose image and the textures into a single feature image, which a neural image translation network transforms into the output color image.
Based on those constraints, a category-consistent and relativistic diverse conditional GAN (CRD-CGAN) is proposed to synthesize $K$ photo-realistic images simultaneously.
Recent single-view 3D reconstruction methods recover an object's shape and texture from a single image using only 2D image-level annotations.
Therefore, in this paper, we propose a novel framework to explore the impact of under-reporting on COVID-19 spatiotemporal distributions, and we carry out an empirical analysis using infection data from healthcare workers in Wuhan and in Hubei (excluding Wuhan).
In conditional Generative Adversarial Networks (cGANs), when two different initial noise vectors are concatenated with the same conditional information, the distance between their outputs is relatively small, which makes minor modes likely to collapse into major modes.
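One common countermeasure is a mode-seeking-style regularizer that rewards the generator for keeping outputs far apart when the input noises are far apart. The sketch below is an illustration of that idea, not the formulation of any specific paper; `mode_seeking_penalty` and the toy "generator" outputs are hypothetical.

```python
import numpy as np

def mode_seeking_penalty(g_out1, g_out2, z1, z2, eps=1e-8):
    """Penalty that discourages a cGAN generator from mapping
    distinct noises to near-identical outputs.

    The ratio ||G(c,z1)-G(c,z2)|| / ||z1-z2|| is large when the
    generator spreads the noise out; we return its reciprocal so
    that *minimizing* the penalty fights mode collapse.
    """
    out_dist = np.linalg.norm(g_out1 - g_out2)
    z_dist = np.linalg.norm(z1 - z2)
    return z_dist / (out_dist + eps)

# Toy check: a collapsed "generator" (constant output) is penalized
# far more heavily than one whose outputs differ.
z1, z2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
collapsed = mode_seeking_penalty(np.zeros(4), np.zeros(4), z1, z2)
diverse = mode_seeking_penalty(np.zeros(4), np.ones(4), z1, z2)
```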
Recent works in multiple object tracking use sequence models to calculate similarity scores between detections and previous tracklets.
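A minimal association step along these lines can be sketched as follows, assuming one appearance embedding per tracklet and per detection. `greedy_associate` and the 0.5 threshold are illustrative choices, not the cited method (which learns the similarity score with a sequence model).

```python
import numpy as np

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def greedy_associate(tracklet_feats, detection_feats, min_sim=0.5):
    """Greedily match each tracklet to its most similar unmatched
    detection, skipping pairs whose similarity falls below
    `min_sim`. Returns (tracklet_idx, detection_idx) pairs."""
    sims = np.array([[cosine_similarity(t, d) for d in detection_feats]
                     for t in tracklet_feats])
    matches, used_dets = [], set()
    # Visit candidate pairs from most to least similar.
    for ti, di in sorted(np.ndindex(sims.shape), key=lambda p: -sims[p]):
        if sims[ti, di] < min_sim:
            break  # remaining pairs are even less similar
        if ti in {m[0] for m in matches} or di in used_dets:
            continue
        matches.append((ti, di))
        used_dets.add(di)
    return matches
```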
We formulate the regularization term as a consistency loss that encourages geometric consistency among multiple views, while the data term guarantees that the optimized views do not drift away too much from a learned shape descriptor.
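The two-term objective can be sketched directly; the quadratic L2 data and consistency terms below are simple stand-ins for the geometric losses described in the text, and `multiview_objective` is a hypothetical name.

```python
import numpy as np

def multiview_objective(views, descriptor, lam=1.0):
    """Data term: keeps each optimized view close to a learned
    shape descriptor. Consistency term: penalizes pairwise
    disagreement among the views. Both are plain L2 stand-ins
    for the paper's geometric losses."""
    data = sum(np.sum((v - descriptor) ** 2) for v in views)
    consistency = sum(np.sum((views[i] - views[j]) ** 2)
                      for i in range(len(views))
                      for j in range(i + 1, len(views)))
    return data + lam * consistency
```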
To address the inaccurate recognition of communication signal modulation types, we propose an RNN recognition algorithm that combines a residual block network with an attention mechanism.
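To make the attention component concrete, here is a minimal attention-pooling sketch over RNN hidden states; it is an illustration of the general mechanism, not the proposed architecture, and `attention_pool` is a hypothetical name.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_pool(h):
    """Attention pooling over RNN hidden states h: (T, d).
    Scores are dot products with the mean hidden state; the
    attention-weighted sum yields a fixed-size summary vector
    that a classifier head can consume."""
    query = h.mean(axis=0)
    weights = softmax(h @ query)
    return weights @ h
```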
Our tracker achieves leading performance on OTB2013, OTB2015, VOT2015, VOT2016, and LaSOT, and operates at a real-time speed of 26 FPS, which indicates that our method is both effective and practical.
In this paper, we propose a novel way to interpret text information by extracting visual feature representations from multiple high-resolution, photo-realistic synthetic images generated by a text-to-image Generative Adversarial Network (GAN), thereby improving the performance of image labeling.
Few-shot learning is a nascent research topic, motivated by the fact that traditional deep learning methods require tremendous amounts of data.
Ranked #1 on Few-Shot Semantic Segmentation on Pascal5i
Unlike image-to-image translation networks, which complete each view separately, our novel network, the multi-view completion net (MVCN), leverages information from all views of a 3D shape to help complete each single view.
Specifically, for each training image, we first generate attention maps to represent the object's discriminative parts by weakly supervised learning.
Ranked #9 on Fine-Grained Image Classification on CUB-200-2011
Besides, we propose attention regularization and attention dropout to weakly supervise the generation of attention maps.
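A sketch in the spirit of this weakly supervised attention pipeline is given below: part features are pooled under attention maps, and whole maps are randomly dropped during training. The function names and shapes are illustrative assumptions, not the paper's exact operators.

```python
import numpy as np

rng = np.random.default_rng(0)

def bilinear_attention_pool(features, attention):
    """Pool one part feature per attention map.
    features: (C, H, W) conv features, attention: (M, H, W) maps.
    Returns an (M, C) matrix of attention-weighted part features."""
    _, H, W = features.shape
    return np.einsum('mhw,chw->mc', attention, features) / (H * W)

def attention_dropout(attention, drop_prob=0.5):
    """Randomly zero whole attention maps so that no single
    discriminative part dominates training."""
    keep = rng.random(attention.shape[0]) > drop_prob
    return attention * keep[:, None, None]
```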
Only one self-iterative regressor is trained to learn the descent directions for samples from coarse to fine stages, and the parameters are iteratively updated by that same regressor.
Ranked #13 on Face Alignment on 300W (NME_inter-pupil (%, Common) metric)
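The self-iterative idea can be sketched with a single shared linear regressor applied repeatedly; `self_iterative_fit` and the linear (R, b) form are simplifying assumptions for illustration, not the trained model.

```python
import numpy as np

def self_iterative_fit(x0, regressor, features, steps=5):
    """Coarse-to-fine fitting with one shared regressor: every
    stage applies the *same* linear model (R, b) to features of
    the current estimate and adds the predicted descent step."""
    R, b = regressor
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        x = x + R @ np.asarray(features(x)) + b
    return x
```

With identity features and a regressor that predicts half the remaining offset, the estimate converges geometrically toward the target.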
Such networks learn the principal subspace, in the sense of principal component analysis (PCA), by adjusting synaptic weights according to activity-dependent learning rules.
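The classic example of such an activity-dependent rule is Oja's subspace rule; a minimal numpy sketch (toy data and hyperparameters are illustrative) is:

```python
import numpy as np

def oja_subspace_step(W, x, lr=0.005):
    """One update of Oja's subspace rule. W: (n, k) synaptic
    weights, x: (n,) input sample, y = W^T x the k-dim output.
    The Hebbian term x y^T grows weights along directions of high
    variance; the decay term W y y^T keeps W bounded, so W's
    columns converge to a basis of the top-k principal subspace."""
    y = W.T @ x
    return W + lr * (np.outer(x, y) - W @ np.outer(y, y))

# Toy run: data whose variance is dominated by the first axis.
rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(3, 1))
for _ in range(3000):
    x = rng.normal(size=3) * np.array([2.0, 1.0, 0.5])
    W = oja_subspace_step(W, x)
```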
Here, to overcome this problem, we derive sparse dictionary learning from a novel cost function: a regularized error of the symmetric factorization of the input's similarity matrix.
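That cost can be written down directly. The sketch below assumes the input similarity matrix is X^T X and uses an L1 regularizer to promote sparse outputs; the paper's exact regularizer may differ, and `similarity_match_cost` is a hypothetical name.

```python
import numpy as np

def similarity_match_cost(X, Y, lam=0.1):
    """Regularized error of symmetrically factorizing the input
    similarity matrix X^T X by the output similarity Y^T Y, plus
    an L1 term promoting sparse outputs.
    X: (n, T) inputs over T time steps, Y: (k, T) sparse outputs."""
    err = np.linalg.norm(X.T @ X - Y.T @ Y, 'fro') ** 2
    return err + lam * np.abs(Y).sum()
```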
We are mainly concerned with two theoretical questions: 1) under what conditions does RMTL perform better with a smaller task sample size than STL?
Here we propose to view a neuron as a signal processing device that represents the incoming streaming data matrix as a sparse vector of synaptic weights scaled by an outgoing sparse activity vector.
However, feedback inhibitory circuits are common in early sensory areas, and, furthermore, their dynamics may be nonlinear.