Recent deep generative models have achieved promising performance in image inpainting.
It is challenging to directly estimate the geometry of human from a single image due to the high diversity and complexity of body shapes with the various clothing styles.
Although enjoying the merits of promising results on the semantic parsing, deep learning methods cannot directly make use of the architectural rules, which play an important role for man-made structures.
Specifically, we suggest to employ a set of adaptive points to capture the geometric and spatial information of the arbitrary-oriented objects, which is able to automatically arrange themselves over the object in a spatial and semantic scenario.
In this paper, we propose to investigate the problem of out-of-domain visio-linguistic pretraining, where the pretraining data distribution differs from that of downstream data on which the pretrained model will be fine-tuned.
Click-through rate (CTR) prediction is an essential task in industrial applications such as video recommendation.
In this work, we propose a novel surgical perception framework, SuPer, for surgical robotic control.
Recent years have witnessed many exciting achievements for object detection using deep learning techniques.
In order to efficiently search in such a large 4-DoF space in real-time, we formulate the problem into two 2-DoF sub-problems and apply an efficient Block Coordinates Descent solver to optimize the estimation result.
In this paper, we propose a novel simple yet effective framework of "Feature Agglomeration Networks" (FANet) to build a new single stage face detector, which not only achieves state-of-the-art performance but also runs efficiently.
Generating high fidelity identity-preserving faces with different facial attributes has a wide range of applications.
Unlike the existing approaches, in this paper, we propose a novel approach called Sparse Product Quantization (SPQ) to encoding the high-dimensional feature vectors into sparse representation.
Most modern trackers typically employ a bounding box given in the first frame to track visual objects, where their tracking results are often sensitive to the initialization.
In this framework, SVM and TSVM can be regarded as a learning machine without regularization and one with full regularization from the unlabeled data, respectively.
Learning distance functions with side information plays a key role in many machine learning and data mining applications.
We consider the problem of Support Vector Machine transduction, which involves a combinatorial problem with exponential computational complexity in the number of unlabeled examples.