We propose a soft-label sorting network along with the counting network, which sorts the given images by their crowd numbers.
Previous video object segmentation approaches mainly focus on using simplex solutions between appearance and motion, limiting feature collaboration efficiency among and across these two cues.
Owing to the unremitting efforts by a few institutes, significant progress has recently been made in designing superhuman AIs in No-limit Texas Hold'em (NLTH), the primary testbed for large-scale imperfect-information game research.
Though remarkable progress has been achieved, we observe that the closer a pixel is to the edge, the more difficult it is to predict, because edge pixels have a very imbalanced distribution.
Ranked #1 on Salient Object Detection on DUTS-TE (MAE metric)
To minimize the dependence on a large annotated dataset, our proposed semi-supervised method trains from a small number of labeled examples and exploits two regulatory signals from unlabeled data.
From a single viewpoint, we use a set of photometric stereo images to identify surface points with the same distance to the camera.
Deep neural networks have been shown to suffer from poor generalization when small perturbations are added (like Gaussian noise), yet little work has been done to evaluate their robustness to more natural image transformations like photo filters.
Motivated by the logical interrelations between binary segmentation and edge maps, we propose a novel Stacked Cross Refinement Network (SCRN) for salient object detection in this paper.
Ranked #4 on RGB Salient Object Detection on SOC
Most work on automated deception detection (ADD) in video has two restrictions: (i) it focuses on a video of one person, and (ii) it focuses on a single act of deception in a one- or two-minute video.
In this paper, we propose a novel Cascaded Partial Decoder (CPD) framework for fast and accurate salient object detection.
Ranked #1 on RGB Salient Object Detection on ISTD
Interestingly, we observe that after dropping 30% of the annotations (and labeling them as background), the performance of CNN-based object detectors like Faster-RCNN only drops by 5% on the PASCAL VOC dataset.
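The annotation-dropping setup described above can be sketched as follows. This is a minimal illustration, not the authors' protocol: the `annotations` dictionary layout, the `drop_annotations` helper, and the toy boxes are all assumptions made for this example. Boxes that are removed simply stop being positives, so a detector trained on the result would treat those regions as background.

```python
import random

def drop_annotations(annotations, drop_fraction, seed=0):
    """Remove a fixed fraction of all boxes, chosen uniformly at random.

    `annotations` maps an image id to a list of boxes (hypothetical layout).
    Dropped boxes are no longer positives, so they count as background.
    """
    rng = random.Random(seed)
    # Index every box by (image id, position in that image's list).
    keys = [(img, i) for img, boxes in annotations.items() for i in range(len(boxes))]
    num_drop = round(drop_fraction * len(keys))
    dropped = set(rng.sample(keys, num_drop))
    return {
        img: [box for i, box in enumerate(boxes) if (img, i) not in dropped]
        for img, boxes in annotations.items()
    }

# Toy data: 10 boxes over 3 images, each box as (x_min, y_min, x_max, y_max).
annotations = {
    "img_a": [(0, 0, 10, 10), (5, 5, 20, 20), (8, 1, 12, 9), (30, 30, 40, 40)],
    "img_b": [(1, 1, 4, 4), (2, 2, 6, 6), (7, 7, 9, 9)],
    "img_c": [(0, 0, 3, 3), (4, 4, 8, 8), (10, 10, 15, 15)],
}
kept = drop_annotations(annotations, drop_fraction=0.3)
num_kept = sum(len(boxes) for boxes in kept.values())
```

Sampling an exact count over the whole dataset, rather than flipping a coin per box, keeps the dropped fraction fixed across seeds.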
We present an image-based VIrtual Try-On Network (VITON) without using 3D information in any form, which seamlessly transfers a desired clothing item onto the corresponding region of a person using a coarse-to-fine strategy.
Recent progress on photometric stereo extends the technique to deal with general materials and unknown illumination conditions.
Given a text description of an event, event retrieval is performed by selecting concepts linguistically related to the event description and fusing the concept responses on unseen videos.
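The retrieval pipeline described above (select related concepts, then fuse their responses) can be sketched roughly as below. Everything concrete here is an assumption for illustration: the abstract does not say how linguistic relatedness is measured or how responses are fused, so this sketch uses cosine similarity between hypothetical text embeddings and a similarity-weighted average of hypothetical concept-detector scores.

```python
import numpy as np

def select_concepts(event_vec, concept_vecs, top_k):
    """Return indices and cosine similarities of the top_k closest concepts."""
    e = event_vec / np.linalg.norm(event_vec)
    c = concept_vecs / np.linalg.norm(concept_vecs, axis=1, keepdims=True)
    sims = c @ e
    idx = np.argsort(-sims)[:top_k]
    return idx, sims[idx]

def fuse_responses(responses, idx, weights):
    """Weighted average of the selected concept responses for each video.

    `responses` has shape (num_videos, num_concepts).
    """
    w = np.clip(weights, 0.0, None)  # ignore negatively related concepts
    w = w / w.sum()
    return responses[:, idx] @ w

# Toy data (all values hypothetical): 3 concepts with 2-D embeddings, 2 videos.
concept_vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
event_vec = np.array([1.0, 0.05])
responses = np.array([[0.9, 0.8, 0.1],   # video 0 fires on concepts 0 and 1
                      [0.1, 0.2, 0.9]])  # video 1 fires on concept 2

idx, weights = select_concepts(event_vec, concept_vecs, top_k=2)
scores = fuse_responses(responses, idx, weights)
ranking = np.argsort(-scores)  # videos sorted from most to least relevant
```

No event-specific training videos are used: the ranking depends only on the text-to-concept similarity and the pre-computed concept responses.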
We present a method to capture both 3D shape and spatially varying reflectance with a multi-view photometric stereo technique that works for general isotropic materials.
Under unknown directional lighting, the uncalibrated Lambertian photometric stereo algorithm recovers the shape of a smooth surface up to the generalized bas-relief (GBR) ambiguity.
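The GBR ambiguity can be checked numerically with a short sketch. It uses the standard formulation rather than anything specific to this abstract: b is a surface normal scaled by albedo, s is a light direction scaled by intensity, image intensity is the dot product of b and s (attached shadows ignored), and the GBR matrix G has rows (1, 0, 0), (0, 1, 0), (mu, nu, lam) with lam non-zero. The function names and random test data are assumptions for illustration.

```python
import numpy as np

def gbr_matrix(mu, nu, lam):
    """GBR matrix G; lam must be non-zero so that G is invertible."""
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [mu, nu, lam]])

def apply_gbr(B, S, G):
    """Transform scaled normals B (n x 3, one per row) and lights S (3 x m, one per column)."""
    B_new = B @ np.linalg.inv(G)  # each row b^T becomes (G^{-T} b)^T
    S_new = G @ S                 # each light s becomes G s
    return B_new, S_new

rng = np.random.default_rng(0)
B = rng.normal(size=(5, 3))   # 5 surface points
S = rng.normal(size=(3, 4))   # 4 light sources
G = gbr_matrix(mu=0.3, nu=-0.2, lam=1.5)

B_gbr, S_gbr = apply_gbr(B, S, G)
images_original = B @ S
images_gbr = B_gbr @ S_gbr
same_images = np.allclose(images_original, images_gbr)
```

Because the product of the transformed normals and lights cancels G exactly, the two scenes produce identical intensities even though the normals differ, which is why the images alone cannot determine mu, nu, and lam.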