To this end, we propose SwiftAvatar, a novel avatar auto-creation framework that is evidently superior to previous works.
Performance of trimap-free image matting methods is limited when trying to decouple the deterministic and undetermined regions, especially in the scenes where foregrounds are semantically ambiguous, chromaless, or high transmittance.
Besides, by leveraging full training set and the additional 48K raw images of KITTI, it can further improve the MonoFlex by +4. 65% improvement on AP@0. 7 for car detection, reaching 18. 54% AP@0. 7, which ranks the 1st place among all monocular based methods on KITTI test leaderboard.
To exploit this effect, the model prediction-based methods have been widely adopted, which aim to exploit the outputs of DNNs in the early stage of learning to correct noisy labels.
In order to solve this problem, we paraphrase the reference summaries in CLTS, the Chinese Long Text Summarization dataset, correct errors of factual inconsistencies, and propose the first Chinese Long Text Summarization dataset with a high level of abstractiveness, CLTS+, which contains more than 180K article-summary pairs and is available online.
Further, by defining a new form of data centroid, we transform the recovery problem of a label-dependent part to a centroid estimation problem.
In this paper, we succeed in introducing multi-scale representations into semantic segmentation ViT via window attention mechanism and further improves the performance and efficiency.
Ranked #3 on Semantic Segmentation on Cityscapes test
Much of that is due to the notorious modality bias training issue brought by the single-modality ImageNet pre-training, which might yield RGB-biased representations that severely hinder the cross-modality image retrieval.
However, existing methods mostly train the DNNs on uniformly sampled LR-HR patch pairs, which makes them fail to fully exploit informative patches within the image.
Internet video delivery has undergone a tremendous explosion of growth over the past few years.
It is also worth pointing that, given identical strong data augmentations, the performance improvement of ConTNet is more remarkable than that of ResNet.
Graph Reasoning has shown great potential recently in modeling long-range dependencies, which are crucial for various computer vision tasks.
GI unit is further improved by the SC-loss to enhance the semantic representations over the exemplar-based semantic graph.
To further promote the research of ship detection, we introduced a new fine-grained ship detection datasets, which is named as FGSD.
Human parsing is an essential branch of semantic segmentation, which is a fine-grained semantic segmentation task to identify the constituent parts of human.
Recent neural sequence to sequence models have provided feasible solutions for abstractive summarization.
Road extraction is a fundamental task in the field of remote sensing which has been a hot research topic in the past decade.