In this paper, we propose an efficient yet effective framework, HumanGaussian, that generates high-quality 3D humans with fine-grained geometry and realistic appearance.
In this work, we present TapMo, a Text-driven Animation Pipeline for synthesizing Motion in a broad spectrum of skeleton-free 3D characters.
However, none of the prior works focus on modeling 3D biped cartoon characters, which are also in great demand in gaming and filming.
A practical recognition system must balance between majority (head) and minority (tail) classes, generalize across the distribution, and acknowledge novelty upon the instances of unseen classes (open classes).
We present Masked Frequency Modeling (MFM), a unified frequency-domain-based approach for self-supervised pre-training of visual models.
Extensive experiments on COCO show that ORL significantly improves the performance of self-supervised learning on scene images, even surpassing supervised ImageNet pre-training on several downstream tasks.
Unlike most recent methods that focused on improving accuracy of image classification, we present a novel contrastive learning approach, named DetCo, which fully explores the contrasts between global image and local image patches to learn discriminative representations for object detection.
Recent unsupervised contrastive representation learning follows a Single Instance Multi-view (SIM) paradigm where positive pairs are usually constructed with intra-image data augmentation.
In this work, we present a comprehensive empirical study to better understand the role of inter-image invariance learning from three main constituting components: pseudo-label maintenance, sampling strategy, and decision boundary design.
This is achieved via Partial Completion Network (PCNet)-mask (M) and -content (C), that learn to recover fractions of object masks and contents, respectively, in a self-supervised manner.
With the vertex confidence and edge connectivity, we can naturally organize more relevant vertices on the affinity graph and group them into clusters.
Learning a good image prior is a long-term goal for image restoration and manipulation.
A typical domain adaptation approach is to adapt models trained on the annotated data in a source domain (e. g., sunny weather) for achieving high performance on the test data in a target domain (e. g., rainy weather).
Unlike existing works that design normalization techniques for specific tasks, we propose Switchable Whitening (SW), which provides a general form unifying different whitening methods as well as standardization methods.
Ranked #6 on Robust Object Detection on DWD
We define Open Long-Tailed Recognition (OLTR) as learning from such naturally distributed data and optimizing the classification accuracy over a balanced test set which include head, tail, and open classes.
Face recognition sees remarkable progress in recent years, and its performance has reached a very high level.
Instead of explicitly modeling the motion probabilities, we design the pretext task as a conditional motion propagation problem.
Face recognition has witnessed great progress in recent years, mainly attributed to the high-capacity model designed and the abundant labeled data collected.
The key of this new form of learning is to design a proxy task (e. g. image colorization), from which a discriminative loss can be formulated on unlabeled data.