This work targets a novel text-driven whole-body motion generation task, which takes a given textual description as input and aims at generating high-quality, diverse, and coherent facial expressions, hand gestures, and body motions simultaneously.
In this paper, we present TOSS, which introduces text to the task of novel view synthesis (NVS) from just a single RGB image.
One of the most important tasks in quantitative investment research is mining new alphas (effective trading signals or factors).
State-of-the-art (SOTA) face swap models still suffer from one of two problems: either the target identity (i.e., shape) leaks into the final results, or the target's non-identity attributes (i.e., background, hair) are not fully preserved.
Although large language models (LLMs) have achieved significant success in various tasks, they often struggle with hallucination problems, especially in scenarios requiring deep and responsible reasoning.
Our method is arguably the first to demonstrate that a concatenation of multiple convolution sparse coding/decoding layers leads to an interpretable and effective autoencoder for modeling the distribution of large-scale natural image datasets.
Quant has become one of the mainstream investment methodologies over the past decades, and has gone through three generations: Quant 1.0, trading by mathematical modeling to discover mis-priced assets in markets; Quant 2.0, shifting the quant research pipeline from small "strategy workshops" to large "alpha factories"; and Quant 3.0, applying deep learning techniques to discover complex nonlinear pricing rules.
We present a novel one-shot talking head synthesis method that achieves disentangled and fine-grained control over lip motion, eye gaze and blink, head pose, and emotional expression.
We introduce a novel detail manifolds reconstructor to learn 3D-consistent fine details on the radiance manifolds from monocular images, and combine them with the coarse radiance manifolds for high-fidelity reconstruction.
We present HandAvatar, a novel representation for hand animation and rendering, which can generate smoothly compositional geometry and self-occlusion-aware texture.
This paper revisits the principle of uniform convergence in statistical learning, discusses how it acts as the foundation behind machine learning, and attempts to gain a better understanding of the essential problem that current deep learning algorithms are solving.
Ten years into the revival of deep networks and artificial intelligence, we propose a theoretical framework that sheds light on understanding deep networks within a bigger picture of Intelligence in general.
In this paper, we present Mask DINO, a unified object detection and segmentation framework.
Ranked #1 on Panoptic Segmentation on COCO test-dev (using extra training data)
We propose an effective semi-supervised method for learning 3D segmentations from a few labeled 3D shapes and a large amount of unlabeled 3D data.
no code implementations • 18 Mar 2022 • Sheng Yu, Zheng Yuan, Jun Xia, Shengxuan Luo, Huaiyuan Ying, Sihang Zeng, Jingyi Ren, Hongyi Yuan, Zhengyun Zhao, Yucong Lin, Keming Lu, Jing Wang, Yutao Xie, Heung-Yeung Shum
For decades, biomedical knowledge graphs (BioMedKGs) have been developed via expert curation; however, this method can no longer keep up with today's AI development, and a transition to algorithmically generated BioMedKGs is necessary.
Compared to other models on the leaderboard, DINO significantly reduces its model size and pre-training data size while achieving better results.
Ranked #1 on Real-Time Object Detection on COCO 2017 val
This article provides an overview of this progress and discusses related methods and technologies that can be incorporated for building robust conversational AI systems.