We present SadTalker, which generates 3D motion coefficients (head pose, expression) of the 3DMM from audio and implicitly modulates a novel 3D-aware face render for talking head generation.
We also have a better zero-shot shape-aware editing ability based on the text-to-video model.
Here we present $\Phi$-SO, a Physical Symbolic Optimization framework for recovering analytical symbolic expressions from physics data using deep reinforcement learning techniques by learning units constraints.
We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters.
Ranked #1 on Language Modelling on CLUE (CMRC2018)
On the SuperGlue benchmark, GPTs achieve comparable and sometimes better performance to similar-sized BERTs in supervised learning.
We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs.
Ranked #1 on Multi-task Language Understanding on MMLU
In this paper, we provide a comprehensive survey of learning-based camera calibration techniques, by analyzing their strengths and limitations.
The existing methods do not simultaneously satisfy the above two aspects of VC, and their conversion outputs suffer from a trade-off problem between maintaining source contents and target characteristics.
However, the problem of transferring these capabilities learned from image-level supervision to the pixel-level task of segmentation and addressing arbitrary unseen categories at inference makes this task challenging.
To that end, the segmentation mask is expressed with a special type of image (dubbed as maskige).