In this paper, we fully exploit the geometry of facial part segmentation by introducing a Part Re-projection Distance Loss (PRDL).
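A minimal sketch of what such a part-level re-projection loss could look like, assuming each facial part contributes a symmetric chamfer-style distance between its projected mesh vertices and points sampled from its segmentation mask; the names `part_reprojection_distance` and `prdl_loss` and the exact point-set distance are illustrative assumptions, not the paper's formulation:

```python
import torch

def part_reprojection_distance(proj_vertices, seg_points):
    """Symmetric chamfer-style distance between the 2D projections of one
    facial part's mesh vertices (N, 2) and pixel coordinates sampled from
    that part's segmentation mask (M, 2). Assumption: the actual PRDL may
    use a different point-set distance; this only illustrates matching
    projected geometry to segmentation parts."""
    d = torch.cdist(proj_vertices, seg_points)            # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def prdl_loss(parts):
    """Average the per-part distance over all facial parts (eyes, brows, lips, ...).
    `parts` is a list of (proj_vertices, seg_points) pairs."""
    return torch.stack([part_reprojection_distance(v, s) for v, s in parts]).mean()
```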
Image diffusion models have been utilized in various tasks, such as text-to-image generation and controllable image synthesis.
One puzzling artifact in machine learning, dubbed grokking, is delayed generalization: generalization is achieved only many iterations (often tenfold or more) after near-perfect overfitting to the training data.
Our experiments show that our proposed MatMul-free models achieve performance on par with state-of-the-art Transformers that require far more memory during inference, at scales up to at least 2.7B parameters.
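As a rough illustration of why MatMul-free layers can be cheaper, here is a minimal sketch of a linear layer whose weights are constrained to {-1, 0, +1}, so the dense matrix multiplication collapses into signed accumulation; the function name and the omission of quantization and scaling details are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def ternary_linear(x, w_ternary):
    """With ternary weights in {-1, 0, +1}, each output feature is just a sum
    of some inputs minus a sum of others: additions and subtractions only,
    no multiplications.

    x:         (d_in,) input activations
    w_ternary: (d_out, d_in) weights in {-1, 0, +1}
    """
    out = np.zeros(w_ternary.shape[0], dtype=x.dtype)
    for i, row in enumerate(w_ternary):
        out[i] = x[row == 1].sum() - x[row == -1].sum()   # accumulate, no multiplies
    return out
```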
In our GNN-RAG framework, the GNN acts as a dense subgraph reasoner to extract useful graph information, while the LLM leverages its natural language processing ability to reason over the retrieved information and produce the final KGQA answer.
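A hedged sketch of the hand-off between the two components, assuming the GNN has already retrieved reasoning paths over the question subgraph; the helper names `verbalize_paths` and `build_kgqa_prompt` and the prompt wording are hypothetical, not the framework's actual interface:

```python
def verbalize_paths(paths):
    """Turn KG reasoning paths (lists of (head, relation, tail) triples),
    e.g. paths connecting question entities to GNN-retrieved candidate
    answers, into text lines for the LLM prompt."""
    return "\n".join(
        " -> ".join(f"{h} --{r}--> {t}" for h, r, t in path) for path in paths
    )

def build_kgqa_prompt(question, paths):
    """Assemble an LLM prompt from the retrieved graph information.
    Assumption: the GNN's scoring of candidate answer nodes happens
    upstream; only the retrieval-to-prompt step is shown here."""
    return (
        "Answer the question using the knowledge-graph paths below.\n"
        f"Paths:\n{verbalize_paths(paths)}\n"
        f"Question: {question}\nAnswer:"
    )

# Toy usage with a single one-hop path
print(build_kgqa_prompt(
    "Where was Marie Curie born?",
    [[("Marie Curie", "place_of_birth", "Warsaw")]],
))
```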
Diffusion Transformers (DiTs) introduce the transformer architecture to diffusion tasks for latent-space image generation.
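A simplified sketch of one such transformer block operating on latent patch tokens, with the diffusion timestep embedding injected through adaptive layer normalization; the dimensions and this `DiTBlock` parameterization are assumptions and omit details such as adaLN-Zero initialization:

```python
import torch
import torch.nn as nn

class DiTBlock(nn.Module):
    """One transformer block over latent-image patch tokens: self-attention
    plus an MLP, conditioned on the timestep embedding via adaptive
    layer norm (scale/shift produced from the conditioning vector)."""
    def __init__(self, dim=384, heads=6):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim, elementwise_affine=False)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.ada = nn.Linear(dim, 4 * dim)  # produces two scale/shift pairs from the conditioning

    def forward(self, tokens, cond):
        # tokens: (B, N, dim) latent patch tokens; cond: (B, dim) timestep embedding
        s1, b1, s2, b2 = self.ada(cond).unsqueeze(1).chunk(4, dim=-1)
        h = self.norm1(tokens) * (1 + s1) + b1
        tokens = tokens + self.attn(h, h, h, need_weights=False)[0]
        h = self.norm2(tokens) * (1 + s2) + b2
        return tokens + self.mlp(h)
```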
We present a novel generative 3D modeling system, coined CraftsMan, which can generate high-fidelity 3D geometries with highly varied shapes, regular mesh topologies, and detailed surfaces, and, notably, allows for refining the geometry in an interactive manner.
Recent works in implicit representations, such as Neural Radiance Fields (NeRF), have advanced the generation of realistic and animatable head avatars from video sequences.
For data scaling, we introduce a Warmup-Stable-Decay (WSD) learning rate scheduler (LRS), conducive to continuous training and domain adaptation.
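A minimal sketch of a warmup-stable-decay schedule, assuming a linear warmup, a constant plateau, and a cosine decay at the end; the phase fractions and the decay shape are illustrative defaults, not the paper's exact settings:

```python
import math

def wsd_lr(step, total_steps, peak_lr, warmup_frac=0.1, decay_frac=0.1, min_lr=0.0):
    """Warmup-Stable-Decay (WSD): ramp up to peak_lr, hold it for most of
    training (which makes it easy to keep training or branch off for domain
    adaptation from the stable phase), then decay at the very end."""
    warmup_steps = max(1, int(warmup_frac * total_steps))
    decay_steps = max(1, int(decay_frac * total_steps))
    stable_end = total_steps - decay_steps
    if step < warmup_steps:                      # warmup: linear ramp to peak_lr
        return peak_lr * (step + 1) / warmup_steps
    if step < stable_end:                        # stable: hold peak_lr
        return peak_lr
    progress = (step - stable_end) / decay_steps # decay: cosine anneal to min_lr
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```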
For fine-grained language understanding, we train a Multimodal Large Language Model to refine the captions of the images.