Recent large language models (LLMs) in the general domain, such as ChatGPT, have shown remarkable success in following instructions and producing human-like responses.
We propose PAniC-3D, a system to reconstruct stylized 3D character heads directly from illustrated (p)ortraits of (ani)me (c)haracters.
Recent text-to-video generation approaches rely on computationally heavy training and require large-scale video datasets.
In this work, we investigate the problem of creating high-fidelity 3D content from only a single image.
To replicate the success of text-to-image (T2I) generation, recent works employ large-scale video datasets to train a text-to-video (T2V) generator.
We propose SmoothQuant, a training-free, accuracy-preserving, and general-purpose post-training quantization (PTQ) solution to enable 8-bit weight, 8-bit activation (W8A8) quantization for LLMs.
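The core idea behind SmoothQuant is to migrate quantization difficulty from activations (which contain outlier channels) to weights via a per-channel scaling that keeps the layer's output mathematically unchanged. Below is a minimal NumPy sketch of that idea; the shapes, the toy outlier channel, and the simple absmax quantizer are illustrative assumptions, not the paper's full method.

```python
import numpy as np

# Hedged sketch of activation smoothing: pick a per-input-channel factor
# s_j = max|X_j|^alpha / max|W_j|^(1-alpha), then replace (X, W) with
# (X / s, diag(s) @ W). The product is unchanged, but activation outliers
# shrink, which makes per-tensor 8-bit quantization far less lossy.
rng = np.random.default_rng(0)
X = rng.standard_normal((16, 8))
X[:, -1] *= 50.0                      # toy activation outlier channel (assumed)
W = rng.standard_normal((8, 8))
alpha = 0.5                           # migration strength hyperparameter

s = np.abs(X).max(axis=0) ** alpha / np.abs(W).max(axis=1) ** (1 - alpha)
X_s = X / s                           # smoothed activations
W_s = W * s[:, None]                  # compensated weights

def absmax_q8(t):
    # Simulated 8-bit absmax quantization (quantize then dequantize).
    scale = np.abs(t).max() / 127.0
    return np.round(t / scale) * scale

# The transform is exact before quantization...
assert np.allclose(X_s @ W_s, X @ W)
# ...and after quantization, smoothing typically reduces matmul error.
err_naive = np.abs(absmax_q8(X) @ absmax_q8(W) - X @ W).mean()
err_smooth = np.abs(absmax_q8(X_s) @ absmax_q8(W_s) - X @ W).mean()
```

The equivalence assertion shows why this is a training-free transform: it can be folded offline into the previous layer's parameters and the weight matrix, leaving inference unchanged except for easier-to-quantize tensors.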
Key to Fantasia3D is the disentangled modeling and learning of geometry and appearance.
However, current research rarely studies the impact of different amounts of instruction data on model performance, especially in real-world use cases.
We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks.
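The rank-decomposition idea can be sketched in a few lines of NumPy: the frozen weight W0 stays fixed, while only two small matrices A and B (the low-rank update) would be trained. The dimensions, scaling factor, and initialization below follow common LoRA practice but are assumptions for illustration, not a full implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 4, 8   # assumed sizes; r << d_in, d_out

W0 = rng.standard_normal((d_out, d_in))      # frozen pre-trained weight
A = rng.standard_normal((r, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, r))                     # trainable up-projection, zero-init

def lora_forward(x):
    # Frozen base path plus the low-rank adaptation path, h = W0 x + (alpha/r) B A x.
    return x @ W0.T + (x @ A.T) @ B.T * (alpha / r)

x = rng.standard_normal((2, d_in))
# With B zero-initialized, the adapted model starts exactly at the pre-trained model.
assert np.allclose(lora_forward(x), x @ W0.T)

# Parameter savings: only A and B are trainable.
trainable = A.size + B.size      # r * (d_in + d_out) = 512
full = W0.size                   # d_in * d_out = 4096
```

Zero-initializing B is the standard trick that makes the adaptation a no-op at the start of fine-tuning, and the trainable-parameter count scales with r rather than with the full weight matrix.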
In such attacks, an adversary can prompt the LLM to produce malicious content or override the original instructions and the employed filtering schemes.