We propose Magic Clothing, a latent diffusion model (LDM)-based network architecture for a previously unexplored task: garment-driven image synthesis.
This paper presents a Light CNN framework that learns a compact embedding from large-scale face data with massive label noise.
Autonomous robotic systems capable of learning novel manipulation tasks are poised to transform industries from manufacturing to service automation.
In this study, we propose AniPortrait, a novel framework for generating high-quality animation driven by audio and a reference portrait image.
Language agents, particularly those built on large language models (LLMs), show promise in using natural language to carry out varied and intricate tasks across diverse environments.
Tuning-free diffusion-based models have shown significant potential for image personalization and customization.
In this study, we introduce a methodology for human image animation that leverages a 3D human parametric model within a latent diffusion framework to improve shape alignment and motion guidance over current human generative techniques.
We present a versatile NeRF-based simulator for testing autonomous driving (AD) software systems, designed with a focus on sensor-realistic closed-loop evaluation and the creation of safety-critical scenarios.
To this end, we introduce ViTamin, a new vision model tailored for VLMs.
We hope that our work will inspire future research into what can be achieved by versatile multimodal foundation models in real-world applications.