Biometric authentication systems play a crucial role in security applications, yet developing them often raises privacy and security challenges.
The development of deep learning-based biometric models that can be deployed on devices with constrained memory and computational resources has proven to be a significant challenge.
Ranked #1 on Face Recognition on CFP-FF
In this study, we propose AniPortrait, a novel framework for generating high-quality animation driven by audio and a reference portrait image.
We investigate efficient methods for training Large Language Models (LLMs) to possess capabilities in multiple specialized domains, such as coding, math reasoning and world knowledge.
Ranked #30 on Question Answering on TriviaQA
To this end, we propose ControlNet++, a novel approach that improves controllable generation by explicitly optimizing pixel-level cycle consistency between generated images and conditional controls.
The backbone is trained end-to-end using a novel differentiable solver for wide-baseline two-view pose.
We propose Magic Clothing, a latent diffusion model (LDM)-based network architecture for an unexplored garment-driven image synthesis task.
Tuning-free diffusion-based models have demonstrated significant potential in the realm of image personalization and customization.
This paper presents the UniMER dataset to provide the first study on Mathematical Expression Recognition (MER) in complex real-world scenarios.
We try to narrow the gap by mining the potential of VLMs for better performance and any-to-any workflow from three aspects, i.e., high-resolution visual tokens, high-quality data, and VLM-guided generation.
Ranked #9 on Visual Question Answering on MM-Vet