We build our model based on the latest Llama-3.1-8B-Instruct model.
In this work, we introduce OmniGen, a new diffusion model for unified image generation.
In this paper, we introduce the Kolmogorov-Arnold Transformer (KAT), a novel architecture that replaces MLP layers with Kolmogorov-Arnold Network (KAN) layers to enhance the expressiveness and performance of the model.
We propose Pure and Lightning ID customization (PuLID), a novel tuning-free ID customization method for text-to-image generation.
In many modern LLM applications, such as retrieval augmented generation, prompts have become programs themselves.
Our work examines the efficacy of employing advanced machine learning methods to solve CAPTCHAs from Google's reCAPTCHAv2 system.
This single-image calibration can benefit various downstream applications like image editing and 3D mapping.
This paper introduces a novel and efficient One-pass Generation and retrieval framework (OneGen), designed to improve LLMs' performance on tasks that require both generation and retrieval.
Initially, image and text inputs are aligned with visual learners operating alongside the main attention, balancing focus on visual elements.