We build our model on the latest Llama-3.1-8B-Instruct model.
We propose Pure and Lightning ID customization (PuLID), a novel tuning-free ID customization method for text-to-image generation.
We present PaperQA, a RAG agent for answering questions over the scientific literature.
To demonstrate Windows Agent Arena's capabilities, we also introduce a new multi-modal agent, Navi.
We also introduce the VoiceAssistant-400K dataset to fine-tune models optimized for speech output.
Despite the potential of language model-based agents to solve real-world tasks such as web navigation, current methods still struggle with long-horizon tasks with complex action trajectories.
Despite tremendous progress in image-to-3D generation, existing methods still struggle to produce multi-view consistent images with detailed, high-resolution textures, especially in the 2D diffusion paradigm, which lacks 3D awareness.
We hope that our study helps the research community and LLM vendors promote safer and regulated LLMs.
Monotonic alignment search (MAS), introduced by Glow-TTS, is one of the most popular algorithms in TTS for estimating unknown alignments between text and speech.
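
To make the idea of monotonic alignment concrete, here is a minimal sketch of the dynamic program behind MAS in plain Python. It assumes a precomputed score matrix in which entry `[i][j]` is the log-likelihood of speech frame `j` under text token `i`. The function name `monotonic_alignment_search`, the list-of-lists input, and the tie-breaking rule are illustrative assumptions, not the Glow-TTS implementation.

```python
import math


def monotonic_alignment_search(log_lik):
    """Return path[j] = index of the text token aligned to speech frame j.

    log_lik[i][j] is an assumed, precomputed log-likelihood of frame j
    under text token i. The path starts at token 0, ends at the last
    token, never moves backwards, and never skips a token.
    """
    n_text = len(log_lik)
    n_frames = len(log_lik[0])
    if n_text > n_frames:
        raise ValueError("need at least one speech frame per text token")

    neg_inf = -math.inf
    # q[i][j]: best total score of any valid partial path ending at (i, j).
    q = [[neg_inf] * n_frames for _ in range(n_text)]
    q[0][0] = log_lik[0][0]
    for j in range(1, n_frames):
        # Token i cannot be reached before frame i, so cap i at j.
        for i in range(min(j + 1, n_text)):
            stay = q[i][j - 1]  # same token as the previous frame
            advance = q[i - 1][j - 1] if i > 0 else neg_inf  # next token
            q[i][j] = max(stay, advance) + log_lik[i][j]

    # Backtrack from the last token at the last frame.
    path = [0] * n_frames
    i = n_text - 1
    for j in range(n_frames - 1, -1, -1):
        path[j] = i
        # Ties are broken in favour of advancing (an arbitrary choice here).
        if j > 0 and i > 0 and q[i - 1][j - 1] >= q[i][j - 1]:
            i -= 1
    return path


# Toy example: 3 text tokens, 4 speech frames, made-up scores.
scores = [
    [0, -1, -9, -9],
    [-9, -3, 0, -9],
    [-9, -9, -9, 0],
]
alignment = monotonic_alignment_search(scores)
```

In this toy case the first token covers the first two frames and the remaining tokens cover one frame each. The nested loops cost time proportional to the number of tokens times the number of frames.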