LLaMA-Omni: Seamless Speech Interaction with Large Language Models

ictnlp/llama-omni 10 Sep 2024

We build our model based on the latest Llama-3.1-8B-Instruct model.

1,580
2.87 stars / hour

PuLID: Pure and Lightning ID Customization via Contrastive Alignment

tothebeginning/pulid 24 Apr 2024

We propose Pure and Lightning ID customization (PuLID), a novel tuning-free ID customization method for text-to-image generation.

Text-to-Image Generation

1,865
1.81 stars / hour

PaperQA: Retrieval-Augmented Generative Agent for Scientific Research

whitead/paper-qa 8 Dec 2023

We present PaperQA, a RAG agent for answering questions over the scientific literature.

Information Retrieval, Question Answering +2

5,479
1.39 stars / hour

Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

microsoft/windowsagentarena 12 Sep 2024

To demonstrate Windows Agent Arena's capabilities, we also introduce a new multi-modal agent, Navi.

159
1.21 stars / hour

optillm

codelion/optillm 5 Sep 2024

Optimizing inference proxy for LLMs

Code Generation, Diversity

666
1.12 stars / hour

Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

gpt-omni/mini-omni 29 Aug 2024

We also introduce the VoiceAssistant-400K dataset to fine-tune models optimized for speech output.

Speech Synthesis

2,507
1.01 stars / hour

Agent Workflow Memory

zorazrw/agent-workflow-memory 11 Sep 2024

Despite the potential of language model-based agents to solve real-world tasks such as web navigation, current methods still struggle with long-horizon tasks with complex action trajectories.

AI Agent, Language Modelling

140
0.96 stars / hour

Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models

yanghb22-fdu/hi3d-official 11 Sep 2024

Despite tremendous progress in image-to-3D generation, existing methods still struggle to produce multi-view consistent images with detailed high-resolution textures, especially in the paradigm of 2D diffusion that lacks 3D awareness.

3D Generation, 3D Reconstruction +3

139
0.93 stars / hour

"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models

verazuo/jailbreak_llms 7 Aug 2023

We hope that our study can facilitate the research community and LLM vendors in promoting safer and regulated LLMs.

Community Detection

2,093
0.71 stars / hour

Super Monotonic Alignment Search

supertone-inc/super-monotonic-align 12 Sep 2024

Monotonic alignment search (MAS), introduced by Glow-TTS, is one of the most popular algorithms in TTS for estimating unknown alignments between text and speech.

107
0.69 stars / hour