LLaMA-Omni: Seamless Speech Interaction with Large Language Models

ictnlp/llama-omni 10 Sep 2024

We build our model based on the latest Llama-3.1-8B-Instruct model.

1,671 stars (3.98 stars / hour)

OmniGen: Unified Image Generation

vectorspacelab/omnigen 17 Sep 2024

In this work, we introduce OmniGen, a new diffusion model for unified image generation.

Tasks: Edge Detection, Pose Estimation, +2 more

384 stars (3.98 stars / hour)

Kolmogorov-Arnold Transformer

Adamdad/kat 16 Sep 2024

In this paper, we introduce the Kolmogorov-Arnold Transformer (KAT), a novel architecture that replaces MLP layers with Kolmogorov-Arnold Network (KAN) layers to enhance the expressiveness and performance of the model.

259 stars (3.83 stars / hour)

optillm

codelion/optillm 8 Feb 2024

An optimizing inference proxy for LLMs.

Tasks: GSM8K, In-Context Learning, +3 more

666 stars (3.43 stars / hour)

PuLID: Pure and Lightning ID Customization via Contrastive Alignment

tothebeginning/pulid 24 Apr 2024

We propose Pure and Lightning ID customization (PuLID), a novel tuning-free ID customization method for text-to-image generation.

Tasks: Text-to-Image Generation

1,912 stars (2.08 stars / hour)

Symbolic Prompt Program Search: A Structure-Aware Approach to Efficient Compile-Time Prompt Optimization

microsoft/sammo 2 Apr 2024

In many modern LLM applications, such as retrieval-augmented generation, prompts have become programs themselves.

Tasks: RAG, Retrieval

525 stars (1.56 stars / hour)

Breaking reCAPTCHAv2

aplesner/Breaking-reCAPTCHAv2 13 Sep 2024

Our work examines the efficacy of employing advanced machine learning methods to solve captchas from Google's reCAPTCHAv2 system.

Tasks: Image Segmentation, Semantic Segmentation

87 stars (0.92 stars / hour)

GeoCalib: Learning Single-image Calibration with Geometric Optimization

cvg/geocalib 10 Sep 2024

Single-image calibration can benefit various downstream applications, such as image editing and 3D mapping.

Tasks: 3D Geometry, Visual Localization

206 stars (0.79 stars / hour)

OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs

zjunlp/onegen 8 Sep 2024

This paper introduces a novel and efficient One-pass Generation and Retrieval framework (OneGen), designed to improve LLMs' performance on tasks that require both generation and retrieval.

Tasks: Entity Linking, RAG, +1 more

109 stars (0.75 stars / hour)

Wings: Learning Multimodal LLMs without Text-only Forgetting

aidc-ai/ovis 5 Jun 2024

Image and text inputs are first aligned using visual learners that operate alongside the main attention, balancing focus on visual elements.

Tasks: Question Answering, Visual Question Answering

218 stars (0.71 stars / hour)