Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities

aidc-ai/awesome-unified-multimodal-models 5 May 2025

Despite their respective successes, these two domains have evolved independently, leading to distinct architectural paradigms: While autoregressive-based architectures have dominated multimodal understanding, diffusion-based models have become the cornerstone of image generation.

Survey Text-to-Image Generation

180
0.87 stars / hour

Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models

emcie-co/parlant 5 Mar 2025

We present Attentive Reasoning Queries (ARQs), a novel structured reasoning approach that significantly improves instruction-following in Large Language Models through domain-specialized reasoning blueprints.

Hallucination Instruction Following +1

2,829
0.79 stars / hour

INTELLECT-2: A Reasoning Model Trained Through Globally Decentralized Reinforcement Learning

PrimeIntellect-ai/prime-rl 12 May 2025

We introduce INTELLECT-2, the first globally distributed reinforcement learning (RL) training run of a 32 billion parameter language model.

reinforcement-learning Reinforcement Learning +1

236
0.73 stars / hour

Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play

maitrix-org/voila 5 May 2025

A voice AI agent that blends seamlessly into daily life would interact with humans in an autonomous, real-time, and emotionally expressive manner.

AI Agent Automatic Speech Recognition +5

334
0.72 stars / hour

Bitnet.cpp: Efficient Edge Inference for Ternary LLMs

microsoft/bitnet 17 Feb 2025

The advent of 1-bit large language models (LLMs), led by BitNet b1.58, has spurred interest in ternary LLMs.

19,403
0.68 stars / hour

Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

mem0ai/mem0 28 Apr 2025

Large Language Models (LLMs) have demonstrated remarkable prowess in generating contextually coherent responses, yet their fixed context windows pose fundamental challenges for maintaining consistency over prolonged multi-session dialogues.

RAG

30,116
0.67 stars / hour

Deformable Beta Splatting

RongLiu-Leo/beta-splatting 27 Jan 2025

Experimental results demonstrate that DBS achieves state-of-the-art visual quality while utilizing only 45% of the parameters and rendering 1.5x faster than 3DGS-MCMC, highlighting the superior performance of DBS for real-time radiance field rendering.

3DGS Novel View Synthesis

135
0.63 stars / hour

LBM: Latent Bridge Matching for Fast Image-to-Image Translation

gojasper/lbm 10 Mar 2025

In this paper, we introduce Latent Bridge Matching (LBM), a new, versatile and scalable method that relies on Bridge Matching in a latent space to achieve fast image-to-image translation.

Depth Estimation Image Relighting +2

456
0.61 stars / hour

Zep: A Temporal Knowledge Graph Architecture for Agent Memory

getzep/graphiti 20 Jan 2025

We introduce Zep, a novel memory layer service for AI agents that outperforms the current state-of-the-art system, MemGPT, in the Deep Memory Retrieval (DMR) benchmark.

RAG Retrieval

8,907
0.58 stars / hour

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

caraj7/t2i-r1 1 May 2025

By applying our reasoning strategies to the baseline model, Janus-Pro, we achieve superior performance with 13% improvement on T2I-CompBench and 19% improvement on the WISE benchmark, even surpassing the state-of-the-art model FLUX.1.

Reinforcement Learning (RL) Text-to-Image Generation

276
0.57 stars / hour