Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

mem0ai/mem0 28 Apr 2025

Large Language Models (LLMs) have demonstrated remarkable prowess in generating contextually coherent responses, yet their fixed context windows pose fundamental challenges for maintaining consistency over prolonged multi-session dialogues.

RAG Retrieval-augmented Generation

32,038
0.51 stars / hour

DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning

visual-agent/deepeyes 20 May 2025

Large Vision-Language Models (VLMs) have shown strong capabilities in multimodal understanding and reasoning, yet they are primarily constrained by text-based reasoning processes.

Hallucination Mathematical Reasoning +4

17
0.50 stars / hour

Zep: A Temporal Knowledge Graph Architecture for Agent Memory

getzep/graphiti 20 Jan 2025

We introduce Zep, a novel memory layer service for AI agents that outperforms the current state-of-the-art system, MemGPT, on the Deep Memory Retrieval (DMR) benchmark.

RAG Retrieval +1

9,464
0.49 stars / hour

Generative AI for Autonomous Driving: Frontiers and Opportunities

taco-group/genai4ad 13 May 2025

This survey delivers a comprehensive and critical synthesis of the emerging role of GenAI across the autonomous driving stack.

Autonomous Driving Video Generation

62
0.49 stars / hour

Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis

tang-bd/fuse-dit 15 May 2025

This paper does not describe a new method; instead, it provides a thorough exploration of an important yet understudied design space related to recent advances in text-to-image synthesis -- specifically, the deep fusion of large language models (LLMs) and diffusion transformers (DiTs) for multi-modal generation.

Text-to-Image Generation

69
0.45 stars / hour

An End-to-End Robust Point Cloud Semantic Segmentation Network with Single-Step Conditional Diffusion Models

qwtforgithub/cdsegnet 25 Nov 2024

Moreover, thanks to CNF, CDSegNet can generate semantic labels in a single inference step, like non-DDPM methods, because its dominant network avoids directly fitting the scores from semantic labels.

Denoising Scene Understanding +1

90
0.45 stars / hour

Generating Physically Stable and Buildable LEGO Designs from Text

AvaLovelace1/LegoGPT 8 May 2025

Our experiments show that LegoGPT produces stable, diverse, and aesthetically pleasing LEGO designs that align closely with the input text prompts.

3D Generation Large Language Model +1

1,115
0.43 stars / hour

VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model

vita-mllm/vita-audio 6 May 2025

Specifically, we introduce a lightweight Multiple Cross-modal Token Prediction (MCTP) module that efficiently generates multiple audio tokens within a single model forward pass, which not only accelerates the inference but also significantly reduces the latency for generating the first audio in streaming scenarios.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +7

383
0.43 stars / hour

IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

index-tts/index-tts 8 Feb 2025

Recently, large language model (LLM)-based text-to-speech (TTS) systems have gradually become mainstream in industry thanks to their high naturalness and powerful zero-shot voice cloning capabilities. Here, we introduce the IndexTTS system, which is based mainly on the XTTS and Tortoise models.

Decoder Language Modeling +6

1,841
0.41 stars / hour

Flow-GRPO: Training Flow Matching Models via Online RL

yifan123/flow_grpo 8 May 2025

We propose Flow-GRPO, the first method integrating online reinforcement learning (RL) into flow matching models.

Denoising Diversity +3

597
0.39 stars / hour