Large Language Models (LLMs) serve as conversational interfaces.
We introduce Zep, a novel memory layer service for AI agents that outperforms the current state-of-the-art system, MemGPT, on the Deep Memory Retrieval (DMR) benchmark.
Computer use agents automate digital tasks by directly interacting with graphical user interfaces (GUIs) on computers and mobile devices, offering significant potential to enhance human productivity by handling an open-ended range of user queries.
In this paper, we introduce Latent Bridge Matching (LBM), a new, versatile and scalable method that relies on Bridge Matching in a latent space to achieve fast image-to-image translation.
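To make the Bridge Matching idea concrete, here is a minimal, hedged sketch of one training step in a latent space. It assumes paired source/target latents (e.g., from a pretrained VAE encoder), a toy MLP drift predictor, and a Brownian-bridge interpolant; all names and hyperparameters are illustrative and not taken from the LBM paper, which uses a much larger latent diffusion backbone.

```python
# Sketch of a single Bridge Matching training step in latent space (assumptions noted above).
import torch
import torch.nn as nn

class DriftNet(nn.Module):
    """Toy drift predictor conditioned on the bridge time t (illustrative only)."""
    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, z_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([z_t, t], dim=-1))

def bridge_matching_step(model, z_src, z_tgt, sigma: float = 0.1):
    """Regress the model onto the conditional drift of a Brownian bridge between latents."""
    b = z_src.shape[0]
    t = torch.rand(b, 1)                                        # random bridge time in (0, 1)
    eps = torch.randn_like(z_src)
    # Brownian-bridge sample between source and target latents.
    z_t = (1 - t) * z_src + t * z_tgt + sigma * (t * (1 - t)).sqrt() * eps
    # The conditional drift at time t points toward the target latent.
    target_drift = (z_tgt - z_t) / (1 - t).clamp(min=1e-3)
    pred = model(z_t, t)
    return ((pred - target_drift) ** 2).mean()

# Usage with random stand-in latents (in practice: encoded source/target image pairs).
model = DriftNet(dim=16)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
z_src, z_tgt = torch.randn(8, 16), torch.randn(8, 16)
loss = bridge_matching_step(model, z_src, z_tgt)
loss.backward()
opt.step()
```

At sampling time, the learned drift is integrated from the source latent toward the target distribution in a small number of steps, which is what enables fast translation; the exact sampler and number of steps here would be an assumption, not the paper's recipe.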
Recent advances in video generation have been driven by diffusion models and autoregressive frameworks, yet critical challenges persist in harmonizing prompt adherence, visual quality, motion dynamics, and duration: motion dynamics are compromised to enhance temporal visual quality, video duration is constrained (5-10 seconds) to prioritize resolution, and shot-aware generation remains inadequate because general-purpose MLLMs cannot interpret cinematic grammar such as shot composition, actor expressions, and camera motions.
Large reasoning models (LRMs), such as OpenAI-o1 and DeepSeek-R1, demonstrate impressive long-horizon reasoning capabilities.
Large reasoning models (LRMs) already possess a latent capacity for long chain-of-thought reasoning.
We introduce a unified framework for training, sampling, and analyzing these models.
Ranked #1 on Image Generation on ImageNet 256x256.
Molecular dynamics simulations demand an unprecedented combination of accuracy and scalability to tackle grand challenges in catalysis and materials design.
Text-to-audio systems, while increasingly performant, are slow at inference time, making their latency impractical for many creative applications.