This sampling strategy for flow steps can be easily applied to existing flow-matching-based models without retraining.
Video generation requires modeling a vast spatiotemporal space, which demands significant computational resources and data usage.
Motivated by this paradigm shift, we introduce DIAMOND (DIffusion As a Model Of eNvironment Dreams), a reinforcement learning agent trained in a diffusion world model.
Recent studies have shown that the denoising process in (generative) diffusion models can induce meaningful (discriminative) representations inside the model, though the quality of these representations still lags behind that of representations learned through recent self-supervised learning methods.
Ranked #1 on Image Generation on ImageNet 256x256
Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by integrating external knowledge sources, enabling more accurate and contextually relevant responses tailored to user needs.
The salient multimodal capabilities and interactive experience of GPT-4o highlight its critical role in practical applications, yet it lacks a high-performing open-source counterpart.
We introduce MLE-bench, a benchmark for measuring how well AI agents perform at machine learning engineering.
We present a foundation model for zero-shot metric monocular depth estimation.
In this paper, we propose Generalizable and Animatable Gaussian head Avatar (GAGAvatar) for one-shot animatable head avatar reconstruction.