F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

SWivid/F5-TTS 9 Oct 2024

This sampling strategy for flow steps can be easily applied to existing flow-matching-based models without retraining.

Denoising, Text to Speech

2,642
15.00 stars / hour

Pyramidal Flow Matching for Efficient Video Generative Modeling

jy0205/Pyramid-Flow 8 Oct 2024

Video generation requires modeling a vast spatiotemporal space, which demands significant computational resources and data usage.

Text-to-Video Generation, Video Generation

1,608
3.95 stars / hour

Diffusion for World Modeling: Visual Details Matter in Atari

eloialonso/diamond 20 May 2024

Motivated by this paradigm shift, we introduce DIAMOND (DIffusion As a Model Of eNvironment Dreams), a reinforcement learning agent trained in a diffusion world model.

Image Generation, Reinforcement Learning, +1 more

1,071
3.87 stars / hour

Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

sihyun-yu/REPA 9 Oct 2024

Recent studies have shown that the denoising process in (generative) diffusion models can induce meaningful (discriminative) representations inside the model, though the quality of these representations still lags behind those learned through recent self-supervised learning methods.

Denoising, Image Generation, +1 more

325
3.87 stars / hour

LightRAG: Simple and Fast Retrieval-Augmented Generation

hkuds/lightrag 8 Oct 2024

Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by integrating external knowledge sources, enabling more accurate and contextually relevant responses tailored to user needs.

Information Retrieval, RAG, +1 more

607
3.16 stars / hour

Baichuan-Omni Technical Report

westlake-baichuan-mllm/bc-omni 11 Oct 2024

The salient multimodal capabilities and interactive experience of GPT-4o highlight its critical role in practical applications, yet it lacks a high-performing open-source counterpart.

Language Modelling, Large Language Model, +1 more

114
2.98 stars / hour

MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering

openai/mle-bench 9 Oct 2024

We introduce MLE-bench, a benchmark for measuring how well AI agents perform at machine learning engineering.

379
2.22 stars / hour

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

apple/ml-depth-pro 2 Oct 2024

We present a foundation model for zero-shot metric monocular depth estimation.

Monocular Depth Estimation

2,982
1.29 stars / hour

Generalizable and Animatable Gaussian Head Avatar

xg-chu/gagavatar 10 Oct 2024

In this paper, we propose Generalizable and Animatable Gaussian head Avatar (GAGAvatar) for one-shot animatable head avatar reconstruction.

166
1.29 stars / hour