Pyramidal Flow Matching for Efficient Video Generative Modeling

jy0205/Pyramid-Flow 8 Oct 2024

Video generation requires modeling a vast spatiotemporal space, which demands significant computational resources and data usage.

Text-to-Video Generation Video Generation

1,608
6.28 stars / hour
464
3.31 stars / hour

MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering

openai/mle-bench 9 Oct 2024

We introduce MLE-bench, a benchmark for measuring how well AI agents perform at machine learning engineering.

379
2.98 stars / hour

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

apple/ml-depth-pro 2 Oct 2024

We present a foundation model for zero-shot metric monocular depth estimation.

Monocular Depth Estimation

2,982
2.04 stars / hour

Diffusion for World Modeling: Visual Details Matter in Atari

eloialonso/diamond 20 May 2024

Motivated by this paradigm shift, we introduce DIAMOND (DIffusion As a Model Of eNvironment Dreams), a reinforcement learning agent trained in a diffusion world model.

Image Generation reinforcement-learning +1

810
1.50 stars / hour

LightRAG: Simple and Fast Retrieval-Augmented Generation

hkuds/lightrag 8 Oct 2024

Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by integrating external knowledge sources, enabling more accurate and contextually relevant responses tailored to user needs.

Information Retrieval RAG +1

607
1.42 stars / hour

Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration

ohayonguy/PMRF 1 Oct 2024

Photo-realistic image restoration algorithms are typically evaluated by distortion measures (e.g., PSNR, SSIM) and by perceptual quality measures (e.g., FID, NIQE), where the desire is to attain the lowest possible distortion without compromising on perceptual quality.

Ranked #1 on Blind Face Restoration on CelebA-Test (FID metric)

Blind Face Restoration Image Colorization +5

414
1.03 stars / hour

Agent S: An Open Agentic Framework that Uses Computers Like a Human

simular-ai/agent-s 10 Oct 2024

We present Agent S, an open agentic framework that enables autonomous interaction with computers through a Graphical User Interface (GUI), aimed at transforming human-computer interaction by automating complex, multi-step tasks.

AI Agent

117
0.94 stars / hour

Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers

liruiw/HPT 30 Sep 2024

Previous robot learning methods often collect data to train with one specific embodiment for one task, which is expensive and prone to overfitting.

286
0.82 stars / hour

Making Images Real Again: A Comprehensive Survey on Deep Image Composition

bcmi/libcom 28 Jun 2021

We have also contributed the first image composition toolbox: libcom https://github.com/bcmi/libcom, which assembles 10+ image composition related functions (e.g., image blending, image harmonization, object placement, shadow generation, generative composition).

Image Harmonization

457
0.80 stars / hour