Mora: Enabling Generalist Video Generation via A Multi-Agent Framework

lichao-sun/mora 20 Mar 2024

Sora is the first large-scale generalist video generation model that garnered significant attention across society.

Image to Video Generation, Text-to-Video Generation, +1 more

791 stars
3.18 stars / hour

SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions

IDKiro/sdxs 25 Mar 2024

Recent advancements in diffusion models have positioned them at the forefront of image generation.

Image-to-Image Translation, Knowledge Distillation

100 stars
2.61 stars / hour

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

picsart-ai-research/streamingt2v 21 Mar 2024

We introduce StreamingT2V, an autoregressive approach for long video generation of 80, 240, 600, 1200 or more frames with smooth transitions.

Text-to-Video Generation, Video Generation

228 stars
2.56 stars / hour

Evolutionary Optimization of Model Merging Recipes

sakanaai/evolutionary-model-merge 19 Mar 2024

Surprisingly, our Japanese Math LLM achieved state-of-the-art performance on a variety of established Japanese LLM benchmarks, even surpassing models with significantly more parameters, despite not being explicitly trained for such tasks.

Evolutionary Algorithms, Math

744 stars
2.39 stars / hour

T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

idea-research/t-rex 21 Mar 2024

Recognizing the complementary strengths and weaknesses of both text and visual prompts, we introduce T-Rex2 that synergizes both prompts within a single model through contrastive learning.

Contrastive Learning, Descriptive, +3 more

1,001 stars
1.78 stars / hour

FeatUp: A Model-Agnostic Framework for Features at Any Resolution

mhamilton723/FeatUp 15 Mar 2024

Deep features are a cornerstone of computer vision research, capturing image semantics and enabling the community to solve downstream tasks even in the zero- or few-shot regime.

Depth Estimation, Depth Prediction, +5 more

786 stars
1.68 stars / hour

Analyzing and Improving the Training Dynamics of Diffusion Models

nvlabs/edm2 5 Dec 2023

Diffusion models currently dominate the field of data-driven image synthesis with their unparalleled scaling to large datasets.

Image Generation, Philosophy

214 stars
1.38 stars / hour

General Object Foundation Model for Images and Videos at Scale

FoundationVision/GLEE 14 Dec 2023

We present GLEE in this work, an object-level foundation model for locating and identifying objects in images and videos.

Long-tail Video Object Segmentation, Multi-Object Tracking, +8 more

539 stars
1.33 stars / hour

One-Step Image Translation with Text-to-Image Models

gaparmar/img2img-turbo 18 Mar 2024

In this work, we address two limitations of existing conditional diffusion models: their slow inference speed due to the iterative denoising process and their reliance on paired data for model fine-tuning.

Denoising, Translation

673 stars
1.26 stars / hour

LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression

microsoft/LLMLingua 19 Mar 2024

The challenge is that information entropy may be a suboptimal compression metric: (i) it only leverages unidirectional context and may fail to capture all essential information needed for prompt compression; (ii) it is not aligned with the prompt compression objective.

GSM8K, Language Modelling, +3 more

3,349 stars
1.15 stars / hour
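
The LLMLingua-2 snippet above criticizes information entropy as a compression metric because it relies on unidirectional context. As a rough illustration of that kind of baseline (not LLMLingua-2 itself, and not the microsoft/LLMLingua API), here is a minimal Python sketch that scores each token by its self-information given only the previous token and keeps the highest-scoring ones. The bigram counts are a toy stand-in for a causal language model, and every name here (`train_bigram`, `surprisal`, `compress`, `keep_ratio`) is hypothetical.

```python
import math
from collections import Counter


def train_bigram(corpus_tokens):
    """Count unigrams and bigrams (toy stand-in for a causal language model)."""
    unigrams = Counter(corpus_tokens)
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    return unigrams, bigrams


def surprisal(prev, tok, unigrams, bigrams):
    """Self-information -log2 p(tok | prev), using add-one style smoothing.

    Only the left neighbour is consulted, which mirrors the
    unidirectional-context limitation described in the snippet.
    """
    vocab_size = len(unigrams) + 1  # +1 reserves probability mass for unseen tokens
    p = (bigrams[(prev, tok)] + 1) / (unigrams[prev] + vocab_size)
    return -math.log2(p)


def compress(tokens, unigrams, bigrams, keep_ratio=0.5):
    """Keep the highest-surprisal tokens, preserving their original order."""
    if not tokens:
        return []
    previous = ["<s>"] + tokens[:-1]  # "<s>" is an assumed start-of-sequence marker
    scores = [surprisal(p, t, unigrams, bigrams) for p, t in zip(previous, tokens)]
    k = max(1, round(len(tokens) * keep_ratio))
    ranked = sorted(range(len(tokens)), key=lambda i: -scores[i])
    keep = sorted(ranked[:k])
    return [tokens[i] for i in keep]


if __name__ == "__main__":
    corpus = "the cat sat on the mat the cat ran".split()
    uni, bi = train_bigram(corpus)
    print(compress("the cat sat on the mat".split(), uni, bi, keep_ratio=0.5))
```

Because each score depends only on the preceding token, a word whose importance is determined by what follows it can be dropped, which is the first limitation the snippet names.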