Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

stepfun-ai/step-video-t2v 14 Feb 2025

We present Step-Video-T2V, a state-of-the-art text-to-video pre-trained model with 30B parameters and the ability to generate videos up to 204 frames in length.

Video Generation, Video Reconstruction

1,407 stars (6.58 stars / hour)

OmniParser for Pure Vision Based GUI Agent

microsoft/omniparser 1 Aug 2024

The recent success of large vision-language models shows great potential for driving agent systems that operate on user interfaces.

Natural Language Visual Grounding

13,454 stars (4.87 stars / hour)

PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation

microsoft/pike-rag 20 Jan 2025

Despite notable advancements in Retrieval-Augmented Generation (RAG) systems that expand large language model (LLM) capabilities through external retrieval, these systems often struggle to meet the complex and diverse needs of real-world industrial applications.

Language Modeling, Language Modelling, +3 more

538 stars (2.35 stars / hour)

CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction

hkust-nlp/codeio 11 Feb 2025

Reasoning is a fundamental capability of large language models.

Code Generation, Math

374 stars (1.97 stars / hour)

Data Formulator 2: Iteratively Creating Rich Visualizations with AI

microsoft/data-formulator 28 Aug 2024

To create rich visualizations, data analysts often need to iterate back and forth between data processing and chart specification to achieve their goals.

Code Generation, Navigate

7,623 stars (1.77 stars / hour)

Magic 1-For-1: Generating One Minute Video Clips within One Minute

da-group-pku/magic-1-for-1 11 Feb 2025

The key idea is simple: factorize the text-to-video generation task into two separate, easier tasks for diffusion step distillation, namely text-to-image generation and image-to-video generation.

Image to Video Generation, Text-to-Image Generation, +1 more

532 stars (1.22 stars / hour)

Light-A-Video: Training-free Video Relighting via Progressive Light Fusion

bcmi/Light-A-Video 12 Feb 2025

Second, leveraging the physical principle of light transport independence, we apply linear blending between the source video's appearance and the relit appearance, using a Progressive Light Fusion (PLF) strategy to ensure smooth temporal transitions in illumination.

Image Relighting

284 stars (1.10 stars / hour)
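The Light-A-Video abstract above rests on linear blending between a source appearance and a relit appearance, which light transport independence makes physically plausible. A minimal sketch of that blend, assuming each frame is a flat list of pixel intensities and assuming a simple linear ramp for the blend weight; `linear_blend`, `progressive_fusion`, and the ramp schedule are hypothetical illustrations, not the authors' code, and the paper's actual PLF schedule and where it is applied may differ.

```python
from typing import List, Sequence

Frame = List[float]   # illustrative: one frame flattened to pixel intensities
Video = List[Frame]   # illustrative: a video as a list of frames


def linear_blend(source: Sequence[float], relit: Sequence[float], lam: float) -> Frame:
    """Pixel-wise linear blend: (1 - lam) * source + lam * relit."""
    if len(source) != len(relit):
        raise ValueError("frames must have the same number of pixels")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [(1.0 - lam) * s + lam * r for s, r in zip(source, relit)]


def progressive_fusion(source_video: Sequence[Sequence[float]],
                       relit_video: Sequence[Sequence[float]],
                       num_steps: int) -> List[Video]:
    """Return one blended video per step.

    Hypothetical schedule: the weight on the relit appearance ramps
    linearly from 1/num_steps up to 1, so illumination changes gradually
    instead of jumping straight to the relit result.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    if len(source_video) != len(relit_video):
        raise ValueError("videos must have the same number of frames")
    outputs: List[Video] = []
    for step in range(1, num_steps + 1):
        lam = step / num_steps  # linear ramp (assumed, not from the paper)
        outputs.append([linear_blend(s, r, lam)
                        for s, r in zip(source_video, relit_video)])
    return outputs


if __name__ == "__main__":
    # One two-pixel frame: the source is dark then bright, the relit version the reverse.
    steps = progressive_fusion([[0.0, 1.0]], [[1.0, 0.0]], num_steps=4)
    print(steps[1])  # halfway step: [[0.5, 0.5]]
```

Ramping the weight rather than applying the relit appearance at once is one simple way to get the smooth illumination transition the abstract describes; any monotone schedule in [0, 1] would fit the same structure.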

KET-RAG: A Cost-Efficient Multi-Granular Indexing Framework for Graph-RAG

waetr/KET-RAG 13 Feb 2025

To ensure good result accuracy while reducing indexing cost, we propose KET-RAG, a multi-granular indexing framework.

Knowledge Graphs, Question Answering, +3 more

64 stars (0.96 stars / hour)

Flaming-hot Initiation with Regular Execution Sampling for Large Language Models

volcengine/verl 28 Oct 2024

Since the release of ChatGPT, large language models (LLMs) have demonstrated remarkable capabilities across various domains.

Diversity, Math

3,428 stars (0.95 stars / hour)

Diffusion Models without Classifier-free Guidance

tzco/Diffusion-wo-CFG 17 Feb 2025

This paper presents Model-guidance (MG), a novel objective for training diffusion models that removes the need for the commonly used classifier-free guidance (CFG).

Conditional Image Generation

33 stars (0.88 stars / hour)