Packing Input Frame Context in Next-Frame Prediction Models for Video Generation

lllyasviel/framepack 17 Apr 2025

We present a neural network structure, FramePack, to train next-frame (or next-frame-section) prediction models for video generation.

Video Generation

9,894
7.68 stars / hour

InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework

tencent/instantcharacter 16 Apr 2025

Third, to effectively train the framework, we construct a large-scale character dataset containing 10-million-level samples.

Image Generation

699
2.10 stars / hour

Reinforcement Learning from Human Feedback

natolambert/rlhf-book 16 Apr 2025

Reinforcement learning from human feedback (RLHF) has become an important technical and storytelling tool to deploy the latest machine learning systems.

Math Philosophy +2

811
2.07 stars / hour

UniAnimate-DiT: Human Image Animation with Large-Scale Video Diffusion Transformer

ali-vilab/unianimate-dit 15 Apr 2025

Furthermore, we adopt a simple concatenation operation to integrate the reference appearance into the model and incorporate the pose information of the reference image for enhanced pose alignment.

Image Animation

431
1.73 stars / hour

PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

lizonghang/prima.cpp 7 Apr 2025

The emergence of DeepSeek R1 and QwQ 32B has broken through performance barriers for running frontier large language models (LLMs) on home devices.

Quantization

688
1.49 stars / hour

IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

index-tts/index-tts 8 Feb 2025

Recently, large language model (LLM) based text-to-speech (TTS) systems have gradually become the mainstream in the industry due to their high naturalness and powerful zero-shot voice cloning capabilities. Here, we introduce the IndexTTS system, which is mainly based on the XTTS and Tortoise models.

Decoder Language Modeling +5

1,383
1.25 stars / hour

Bitnet.cpp: Efficient Edge Inference for Ternary LLMs

microsoft/bitnet 17 Feb 2025

The advent of 1-bit large language models (LLMs), led by BitNet b1.58, has spurred interest in ternary LLMs.

17,236
1.20 stars / hour

LettuceDetect: A Hallucination Detection Framework for RAG Applications

KRLabsOrg/LettuceDetect 24 Feb 2025

Retrieval Augmented Generation (RAG) systems remain vulnerable to hallucinated answers despite incorporating external knowledge sources.

8k Hallucination +3

388
1.19 stars / hour

BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence

HorizonRobotics/BIP3D 22 Nov 2024

In embodied intelligence systems, a key component is the 3D perception algorithm, which enables agents to understand their surrounding environments.

3D visual grounding

179
0.99 stars / hour

Advanced Video Inpainting Using Optical Flow-Guided Efficient Diffusion

nevsnev/fgdvi 1 Dec 2024

Specifically, FloED employs a dual-branch architecture, where a flow branch first restores corrupted flow and a multi-scale flow adapter provides motion guidance to the main inpainting branch.

Denoising Optical Flow Estimation +1

269
0.89 stars / hour