UniAnimate-DiT: Human Image Animation with Large-Scale Video Diffusion Transformer

ali-vilab/unianimate-dit 15 Apr 2025

Furthermore, we adopt a simple concatenation operation to integrate the reference appearance into the model and incorporate the pose information of the reference image for enhanced pose alignment.

Image Animation

244
1.50 stars / hour

PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

lizonghang/prima.cpp 7 Apr 2025

The emergence of DeepSeek R1 and QwQ 32B has broken through performance barriers for running frontier large language models (LLMs) on home devices.

Quantization

461
1.46 stars / hour

REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers

End2End-Diffusion/REPA-E 14 Apr 2025

We show that while diffusion loss is ineffective, end-to-end training can be unlocked through the representation-alignment (REPA) loss -- allowing both VAE and diffusion model to be jointly tuned during the training process.

114
1.00 stars / hour

Liquid: Language Models are Scalable Multi-modal Generators

foundationvision/liquid 5 Dec 2024

We present Liquid, an auto-regressive generation paradigm that seamlessly integrates visual comprehension and generation by tokenizing images into discrete codes and learning these code embeddings alongside text tokens within a shared feature space for both vision and language.

Language Modeling, Language Modelling +2

518
0.99 stars / hour

Advanced Video Inpainting Using Optical Flow-Guided Efficient Diffusion

nevsnev/fgdvi 1 Dec 2024

Specifically, FloED employs a dual-branch architecture, where a flow branch first restores corrupted flow and a multi-scale flow adapter provides motion guidance to the main inpainting branch.

Denoising, Optical Flow Estimation +1

217
0.98 stars / hour

Bitnet.cpp: Efficient Edge Inference for Ternary LLMs

microsoft/bitnet 17 Feb 2025

The advent of 1-bit large language models (LLMs), led by BitNet b1.58, has spurred interest in ternary LLMs.

14,570
0.77 stars / hour

The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search

sakanaai/ai-scientist-v2 10 Apr 2025

AI is increasingly playing a pivotal role in transforming how scientific discoveries are made.

scientific discovery

661
0.74 stars / hour

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

bytedance/ui-tars 21 Jan 2025

This paper introduces UI-TARS, a native GUI agent model that perceives only screenshots as input and performs human-like interactions (e.g., keyboard and mouse operations).

4,162
0.71 stars / hour

BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence

HorizonRobotics/BIP3D 22 Nov 2024

In embodied intelligence systems, a key component is the 3D perception algorithm, which enables agents to understand their surrounding environments.

3D visual grounding

168
0.68 stars / hour