Less-to-More Generalization: Unlocking More Controllability by In-Context Generation

bytedance/uno 2 Apr 2025

In this study, we propose a highly-consistent data synthesis pipeline to tackle this challenge.

Conditional Image Generation Personalized Image Generation +1

762
2.10 stars / hour

Kimi-VL Technical Report

moonshotai/kimi-vl 10 Apr 2025

We present Kimi-VL, an efficient open-source Mixture-of-Experts (MoE) vision-language model (VLM) that offers advanced multimodal reasoning, long-context understanding, and strong agent capabilities, all while activating only 2.8B parameters in its language decoder (Kimi-VL-A3B).

Long-Context Understanding Mathematical Reasoning +3

731
1.12 stars / hour

TorchFX: A modern approach to Audio DSP with PyTorch and GPU acceleration

matteospanio/torchfx 11 Apr 2025

In response, we introduce TorchFX: a GPU-accelerated Python library for DSP, specifically engineered to facilitate sophisticated audio signal processing.

Audio Signal Processing Benchmarking

58
1.01 stars / hour

REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers

End2End-Diffusion/REPA-E 15 Apr 2025

We show that while diffusion loss is ineffective, end-to-end training can be unlocked through the representation-alignment (REPA) loss, allowing both the VAE and the diffusion model to be jointly tuned during the training process.

Image Generation

67
0.96 stars / hour

SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

mit-han-lab/nunchaku 7 Nov 2024

To address this, we co-design an inference engine, Nunchaku, that fuses the kernels of the low-rank branch into those of the low-bit branch to cut off redundant memory access.

Quantization

1,409
0.94 stars / hour

VGGT: Visual Geometry Grounded Transformer

facebookresearch/vggt 14 Mar 2025

We present VGGT, a feed-forward neural network that directly infers all key 3D attributes of a scene, including camera parameters, point maps, depth maps, and 3D point tracks, from one, a few, or hundreds of its views.

Depth Estimation Novel View Synthesis +3

4,951
0.89 stars / hour

PixelFlow: Pixel-Space Generative Models with Flow

shoufachen/pixelflow 10 Apr 2025

We present PixelFlow, a family of image generation models that operate directly in the raw pixel space, in contrast to the predominant latent-space models.

Conditional Image Generation

165
0.87 stars / hour

DDT: Decoupled Diffusion Transformer

MCG-NJU/DDT 8 Apr 2025

For ImageNet $256\times256$, our DDT-XL/2 achieves a new state-of-the-art performance of 1.31 FID (nearly $4\times$ faster training convergence compared to previous diffusion transformers).

Denoising

191
0.75 stars / hour

Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents

simular-ai/agent-s 1 Apr 2025

Computer use agents automate digital tasks by directly interacting with graphical user interfaces (GUIs) on computers and mobile devices, offering significant potential to enhance human productivity by completing an open-ended space of user queries.

AI Agent Task Planning

2,312
0.74 stars / hour

Zep: A Temporal Knowledge Graph Architecture for Agent Memory

getzep/graphiti 20 Jan 2025

We introduce Zep, a novel memory layer service for AI agents that outperforms the current state-of-the-art system, MemGPT, in the Deep Memory Retrieval (DMR) benchmark.

RAG Retrieval

3,975
0.72 stars / hour