TorchFX: A modern approach to Audio DSP with PyTorch and GPU acceleration

matteospanio/torchfx 11 Apr 2025

We introduce TorchFX: a GPU-accelerated Python library for DSP, specifically engineered to facilitate sophisticated audio signal processing.

Audio Signal Processing, Benchmarking

64
0.57 stars / hour

IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

index-tts/index-tts 8 Feb 2025

Recently, large language model (LLM) based text-to-speech (TTS) systems have gradually become mainstream in the industry due to their high naturalness and powerful zero-shot voice cloning capabilities. Here, we introduce the IndexTTS system, which is mainly based on the XTTS and Tortoise models.

Decoder Language Modeling +5

1,008
0.55 stars / hour

Zep: A Temporal Knowledge Graph Architecture for Agent Memory

getzep/graphiti 20 Jan 2025

We introduce Zep, a novel memory layer service for AI agents that outperforms the current state-of-the-art system, MemGPT, in the Deep Memory Retrieval (DMR) benchmark.

RAG, Retrieval

4,041
0.54 stars / hour

PixelFlow: Pixel-Space Generative Models with Flow

shoufachen/pixelflow 10 Apr 2025

We present PixelFlow, a family of image generation models that operate directly in the raw pixel space, in contrast to the predominant latent-space models.

Conditional Image Generation

168
0.53 stars / hour

LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds

aigc3d/LHM 13 Mar 2025

Animatable 3D human reconstruction from a single image is a challenging problem due to the ambiguity in decoupling geometry, appearance, and deformation.

3D Human Reconstruction

1,853
0.53 stars / hour

MonSter: Marry Monodepth to Stereo Unleashes Power

junda24/monster 15 Jan 2025

The refined monodepth in turn guides stereo effectively in ill-posed regions.

Monocular Depth Estimation Stereo Matching +1

319
0.52 stars / hour

3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting

nv-tlabs/3dgrut 17 Dec 2024

3D Gaussian Splatting (3DGS) enables efficient reconstruction and high-fidelity real-time rendering of complex scenes on consumer hardware.

3DGS

641
0.52 stars / hour

InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity

bytedance/infiniteyou 20 Mar 2025

Achieving flexible and high-fidelity identity-preserved image generation remains formidable, particularly with advanced Diffusion Transformers (DiTs) like FLUX.

Image Generation

2,003
0.50 stars / hour

Affordable AI Assistants with Knowledge Graph of Thoughts

spcl/knowledge-graph-of-thoughts 3 Apr 2025

Representing task-relevant knowledge as a structured knowledge graph enables low-cost models to solve complex tasks effectively.

Knowledge Graphs, Math

67
0.49 stars / hour

VGGT: Visual Geometry Grounded Transformer

facebookresearch/vggt 14 Mar 2025

We present VGGT, a feed-forward neural network that directly infers all key 3D attributes of a scene, including camera parameters, point maps, depth maps, and 3D point tracks, from one, a few, or hundreds of its views.

Depth Estimation Novel View Synthesis +3

5,140
0.47 stars / hour