Zero-shot Voice Conversion with Diffusion Transformers

Plachtaa/seed-vc 15 Nov 2024

Zero-shot voice conversion aims to transform a source speech utterance to match the timbre of a reference speech from an unseen speaker.

In-Context Learning, Voice Conversion

1,988
1.38 stars / hour

YOLOE: Real-Time Seeing Anything

THU-MIG/yoloe 10 Mar 2025

Object detection and segmentation are widely employed in computer vision applications, yet conventional models like the YOLO series, while efficient and accurate, are limited by predefined categories, hindering adaptability in open scenarios.

10-shot image generation

780
1.31 stars / hour

Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View

xianzuwu/Niagara 16 Mar 2025

Recent advances in single-view 3D scene reconstruction have highlighted the challenges in capturing fine geometric details and ensuring structural consistency, particularly in high-fidelity outdoor scene modeling.

3D Scene Reconstruction, Decoder, +1

72
1.29 stars / hour

Evaluating Self-Supervised Learning for Molecular Graph Embeddings

hansen7/molgrapheval NeurIPS 2023

Graph Self-Supervised Learning (GSSL) provides a robust pathway for acquiring embeddings without expert labelling, a capability that carries profound implications for molecular graphs due to the staggering number of potential molecules and the high cost of obtaining labels.

Self-Supervised Learning

111
1.13 stars / hour

Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

kuleshov-group/bd3lms 12 Mar 2025

Diffusion language models offer unique benefits over autoregressive models due to their potential for parallelized generation and controllability, yet they lag in likelihood modeling and are limited to fixed-length generation.

Denoising, Language Modeling, +1

361
1.05 stars / hour

Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

petergriffinjin/search-r1 12 Mar 2025

Efficiently acquiring external knowledge and up-to-date information is essential for effective reasoning and text generation in large language models (LLMs).

Question Answering, Reinforcement Learning (RL), +2

1,214
0.85 stars / hour

LBM: Latent Bridge Matching for Fast Image-to-Image Translation

gojasper/lbm 10 Mar 2025

In this paper, we introduce Latent Bridge Matching (LBM), a new, versatile and scalable method that relies on Bridge Matching in a latent space to achieve fast image-to-image translation.

Depth Estimation, Image Relighting, +2

211
0.83 stars / hour

FoundationStereo: Zero-Shot Stereo Matching

NVlabs/FoundationStereo 17 Jan 2025

Achieving strong zero-shot generalization - a hallmark of foundation models in other computer vision tasks - remains challenging for stereo matching.

Diversity, Stereo Depth Estimation, +2

1,012
0.78 stars / hour

Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

tencent/hunyuan3d-2 21 Jan 2025

Hunyuan3D 2.0 includes two foundation components: a large-scale shape generation model, Hunyuan3D-DiT, and a large-scale texture synthesis model, Hunyuan3D-Paint.

Texture Synthesis

7,384
0.74 stars / hour

Agent S: An Open Agentic Framework that Uses Computers Like a Human

simular-ai/agent-s 10 Oct 2024

We present Agent S, an open agentic framework that enables autonomous interaction with computers through a Graphical User Interface (GUI), aimed at transforming human-computer interaction by automating complex, multi-step tasks.

AI Agent, Task Planning

1,298
0.73 stars / hour