Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders

xiaomi-research/dasheng-denoiser 13 Jun 2025

Recent research has delved into speech enhancement (SE) approaches that leverage audio embeddings from pre-trained models, diverging from time-frequency masking or signal prediction techniques.

Speech Enhancement

35
0.67 stars / hour

HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters

tencent-hunyuan/hunyuanvideo-avatar 26 May 2025

This ensures dynamic motion and strong character consistency; (ii) An Audio Emotion Module (AEM) is introduced to extract and transfer emotional cues from an emotion reference image to the target generated video, enabling fine-grained and accurate emotion style control; (iii) A Face-Aware Audio Adapter (FAA) is proposed to isolate the audio-driven character with a latent-level face mask, enabling independent audio injection via cross-attention for multi-character scenarios.

Human Animation

1,299
0.60 stars / hour

SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing

alpha-innovator/surveyforge 6 Mar 2025

Survey papers play a crucial role in scientific research, especially given the rapid growth of research publications.

Articles Survey

232
0.59 stars / hour

Efficient Part-level 3D Object Generation via Dual Volume Packing

nvlabs/partpacker 11 Jun 2025

Recent progress in 3D object generation has greatly improved both quality and efficiency.

Diversity Object

225
0.58 stars / hour

PixelsDB: Serverless and NL-Aided Data Analytics with Flexible Service Levels and Prices

pixelsdb/pixels 30 May 2024

The queries are then executed by a serverless query engine that offers varying prices for different performance service levels (SLAs).

Scheduling

233
0.54 stars / hour

Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

meigen-ai/multitalk 28 May 2025

Audio-driven human animation methods, such as talking head and talking body generation, have made remarkable progress in generating synchronized facial movements and visually appealing videos.

Human Animation, Instruction Following, +1

322
0.54 stars / hour

Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention

DreamTechAI/Direct3D-S2 23 May 2025

Generating high-resolution 3D shapes using volumetric representations such as Signed Distance Functions (SDFs) presents substantial computational and memory challenges.

3D Generation, 3D geometry, +5

847
0.53 stars / hour

OmniAudio: Generating Spatial Audio from 360-Degree Video

liuhuadai/omniaudio 21 Apr 2025

To generate spatial audio from 360-degree video, we propose a novel framework OmniAudio, which leverages self-supervised pre-training using both spatial audio data (in FOA format) and large-scale non-spatial data.

Audio Generation

288
0.52 stars / hour

RFUAV: A Benchmark Dataset for Unmanned Aerial Vehicle Detection and Identification

kitoweeknd/RFUAV 12 Mar 2025

In addition to the dataset, RFUAV provides a baseline preprocessing method and model evaluation tools.

Audio Signal Recognition Classification +1

131
0.48 stars / hour

MAGREF: Masked Guidance for Any-Reference Video Generation

magref-video/magref 29 May 2025

Video generation has made substantial strides with the emergence of deep generative models, especially diffusion-based approaches.

Video Generation

96
0.47 stars / hour