MossFormer: Pushing the Performance Limit of Monaural Speech Separation using Gated Single-Head Transformer with Convolution-Augmented Joint Self-Attentions

modelscope/ClearerVoice-Studio 23 Feb 2023

To effectively solve the indirect elemental interactions across chunks in the dual-path architecture, MossFormer employs a joint local and global self-attention architecture that simultaneously performs a full-computation self-attention on local chunks and a linearised low-cost self-attention over the full sequence.

Ranked #1 on Speech Separation on WSJ0-2mix-16k (using extra training data)

Speech Separation

1,724
3.56 stars / hour
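
The MossFormer excerpt above describes pairing full self-attention inside local chunks with a linearised, low-cost self-attention over the whole sequence. A minimal NumPy sketch of that general idea follows. It is not the authors' implementation: the function names, the ReLU-based feature map, and the choice to sum the two branches are assumptions made for illustration, and the gating and convolution modules named in the title are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def local_chunk_attention(q, k, v, chunk_size):
    """Full softmax attention computed independently inside each chunk.

    q, k: (n, d) arrays; v: (n, dv) array; n must be divisible by chunk_size.
    Cost is O(n * chunk_size) instead of O(n^2).
    """
    n, d = q.shape
    assert n % chunk_size == 0, "sequence length must be a multiple of chunk_size"
    c = n // chunk_size
    qc = q.reshape(c, chunk_size, d)
    kc = k.reshape(c, chunk_size, d)
    vc = v.reshape(c, chunk_size, -1)
    scores = qc @ kc.transpose(0, 2, 1) / np.sqrt(d)  # (c, s, s)
    out = softmax(scores, axis=-1) @ vc               # (c, s, dv)
    return out.reshape(n, -1)


def global_linear_attention(q, k, v):
    """Linearised attention over the full sequence.

    Computes phi(Q) (phi(K)^T V) with row normalisation, which costs O(n)
    in sequence length. The feature map phi is an assumption of this sketch.
    """
    phi_q = np.maximum(q, 0.0) + 1e-6  # assumed positive feature map
    phi_k = np.maximum(k, 0.0) + 1e-6
    kv = phi_k.T @ v                   # (d, dv), shared by all queries
    z = phi_q @ phi_k.sum(axis=0)      # (n,), per-query normaliser
    return (phi_q @ kv) / z[:, None]


def joint_attention(q, k, v, chunk_size):
    """Sum of the local and global branches (an assumed way to combine them)."""
    return local_chunk_attention(q, k, v, chunk_size) + global_linear_attention(q, k, v)
```

For example, calling `joint_attention(q, k, v, chunk_size=4)` with `q` and `k` of shape `(8, 4)` and `v` of shape `(8, 3)` returns an array of shape `(8, 3)`. The local branch lets every position interact directly with all others in its chunk, while the global branch gives every position a cheap direct path to the rest of the sequence.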

Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders

fkryan/gazelle 12 Dec 2024

We address the problem of gaze target estimation, which aims to predict where a person is looking in a scene.

Gaze Target Estimation

120
2.65 stars / hour

StableAnimator: High-Quality Identity-Preserving Human Image Animation

Francis-Rings/StableAnimator 26 Nov 2024

During inference, we propose a novel Hamilton-Jacobi-Bellman (HJB) equation-based optimization to further enhance the face quality.

Denoising, Face Reenactment +3

605
1.83 stars / hour

SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints

kwaivgi/syncammaster 10 Dec 2024

Recent advancements in video diffusion models have shown exceptional abilities in simulating real-world dynamics and maintaining 3D consistency.

4D reconstruction, Video Generation

197
1.61 stars / hour

Learning Flow Fields in Attention for Controllable Person Image Generation

franciszzj/leffa 11 Dec 2024

Additionally, we show that our loss is model-agnostic and can be used to improve the performance of other diffusion models.

Attribute, Pose Transfer +1

103
1.54 stars / hour

Neural Localizer Fields for Continuous 3D Human Pose and Shape Estimation

isarandi/nlf 10 Jul 2024

With the explosive growth of available training data, single-image 3D human modeling is ahead of a transition to a data-centric paradigm.

3D human pose and shape estimation

151
1.30 stars / hour

Video Seal: Open and Efficient Video Watermarking

facebookresearch/videoseal 12 Dec 2024

To reduce these gaps, this paper introduces Video Seal, a comprehensive framework for neural video watermarking and a competitive open-sourced model.

Video Compression, Video Editing

50
1.07 stars / hour

HunyuanVideo: A Systematic Framework For Large Video Generative Models

tencent/hunyuanvideo 3 Dec 2024

In this report, we introduce HunyuanVideo, an innovative open-source video foundation model that demonstrates performance in video generation comparable to, or even surpassing, that of leading closed-source models.

Video Alignment, Video Generation

5,787
1.20 stars / hour

LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods

cshaitao/awesome-llms-as-judges 7 Dec 2024

Finally, we provide a detailed analysis of the limitations of LLM judges and discuss potential future directions.

154
1.15 stars / hour