RETSim: Resilient and Efficient Text Similarity

unum-cloud/usearch 28 Nov 2023

This paper introduces RETSim (Resilient and Efficient Text Similarity), a lightweight, multilingual deep learning model trained to produce robust metric embeddings for near-duplicate text retrieval, clustering, and dataset deduplication tasks.

Adversarial Text Clustering +2

2,960
0.31 stars / hour

Efficient Part-level 3D Object Generation via Dual Volume Packing

nvlabs/partpacker 11 Jun 2025

Recent progress in 3D object generation has greatly improved both quality and efficiency.

Diversity Object

572
0.31 stars / hour

OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling

gair-nlp/octothinker 25 Jun 2025

To support further research, we release our open-source models along with a curated math reasoning-intensive corpus of over 70 billion tokens (i.e., MegaMath-Web-Pro-Max).

Language Modeling Language Modelling +4

141
0.31 stars / hour

RAGAS: Automated Evaluation of Retrieval Augmented Generation

explodinggradients/ragas 26 Sep 2023

We introduce RAGAs (Retrieval Augmented Generation Assessment), a framework for reference-free evaluation of Retrieval Augmented Generation (RAG) pipelines.

RAG Retrieval +1

9,899
0.31 stars / hour

LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis

rainbowluocs/openomni 5 May 2025

Real-time, intelligent, and natural speech interaction is an essential part of next-generation human-computer interaction.

Chatbot Decoder +3

89
0.29 stars / hour

ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching

k2-fsa/ZipVoice 16 Jun 2025

Existing large-scale zero-shot text-to-speech (TTS) models deliver high speech quality but suffer from slow inference speeds due to massive parameters.

Decoder Speech Synthesis +3

246
0.27 stars / hour

HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters

tencent-hunyuan/hunyuanvideo-avatar 26 May 2025

This ensures the dynamic motion and strong character consistency; (ii) An Audio Emotion Module (AEM) is introduced to extract and transfer the emotional cues from an emotion reference image to the target generated video, enabling fine-grained and accurate emotion style control; (iii) A Face-Aware Audio Adapter (FAA) is proposed to isolate the audio-driven character with latent-level face mask, enabling independent audio injection via cross-attention for multi-character scenarios.

Human Animation

1,640
0.27 stars / hour

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

facebookresearch/vjepa2 11 Jun 2025

Finally, we show how self-supervised learning can be applied to robotic planning tasks by post-training a latent action-conditioned world model, V-JEPA 2-AC, using less than 62 hours of unlabeled robot videos from the Droid dataset.

Action Anticipation Large Language Model +3

1,863
0.27 stars / hour

Efficient Reasoning Models: A Survey

fscdc/awesome-efficient-reasoning-models 15 Apr 2025

Reasoning models have demonstrated remarkable progress in solving complex and logic-intensive tasks by generating extended Chain-of-Thoughts (CoTs) prior to arriving at a final answer.

Knowledge Distillation Model Compression +1

227
0.26 stars / hour