LTX-Video: Realtime Video Latent Diffusion

Lightricks/LTX-Video 30 Dec 2024

To address this, our VAE decoder is tasked with both latent-to-pixel conversion and the final denoising step, producing the clean result directly in pixel space.

Denoising GPU +1

7,269
0.35 stars / hour

DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge

Zhangwenyao1/DreamVLA 7 Jul 2025

However, existing methods are limited to challenging image-based forecasting, which suffers from redundant information and lacks comprehensive and critical world knowledge, including dynamic, spatial and semantic information.

Image Generation Multimodal Reasoning +3

102
0.35 stars / hour

Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts

cocowy1/smoe-stereo 7 Jul 2025

To address this, we propose SMoEStereo, a novel framework that adapts VFMs for stereo matching through a tailored, scene-specific fusion of Low-Rank Adaptation (LoRA) and Mixture-of-Experts (MoE) modules.

Inductive Bias Mixture-of-Experts +1

132
0.34 stars / hour

Language Model Inversion

jxmorris12/vec2text 22 Nov 2023

We consider the problem of language model inversion and show that next-token probabilities contain a surprising amount of information about the preceding text.

Language Modeling Language Modelling +1

896
0.34 stars / hour

SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models

xid32/soundmind 15 Jun 2025

While large language models have shown reasoning capabilities, their application to the audio modality, particularly in large audio-language models (ALMs), remains significantly underdeveloped.

Logical Reasoning Reinforcement Learning (RL)

782
0.32 stars / hour

Energy-Based Transformers are Scalable Learners and Thinkers

alexiglad/EBT 2 Jul 2025

Further, we find that EBTs achieve better results than existing models on most downstream tasks given the same or worse pretraining performance, suggesting that EBTs generalize better than existing approaches.

Image Denoising Math

299
0.32 stars / hour

Do Large Language Models Need a Content Delivery Network?

lmcache/lmcache 16 Sep 2024

As the use of large language models (LLMs) expands rapidly, so does the range of knowledge needed to supplement various LLM queries.

In-Context Learning

3,220
0.31 stars / hour

ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching

k2-fsa/ZipVoice 16 Jun 2025

Existing large-scale zero-shot text-to-speech (TTS) models deliver high speech quality but suffer from slow inference speeds due to massive parameters.

Decoder Speech Synthesis +3

287
0.30 stars / hour

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

thudm/glm-4.1v-thinking 1 Jul 2025

In a comprehensive evaluation across 28 public benchmarks, our model outperforms Qwen2.5-VL-7B on nearly all tasks and achieves comparable or even superior performance on 18 benchmarks relative to the significantly larger Qwen2.5-VL-72B.

document understanding Multimodal Reasoning +1

829
0.30 stars / hour

Large Language Model Agent: A Survey on Methodology, Applications and Challenges

luo-junyu/awesome-agent-papers 27 Mar 2025

The era of intelligent agents is upon us, driven by revolutionary advancements in large language models.

Language Modeling Language Modelling +1

1,212
0.29 stars / hour