LTX-Video: Realtime Video Latent Diffusion

Lightricks/LTX-Video 30 Dec 2024

To address this, our VAE decoder is tasked with both latent-to-pixel conversion and the final denoising step, producing the clean result directly in pixel space.

Denoising, Image to Video Generation

5,633 stars (1.27 stars / hour)

OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning

zhaochen0110/openthinkimg 13 May 2025

We hope OpenThinkIMG can serve as a foundational framework for advancing dynamic, tool-augmented visual reasoning, helping the community develop AI agents that can genuinely "think with images".

Reinforcement Learning (RL), Visual Reasoning

62 stars (1.15 stars / hour)

Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

going-doer/paper2code 24 Apr 2025

Despite the rapid growth of machine learning research, corresponding code implementations are often unavailable, making it slow and labor-intensive for researchers to reproduce results and build upon prior work.

Code Generation

1,978 stars (1.11 stars / hour)

Nexus-Gen: A Unified Model for Image Understanding, Generation, and Editing

modelscope/nexus-gen 30 Apr 2025

To bridge this gap, we present Nexus-Gen, a unified model that synergizes the language reasoning capabilities of LLMs with the image synthesis power of diffusion models.

Image Generation

173 stars (1.02 stars / hour)

Unified Continuous Generative Models

LINs-Lab/UCGM 12 May 2025

We introduce a unified framework for training, sampling, and analyzing these models.

Image Generation

80 stars (0.96 stars / hour)

3D Scene Generation: A Survey

hzxie/awesome-3d-scene-generation 8 May 2025

Recent advances in deep generative models (e.g., GANs, diffusion models) and 3D representations (e.g., NeRF, 3D Gaussians) have enabled the learning of real-world scene distributions, improving fidelity, diversity, and view consistency.

Autonomous Driving, Diversity, +3 more

302 stars (0.94 stars / hour)

WebThinker: Empowering Large Reasoning Models with Deep Research Capability

ruc-nlpir/webthinker 30 Apr 2025

Large reasoning models (LRMs), such as OpenAI-o1 and DeepSeek-R1, demonstrate impressive long-horizon reasoning capabilities.

Navigate

817 stars (0.94 stars / hour)

Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents

simular-ai/agent-s 1 Apr 2025

Computer use agents automate digital tasks by directly interacting with graphical user interfaces (GUIs) on computers and mobile devices, offering significant potential to enhance human productivity by completing an open-ended space of user queries.

AI Agent, Task Planning

4,866 stars (0.93 stars / hour)

Human-like Episodic Memory for Infinite Context LLMs

em-llm/EM-LLM-model 12 Jul 2024

Large language models (LLMs) have shown remarkable capabilities, but still struggle with processing extensive contexts, limiting their ability to maintain coherence and accuracy over long sequences.

Computational Efficiency, Event Segmentation, +2 more

189 stars (0.92 stars / hour)

OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation

OpenHelix-robot/OpenHelix 6 May 2025

Dual-system VLA (Vision-Language-Action) architectures have become a hot topic in embodied intelligence research, but there is a lack of sufficient open-source work for further performance analysis and optimization.

Vision-Language-Action

133 stars (0.87 stars / hour)