Less-to-More Generalization: Unlocking More Controllability by In-Context Generation

bytedance/uno 2 Apr 2025

In this study, we propose a highly-consistent data synthesis pipeline to tackle this challenge.

Conditional Image Generation Personalized Image Generation +1

789
0.93 stars / hour

REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers

End2End-Diffusion/REPA-E 14 Apr 2025

We show that while diffusion loss is ineffective, end-to-end training can be unlocked through the representation-alignment (REPA) loss -- allowing both VAE and diffusion model to be jointly tuned during the training process.

103
0.92 stars / hour

Liquid: Language Models are Scalable Multi-modal Generators

foundationvision/liquid 5 Dec 2024

We present Liquid, an auto-regressive generation paradigm that seamlessly integrates visual comprehension and generation by tokenizing images into discrete codes and learning these code embeddings alongside text tokens within a shared feature space for both vision and language.

Language Modeling Language Modelling +2

515
0.84 stars / hour

Kimi-VL Technical Report

moonshotai/kimi-vl 10 Apr 2025

We present Kimi-VL, an efficient open-source Mixture-of-Experts (MoE) vision-language model (VLM) that offers advanced multimodal reasoning, long-context understanding, and strong agent capabilities - all while activating only 2.8B parameters in its language decoder (Kimi-VL-A3B).

Long-Context Understanding Mathematical Reasoning +3

743
0.82 stars / hour

Advanced Video Inpainting Using Optical Flow-Guided Efficient Diffusion

nevsnev/fgdvi 1 Dec 2024

Specifically, FloED employs a dual-branch architecture, where a flow branch first restores corrupted flow and a multi-scale flow adapter provides motion guidance to the main inpainting branch.

Denoising Optical Flow Estimation +1

149
0.79 stars / hour

NdLinear Is All You Need for Representation Learning

ensemble-core/ndlinear 21 Mar 2025

We propose NdLinear as a drop-in replacement for standard linear layers -- marking an important step toward next-generation neural architectures.

All Representation Learning

200
0.78 stars / hour

The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search

sakanaai/ai-scientist-v2 10 Apr 2025

AI is increasingly playing a pivotal role in transforming how scientific discoveries are made.

scientific discovery

640
0.74 stars / hour

OctGPT: Octree-based Multiscale Autoregressive Models for 3D Shape Generation

octree-nn/octgpt 14 Apr 2025

In this paper, we introduce OctGPT, a novel multiscale autoregressive model for 3D shape generation that dramatically improves the efficiency and performance of prior 3D autoregressive approaches, while rivaling or surpassing state-of-the-art diffusion models.

3D Shape Generation

89
0.74 stars / hour

VSLAM-LAB: A Comprehensive Framework for Visual SLAM Methods and Datasets

alejandrofontan/vslam-lab 6 Apr 2025

Visual Simultaneous Localization and Mapping (VSLAM) research faces significant challenges due to fragmented toolchains, complex system configurations, and inconsistent evaluation methodologies.

Simultaneous Localization and Mapping

168
0.64 stars / hour

Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning

xufangzhi/genius 11 Apr 2025

This motivates us to enhance LLM reasoning without the need for external supervision.

46
0.59 stars / hour