MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning

modalminds/mm-eureka 10 Mar 2025

We present MM-Eureka, a multimodal reasoning model that successfully extends large-scale rule-based reinforcement learning (RL) to multimodal reasoning.

Multimodal Reasoning Reinforcement Learning (RL)

339
0.54 stars / hour

Proximal Policy Optimization Algorithms

hiyouga/easyr1 20 Jul 2017

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.

Continuous Control Dota 2 +6

1,448
0.54 stars / hour

HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation

dcdmllm/healthgpt 14 Feb 2025

To train HealthGPT effectively, we devise a comprehensive medical domain-specific comprehension and generation dataset called VL-Health.

Language Modeling Language Modelling +1

475
0.53 stars / hour

PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides

icip-cas/pptagent 7 Jan 2025

Automatically generating presentations from documents is a challenging task that requires balancing content quality, visual design, and structural coherence.

813
0.53 stars / hour

Exploring the Coordination of Frequency and Attention in Masked Image Modeling

guijiejie/amt 28 Nov 2022

To tackle this issue, we propose the Frequency & Attention-driven Masking and Throwing Strategy (FAMT), which can extract semantic patches and reduce the number of training patches to boost model performance and training efficiency simultaneously.

Attribute Representation Learning +1

74
0.53 stars / hour

A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends

guijiejie/ssl 13 Jan 2023

Deep supervised learning algorithms typically require a large volume of labeled data to achieve satisfactory performance.

Self-Supervised Learning

139
0.52 stars / hour

Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities

NVIDIA/audio-flamingo 6 Mar 2025

Fine-tuning AF2 on LongAudio leads to exceptional performance on our proposed LongAudioBench, an expert annotated benchmark for evaluating ALMs on long audio understanding capabilities.

Audio captioning Language Modeling +2

358
0.51 stars / hour

G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model

deep-agent/r1-v 18 Dec 2023

We first analyze the limitations of current Multimodal Large Language Models (MLLMs) in this area: they struggle to accurately comprehend basic geometric elements and their relationships.

Language Modeling Language Modelling +2

3,257
0.50 stars / hour

LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization

xianfengwu01/lightgen 11 Mar 2025

Recent advances in text-to-image generation have primarily relied on extensive datasets and parameter-heavy architectures.

Knowledge Distillation Text-to-Image Generation

43
0.49 stars / hour

2 OLMo 2 Furious

allenai/OLMo-core 31 Dec 2024

Our modified model architecture and training recipe achieve both better training stability and improved per-token efficiency.

125
0.49 stars / hour