rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

hkust-nlp/simplerl-reason 8 Jan 2025

We present rStar-Math to demonstrate that small language models (SLMs) can rival or even surpass the math reasoning capability of OpenAI o1, without distillation from superior models.

Math

2,696
0.61 stars / hour

MatterGen: a generative model for inorganic materials design

microsoft/mattergen 6 Dec 2023

We further introduce adapter modules to enable fine-tuning towards any given property constraints with a labeled dataset.

model

1,127
0.59 stars / hour
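The adapter modules mentioned in the MatterGen abstract (small trainable components added to a pretrained model so it can be fine-tuned toward property constraints) follow a widely used pattern that can be sketched generically. The shapes, names, and zero-initialized up-projection below are illustrative assumptions, not MatterGen's actual architecture.

```python
# Generic bottleneck-adapter sketch (illustrative; not MatterGen's code).
# A frozen base representation h gets a small trainable residual:
#   out = h + W_up @ relu(W_down @ h)
# W_up starts at zero, so fine-tuning begins exactly at pretrained behavior.
import random

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, vi) for vi in v]

class Adapter:
    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        self.W_down = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                       for _ in range(bottleneck)]
        self.W_up = [[0.0] * bottleneck for _ in range(dim)]  # zero init

    def __call__(self, h):
        delta = matvec(self.W_up, relu(matvec(self.W_down, h)))
        return [hi + di for hi, di in zip(h, delta)]

# Before any training, the adapter is an identity residual:
h = [0.5, -1.0, 2.0, 0.0]
adapter = Adapter(dim=4, bottleneck=2)
assert adapter(h) == h
```

The zero-initialized up-projection is the standard trick that makes adapter insertion safe: the wrapped model's outputs are unchanged until gradient steps on the labeled property dataset move the adapter weights.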

Accelerating Data Processing and Benchmarking of AI Models for Pathology

mahmoodlab/trident 10 Feb 2025

Advances in foundation modeling have reshaped computational pathology.

Benchmarking

63
0.57 stars / hour

On the Emergence of Thinking in LLMs I: Searching for the Right Intuition

OpenLLMAI/OpenRLHF 10 Feb 2025

Lastly, we propose a theory as to why the RLSP search strategy is more suitable for LLMs, inspired by a remarkable result showing that CoT provably increases the computational power of LLMs, and does so increasingly with the number of CoT steps [Li et al., 2024; Merrill & Sabharwal, 2023].

Math

4,725
0.52 stars / hour

VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model

VARGPT-family/VARGPT 21 Jan 2025

We present VARGPT, a novel multimodal large language model (MLLM) that unifies visual understanding and generation within a single autoregressive framework.

Image Generation · Instruction Following +6

156
0.52 stars / hour

REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

openrlhf/openrlhf 4 Jan 2025

Reinforcement Learning from Human Feedback (RLHF) has emerged as a critical approach for aligning large language models with human preferences, witnessing rapid algorithmic evolution through methods such as Proximal Policy Optimization (PPO), Direct Preference Optimization (DPO), REINFORCE Leave One-Out (RLOO), ReMax, and Group Relative Policy Optimization (GRPO).

Computational Efficiency

4,722
0.51 stars / hour
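Two of the baselines named in the REINFORCE++ abstract admit compact formulas: RLOO uses each sample's leave-one-out mean reward as its baseline, and GRPO standardizes rewards within a group of responses to the same prompt. The sketch below shows those standard advantage computations only; the function names are my own, and this is not REINFORCE++'s or OpenRLHF's implementation.

```python
# Sketch of two group-based advantage estimators from the RLHF literature.
from statistics import mean, pstdev

def rloo_advantages(rewards):
    """RLOO: baseline for sample i is the mean reward of the other samples."""
    n, total = len(rewards), sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]

def grpo_advantages(rewards, eps=1e-8):
    """GRPO: standardize rewards within the group for one prompt."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

rewards = [1.0, 0.0, 0.0, 1.0]   # e.g. binary correctness of 4 sampled answers
print(rloo_advantages(rewards))  # above-average samples get positive advantage
print(grpo_advantages(rewards))
```

Both estimators need several samples per prompt but avoid training a separate value network, which is one reason such group-relative baselines have displaced PPO's critic in recent pipelines.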

s1: Simple test-time scaling

simplescaling/s1 31 Jan 2025

After supervised finetuning the Qwen2.5-32B-Instruct language model on s1K and equipping it with budget forcing, our model s1-32B exceeds o1-preview on competition math questions by up to 27% (MATH and AIME24).

Language Modeling · Language Modelling +2

5,343
0.51 stars / hour
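"Budget forcing" in the s1 abstract is a decode-time control: cap the number of thinking tokens, and to extend thinking, suppress an early stop and append a continuation token such as "Wait". The toy generator below is a stand-in for a real LLM; only the control loop illustrates the idea, and the token strings are illustrative assumptions.

```python
# Toy sketch of budget forcing around a token generator (not s1's code).
END = "</think>"  # assumed end-of-thinking delimiter

def budget_forced_decode(generate_step, min_tokens, max_tokens):
    """generate_step(tokens) -> next token string.

    If the model tries to stop before min_tokens, the stop is suppressed
    and 'Wait' is appended to encourage further reasoning; at max_tokens,
    thinking is cut off by force-emitting END.
    """
    tokens = []
    while True:
        tok = generate_step(tokens)
        if tok == END and len(tokens) < min_tokens:
            tokens.append("Wait")   # suppress early stop, keep thinking
            continue
        if tok == END or len(tokens) >= max_tokens:
            tokens.append(END)      # enforce the upper budget
            return tokens
        tokens.append(tok)

# Toy model: emits "step" three times, then tries to stop.
def toy_model(tokens):
    return "step" if sum(t == "step" for t in tokens) < 3 else END

out = budget_forced_decode(toy_model, min_tokens=5, max_tokens=8)
print(out)  # -> ['step', 'step', 'step', 'Wait', 'Wait', '</think>']
```

Varying `min_tokens`/`max_tokens` is what makes test-time compute a tunable knob: larger thinking budgets trade latency for accuracy without any retraining.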

SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models

starriver030515/synthvlm 30 Jul 2024

Crucially, our method's reliance on purely generated data ensures the preservation of privacy, achieving SoTA performance with just 100k data points (only 18% of the official dataset size).

Caption Generation · Question Answering

92
0.51 stars / hour

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

qwenlm/qwen2.5-vl 18 Sep 2024

We present the Qwen2-VL Series, an advanced upgrade of the previous Qwen-VL models that redefines the conventional predetermined-resolution approach in visual processing.

Natural Language Visual Grounding · Temporal Relation Extraction +2

7,369
0.50 stars / hour