s1: Simple test-time scaling

simplescaling/s1 31 Jan 2025

After supervised finetuning the Qwen2.5-32B-Instruct language model on s1K and equipping it with budget forcing, our model s1-32B exceeds o1-preview on competition math questions by up to 27% (MATH and AIME24).

Language Modeling Language Modelling +2

3,804
5.09 stars / hour

Flaming-hot Initiation with Regular Execution Sampling for Large Language Models

volcengine/verl 28 Oct 2024

Since the release of ChatGPT, large language models (LLMs) have demonstrated remarkable capabilities across various domains.

Diversity Math

2,579
3.27 stars / hour

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

qwenlm/qwen2-vl 18 Sep 2024

We present the Qwen2-VL Series, an advanced upgrade of the previous Qwen-VL models that redefines the conventional predetermined-resolution approach in visual processing.

Natural Language Visual Grounding Temporal Relation Extraction +2

6,823
2.52 stars / hour

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

deepseek-ai/deepseek-vl2 13 Dec 2024

We present DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL, through two major upgrades.

Chart Understanding Optical Character Recognition +4

3,160
2.44 stars / hour

Qwen2.5 Technical Report

qwenlm/qwen2.5 19 Dec 2024

In addition, for hosted solutions, the proprietary models currently include two mixture-of-experts (MoE) variants: Qwen2.5-Turbo and Qwen2.5-Plus, both available from Alibaba Cloud Model Studio.

Ranked #6 on GPQA

Common Sense Reasoning +4

14,981
1.33 stars / hour

Qwen2 Technical Report

qwenlm/qwen2 15 Jul 2024

This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models.

Ranked #1 on Arithmetic Reasoning on GSM8K (using extra training data)

Arithmetic Reasoning GSM8K +5

14,986
1.32 stars / hour

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

deepseek-ai/deepseek-coder-v2 17 Jun 2024

Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks.

16k Language Modeling +3

4,963
1.22 stars / hour

IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems

plurai-ai/intellagent 19 Jan 2025

IntellAgent represents a paradigm shift in evaluating conversational AI.

Navigate

599
0.97 stars / hour

Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling

deepseek-ai/janus 29 Jan 2025

In this work, we introduce Janus-Pro, an advanced version of the previous work Janus.

Instruction Following Text-to-Image Generation

15,245
0.94 stars / hour