After supervised finetuning the Qwen2.5-32B-Instruct language model on s1K and equipping it with budget forcing, our model s1-32B exceeds o1-preview on competition math questions by up to 27% (MATH and AIME24).
Ranked #4 on Mathematical Reasoning on AIME24
Since the release of ChatGPT, large language models (LLMs) have demonstrated remarkable capabilities across various domains.
We present the Qwen2-VL Series, an advanced upgrade of the previous Qwen-VL models that redefines the conventional predetermined-resolution approach in visual processing.
Ranked #3 on Video Question Answering on TVBench
Natural Language Visual Grounding
Temporal Relation Extraction
We present DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL, through two key major upgrades.
Ranked #1 on Referring Expression Comprehension on RefCOCOg-test
In addition, for hosted solutions, the proprietary models currently include two mixture-of-experts (MoE) variants: Qwen2.5-Turbo and Qwen2.5-Plus, both available from Alibaba Cloud Model Studio.
Ranked #6 on GPQA
This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models.
Ranked #1 on Arithmetic Reasoning on GSM8K (using extra training data)
Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks.
IntellAgent represents a paradigm shift in evaluating conversational AI.
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1.
Ranked #1 on Mathematical Reasoning on AIME24
In this work, we introduce Janus-Pro, an advanced version of the previous work Janus.