We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.
In this report, we introduce the Qwen2.5-Coder series, a significant upgrade from its predecessor, CodeQwen1.5.
The rapid development of large language models has revolutionized code intelligence in software development.
Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature.
In this work, we make the first attempt to fine-tune all-modality models (i.e., models that accept and produce any modality, also called any-to-any models) using human preference data across all modalities (including text, image, audio, and video), ensuring their behavior aligns with human intentions.
The rapid development of open-source large language models (LLMs) has been truly remarkable.
3D Gaussian Splatting (3DGS) is a technique that enables the direct reconstruction of 3D objects from 2D images.
Language model post-training is applied to refine behaviors and unlock new skills across a wide range of recent language models, but open recipes for these techniques lag behind proprietary ones.
Large Language Models (LLMs) have reshaped natural language processing, powering applications from multi-hop retrieval and question answering to autonomous agent workflows.
This system includes two foundation components: a large-scale shape generation model, Hunyuan3D-DiT, and a large-scale texture synthesis model, Hunyuan3D-Paint.