DeepSeek-V3 Technical Report

deepseek-ai/deepseek-v3 27 Dec 2024

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.

Language Modeling

80,072 stars (0.91 stars / hour)
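The parameter counts above mean that only about 37 / 671 ≈ 5.5% of the model's parameters participate in processing any single token. To illustrate what "activated" means in an MoE layer, here is a minimal sketch of generic top-k softmax gating, assuming scalar inputs and toy callables as "experts" for readability. This is not DeepSeek-V3's actual router, and the function names (`top_k_route`, `moe_forward`, `active_fraction`) are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Pick the k experts with the highest router scores.

    Returns (expert_index, weight) pairs, highest score first, with the
    weights renormalized (via softmax) over the selected experts only.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in ranked])
    return list(zip(ranked, weights))


def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the outputs of only the k selected experts.

    Unselected experts are never called, which is what keeps the
    per-token compute far below the total parameter count.
    """
    out = 0.0
    for idx, weight in top_k_route(router_logits, k):
        out += weight * experts[idx](x)
    return out


def active_fraction(total_params, active_params):
    """Share of all parameters that are used for one token."""
    return active_params / total_params


if __name__ == "__main__":
    # Four toy experts: expert i multiplies its input by (i + 1).
    experts = [lambda x, s=i + 1: s * x for i in range(4)]
    logits = [0.1, 2.0, 0.5, 3.0]
    print(top_k_route(logits, k=2))          # experts 3 and 1 are selected
    print(moe_forward(2.0, experts, logits, k=2))
    print(f"{active_fraction(671, 37):.1%}")  # DeepSeek-V3 headline numbers
```

With `k=2` out of four experts, half of the expert callables run for this token; production MoE models use many more experts (and often additional always-on shared experts), which drives the active fraction much lower.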

Qwen2.5-Coder Technical Report

qwenlm/qwen2.5-coder 18 Sep 2024

In this report, we introduce the Qwen2.5-Coder series, a significant upgrade from its predecessor, CodeQwen1.5.

Code Generation +2

4,367 stars (0.82 stars / hour)

DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

deepseek-ai/DeepSeek-Coder 25 Jan 2024

The rapid development of large language models has revolutionized code intelligence in software development.

Code Generation Language Modeling +2

18,500 stars (0.75 stars / hour)

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

deepseek-ai/deepseek-math 5 Feb 2024

Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature.

Ranked #27 on Arithmetic Reasoning on GSM8K (using extra training data)

Arithmetic Reasoning Math +1

2,233 stars (0.68 stars / hour)

Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback

pku-alignment/align-anything 20 Dec 2024

In this work, we make the first attempt to fine-tune all-modality models (i.e., models whose input and output can be in any modality, also called any-to-any models) using human preference data across all modalities (including text, image, audio, and video), ensuring that their behavior aligns with human intentions.

Instruction Following

1,555 stars (0.68 stars / hour)

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

deepseek-ai/deepseek-llm 5 Jan 2024

The rapid development of open-source large language models (LLMs) has been truly remarkable.

5,694 stars (0.66 stars / hour)

RaySplats: Ray Tracing based Gaussian Splatting

kbyrski/raysplatting 31 Jan 2025

3D Gaussian Splatting (3DGS) is a process that enables the direct creation of 3D objects from 2D images.

3DGS

85 stars (0.58 stars / hour)

Tulu 3: Pushing Frontiers in Open Language Model Post-Training

allenai/open-instruct 22 Nov 2024

Language model post-training is applied to refine behaviors and unlock new skills across a wide range of recent language models, but open recipes for applying these techniques lag behind proprietary ones.

Language Modeling

2,588 stars (0.53 stars / hour)

LLM-AutoDiff: Auto-Differentiate Any LLM Workflow

sylphai-inc/adalflow 28 Jan 2025

Large Language Models (LLMs) have reshaped natural language processing, powering applications from multi-hop retrieval and question answering to autonomous agent workflows.

Prompt Engineering Question Answering +1

2,679 stars (0.47 stars / hour)

Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

tencent/hunyuan3d-2 21 Jan 2025

Hunyuan3D 2.0 includes two foundation components: a large-scale shape generation model, Hunyuan3D-DiT, and a large-scale texture synthesis model, Hunyuan3D-Paint.

Texture Synthesis

5,821 stars (0.47 stars / hour)