Transformers without Normalization

jiachenzhu/DyT 13 Mar 2025

Normalization layers are ubiquitous in modern neural networks and have long been considered essential.

Self-Supervised Learning

675 stars (0.72 stars / hour)

HybridFlow: A Flexible and Efficient RLHF Framework

volcengine/verl 28 Sep 2024

Traditional RL can be modeled as a dataflow, where each node represents the computation of a neural network (NN) and each edge denotes a data dependency between NNs.

Large Language Model

5,406 stars (0.70 stars / hour)
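A minimal sketch of the dataflow view in the HybridFlow description, assuming a simplified PPO-style RLHF step. Each node is a hypothetical stand-in for a neural-network computation (actor generation, reward model, critic), and each edge is a data dependency. The node names and functions are illustrative only and are not the verl API. Execution order comes from Python's standard `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical stand-ins for NN computations; in a real RLHF system each
# node would run a model (actor, critic, reward model, reference policy).
def generate(prompts):
    return [p + " -> response" for p in prompts]

def reward(responses):
    return [float(len(r)) for r in responses]

def critic_values(responses):
    return [0.5 * len(r) for r in responses]

def advantages(rewards, values):
    return [r - v for r, v in zip(rewards, values)]

# Each node maps to (computation, names of nodes whose outputs it consumes).
# The edges of the dataflow graph are exactly these dependencies.
NODES = {
    "actor_generate": (generate, ["prompts"]),
    "reward_model": (reward, ["actor_generate"]),
    "critic": (critic_values, ["actor_generate"]),
    "advantage": (advantages, ["reward_model", "critic"]),
}

def run_dataflow(prompts):
    # TopologicalSorter expects {node: set_of_predecessors}.
    graph = {name: set(deps) for name, (_, deps) in NODES.items()}
    results = {"prompts": prompts}
    for name in TopologicalSorter(graph).static_order():
        if name in results:  # source data, not a computation node
            continue
        fn, deps = NODES[name]
        results[name] = fn(*(results[d] for d in deps))
    return results
```

For example, `run_dataflow(["q"])["advantage"]` runs generation first, then the reward and critic nodes (which depend only on generation), and finally the advantage node that joins both.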

Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond

qihoo360/light-r1 13 Mar 2025

The Light-R1 series of work validates training long-CoT models from scratch, showcases the art in SFT data, and releases SOTA models trained with RL.

Math

491 stars (0.68 stars / hour)

OmniSQL: Synthesizing High-quality Text-to-SQL Data at Scale

RUCKBReasoning/OmniSQL 4 Mar 2025

Text-to-SQL, the task of translating natural language questions into SQL queries, plays a crucial role in enabling non-experts to interact with databases.

Text-To-SQL

134 stars (0.67 stars / hour)

Retrieval-Augmented Generation with Hierarchical Knowledge

hhy-huang/HiRAG 13 Mar 2025

Graph-based Retrieval-Augmented Generation (RAG) methods have significantly enhanced the performance of large language models (LLMs) in domain-specific tasks.

Multi-hop Question Answering, Question Answering +2

72 stars (0.62 stars / hour)

Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models

hiyouga/easyr1 9 Mar 2025

Direct training with RL struggles to activate complex reasoning capabilities, such as questioning and reflection, in MLLMs because substantial high-quality multimodal reasoning data is lacking.

Math, Multimodal Reasoning +1

1,626 stars (0.61 stars / hour)

HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation

dcdmllm/healthgpt 14 Feb 2025

To train HealthGPT effectively, we devise a comprehensive medical domain-specific comprehension and generation dataset called VL-Health.

Language Modeling, Language Modelling +1

614 stars (0.55 stars / hour)

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

rongyaofang/got 13 Mar 2025

We present Generation Chain-of-Thought (GoT), a novel paradigm that enables generation and editing through an explicit language reasoning process before outputting images.

Language Modeling, Language Modelling +3

157 stars (0.54 stars / hour)

DEIM: DETR with Improved Matching for Fast Convergence

shihuahuang95/deim 5 Dec 2024

We introduce DEIM, an innovative and efficient training framework designed to accelerate convergence in real-time object detection with Transformer-based architectures (DETR).

Ranked #1 on Real-Time Object Detection on MS COCO (using extra training data)

Data Augmentation, object-detection +1

521 stars (0.53 stars / hour)

LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync

bytedance/LatentSync 12 Dec 2024

Since we did not change the overall training framework of SyncNet, our experience can also be applied to other lip sync and audio-driven portrait animation methods that utilize SyncNet.

Portrait Animation

3,143 stars (0.51 stars / hour)