Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

petergriffinjin/search-r1 12 Mar 2025

Efficiently acquiring external knowledge and up-to-date information is essential for effective reasoning and text generation in large language models (LLMs).

Question Answering, Reinforcement Learning (RL)

1,360 stars (1.09 stars / hour)

Neural Fields with Thermal Activations for Arbitrary-Scale Super-Resolution

prs-eth/thera 29 Nov 2023

We present a novel way to design neural fields such that points can be queried with an adaptive Gaussian point spread function (PSF), so as to guarantee correct anti-aliasing at any desired output resolution.

Image Super-Resolution

651 stars (1.06 stars / hour)

Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

tencent/hunyuan3d-2 21 Jan 2025

This system includes two foundation components: a large-scale shape generation model, Hunyuan3D-DiT, and a large-scale texture synthesis model, Hunyuan3D-Paint.

Texture Synthesis

7,799 stars (0.97 stars / hour)

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

yaotingwangofficial/awesome-mcot 16 Mar 2025

By extending the advantage of chain-of-thought (CoT) reasoning in human-like step-by-step processes to multimodal contexts, multimodal CoT (MCoT) reasoning has recently garnered significant research attention, especially in its integration with multimodal large language models (MLLMs).

Autonomous Driving, Multimodal Generation

162 stars (0.90 stars / hour)

Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View

xianzuwu/Niagara 16 Mar 2025

Recent advances in single-view 3D scene reconstruction have highlighted the challenges in capturing fine geometric details and ensuring structural consistency, particularly in high-fidelity outdoor scene modeling.

3D Scene Reconstruction, Decoder

81 stars (0.84 stars / hour)

Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond

qihoo360/light-r1 13 Mar 2025

The Light-R1 series of work validates training long-CoT models from scratch, showcases the art of SFT data, and releases SOTA models trained with RL.

Math

506 stars (0.80 stars / hour)

HybridFlow: A Flexible and Efficient RLHF Framework

volcengine/verl 28 Sep 2024

Traditional RL can be modeled as a dataflow, where each node represents the computation of a neural network (NN) and each edge denotes a data dependency between NNs.

Large Language Model

5,512 stars (0.79 stars / hour)

YOLOE: Real-Time Seeing Anything

THU-MIG/yoloe 10 Mar 2025

Object detection and segmentation are widely employed in computer vision applications, yet conventional models like the YOLO series, while efficient and accurate, are limited by predefined categories, hindering adaptability in open scenarios.

10-shot image generation

855 stars (0.79 stars / hour)

Zero-shot Voice Conversion with Diffusion Transformers

Plachtaa/seed-vc 15 Nov 2024

Zero-shot voice conversion aims to transform a source speech utterance to match the timbre of a reference utterance from an unseen speaker.

In-Context Learning, Voice Conversion

2,048 stars (0.76 stars / hour)

Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception

dk-liang/unifuture 17 Mar 2025

Extensive experiments on the nuScenes dataset demonstrate that UniFuture outperforms specialized models on future generation and perception tasks, highlighting the advantages of a unified, structurally-aware world model.

Future Prediction, Scene Generation

68 stars (0.74 stars / hour)