Agent-as-a-Judge: Evaluate Agents with Agents

metauto-ai/agent-as-a-judge 14 Oct 2024

To address this, we introduce the Agent-as-a-Judge framework, wherein agentic systems are used to evaluate agentic systems.

Code Generation

486
0.28 stars / hour

Qwen2 Technical Report

qwenlm/qwen1.5 15 Jul 2024

This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models.

Ranked #3 on Arithmetic Reasoning on GSM8K (using extra training data)

Arithmetic Reasoning GSM8K +6

20,783
0.28 stars / hour

HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation

dcdmllm/healthgpt 14 Feb 2025

To effectively learn the HealthGPT, we devise a comprehensive medical domain-specific comprehension and generation dataset called VL-Health.

Language Modeling Language Modelling +1

1,388
0.27 stars / hour

An Empirical Study of Qwen3 Quantization

efficient-ml/qwen3-quantization 4 May 2025

The Qwen series has emerged as a leading family of open-source Large Language Models (LLMs), demonstrating remarkable capabilities in natural language understanding tasks.

Natural Language Understanding Quantization

12
0.27 stars / hour

LRAGE: Legal Retrieval Augmented Generation Evaluation Tool

hoorangyee/lrage 2 Apr 2025

Recently, building retrieval-augmented generation (RAG) systems to enhance the capability of large language models (LLMs) has become a common practice.

RAG Retrieval

42
0.26 stars / hour

Erasing Without Remembering: Safeguarding Knowledge Forgetting in Large Language Models

maybelizzy/ugbench 27 Feb 2025

In this paper, we explore machine unlearning from a novel dimension, by studying how to safeguard model unlearning in large language models (LLMs).

Machine Unlearning

30
0.26 stars / hour

Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

tencent/hunyuan3d-2 21 Jan 2025

This system includes two foundation components: a large-scale shape generation model -- Hunyuan3D-DiT, and a large-scale texture synthesis model -- Hunyuan3D-Paint.

Texture Synthesis

9,629
0.26 stars / hour

ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

volcengine/verl 15 Apr 2025

While reasoning models (e.g., DeepSeek R1) trained with reinforcement learning (RL) excel in textual reasoning, they struggle in scenarios requiring structured problem-solving, such as geometric reasoning, concise computation, or complex equation solving, areas where computational tools like code interpreters (CI) demonstrate distinct advantages.

Math Mathematical Reasoning +3

7,900
0.26 stars / hour

What Kind of Visual Tokens Do We Need? Training-free Visual Token Pruning for Multi-modal Large Language Models from the Perspective of Graph

jytmelon/g-prune 4 Jan 2025

Recent Multimodal Large Language Models (MLLMs) often use a large number of visual tokens to compensate for their visual shortcomings, leading to excessive computation and obvious visual redundancy.

TextVQA

202
0.26 stars / hour

Registration of 3D Point Sets Using Exponential-based Similarity Matrix

aralab-unr/esm_icp 7 May 2025

However, state-of-the-art registration techniques often struggle when large rotational differences exist between point sets or when the data is significantly corrupted by sensor noise.

Point Cloud Registration

20
0.25 stars / hour