Search Results for author: Dongfu Jiang

Found 14 papers, 9 papers with code

QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design

1 code implementation · 22 May 2025 · Benjamin Schneider, Dongfu Jiang, Chao Du, Tianyu Pang, Wenhu Chen

Long-video understanding has emerged as a crucial capability in real-world applications such as video surveillance, meeting summarization, educational lecture analysis, and sports broadcasting.

Meeting Summarization · Video Understanding

General-Reasoner: Advancing LLM Reasoning Across All Domains

1 code implementation · 20 May 2025 · Xueguang Ma, Qian Liu, Dongfu Jiang, Ge Zhang, Zejun Ma, Wenhu Chen

Reinforcement learning (RL) has recently demonstrated strong potential in enhancing the reasoning capabilities of large language models (LLMs).

All · Math +3

ACECODER: Acing Coder RL via Automated Test-Case Synthesis

no code implementations · 3 Feb 2025 · Huaye Zeng, Dongfu Jiang, Haozhe Wang, Ping Nie, Xiaotong Chen, Wenhu Chen

Notably, we follow R1-style training, starting directly from Qwen2.5-Coder-base, and show that our RL training can improve the model on HumanEval-plus by over 25% and on MBPP-plus by 6% in merely 80 optimization steps.

HumanEval · mbpp +3

MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

1 code implementation · 14 Oct 2024 · Jiacheng Chen, Tianhao Liang, Sherman Siu, Zhengqing Wang, Kai Wang, YuBo Wang, Yuansheng Ni, Wang Zhu, Ziyan Jiang, Bohan Lyu, Dongfu Jiang, Xuan He, YuAn Liu, Hexiang Hu, Xiang Yue, Wenhu Chen

We evaluate a wide variety of frontier vision-language models on MEGA-Bench to understand their capabilities across these dimensions.

WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences

no code implementations · 16 Jun 2024 · Yujie Lu, Dongfu Jiang, Wenhu Chen, William Yang Wang, Yejin Choi, Bill Yuchen Lin

Recent breakthroughs in vision-language models (VLMs) emphasize the necessity of benchmarking human preferences in real-world multimodal interactions.

Benchmarking · Spatial Reasoning

MANTIS: Interleaved Multi-Image Instruction Tuning

1 code implementation · 2 May 2024 · Dongfu Jiang, Xuan He, Huaye Zeng, Cong Wei, Max Ku, Qian Liu, Wenhu Chen

We further evaluate Mantis on single-image benchmarks and demonstrate that Mantis also maintains a strong single-image performance on par with CogVLM and Emu2.

VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation

1 code implementation · 22 Dec 2023 · Max Ku, Dongfu Jiang, Cong Wei, Xiang Yue, Wenhu Chen

In the rapidly advancing field of conditional image generation research, challenges such as limited explainability lie in effectively evaluating the performance and capabilities of various models.

Conditional Image Generation · General Knowledge

TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks

1 code implementation · 1 Oct 2023 · Dongfu Jiang, Yishan Li, Ge Zhang, Wenhao Huang, Bill Yuchen Lin, Wenhu Chen

To quantitatively assess our metric, we evaluate its correlation with human ratings on 5 held-in datasets and 2 held-out datasets, and show that TIGERScore achieves the open-source SoTA correlation with human ratings across these datasets and almost approaches that of the GPT-4 evaluator.

All · Text Generation

LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion

3 code implementations · 5 Jun 2023 · Dongfu Jiang, Xiang Ren, Bill Yuchen Lin

We present LLM-Blender, an ensembling framework designed to attain consistently superior performance by leveraging the diverse strengths of multiple open-source large language models (LLMs).

PairReranker: Pairwise Reranking for Natural Language Generation

no code implementations · 20 Dec 2022 · Dongfu Jiang, Bill Yuchen Lin, Xiang Ren

Pre-trained language models have been successful in natural language generation (NLG) tasks.

Machine Translation · Reranking +1
