Search Results for author: Binjie Wang

Found 5 papers, 5 papers with code

RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation

1 code implementation · 15 Aug 2024 · Dongyu Ru, Lin Qiu, Xiangkun Hu, Tianhang Zhang, Peng Shi, Shuaichen Chang, Cheng Jiayang, Cunxiang Wang, Shichao Sun, Huanyu Li, Zizhao Zhang, Binjie Wang, Jiarong Jiang, Tong He, Zhiguo Wang, PengFei Liu, Yue Zhang, Zheng Zhang

Despite Retrieval-Augmented Generation (RAG) showing promising capability in leveraging external knowledge, a comprehensive evaluation of RAG systems remains challenging due to the modular nature of RAG, the evaluation of long-form responses, and the reliability of measurements.

RAG · Retrieval

OpenResearcher: Unleashing AI for Accelerated Scientific Research

1 code implementation · 13 Aug 2024 · Yuxiang Zheng, Shichao Sun, Lin Qiu, Dongyu Ru, Cheng Jiayang, Xuefeng Li, Jifan Lin, Binjie Wang, Yun Luo, Renjie Pan, Yang Xu, Qingkai Min, Zizhao Zhang, Yiwen Wang, Wenjie Li, PengFei Liu

The rapid growth of scientific literature poses significant challenges for researchers endeavoring to stay updated with the latest advancements in their fields and to delve into new areas.

RAG · Retrieval

Halu-J: Critique-Based Hallucination Judge

1 code implementation · 17 Jul 2024 · Binjie Wang, Steffi Chern, Ethan Chern, PengFei Liu

To address these challenges, we introduce Halu-J, a critique-based hallucination judge with 7 billion parameters.

Evidence Selection · Hallucination +1

BeHonest: Benchmarking Honesty in Large Language Models

1 code implementation · 19 Jun 2024 · Steffi Chern, Zhulin Hu, Yuqing Yang, Ethan Chern, Yuan Guo, Jiahe Jin, Binjie Wang, PengFei Liu

Building on this foundation, we designed 10 scenarios to evaluate and analyze 9 popular LLMs on the market, including both closed-source and open-source models from different model families with varied model sizes.

Benchmarking · Misinformation

OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI

1 code implementation · 18 Jun 2024 · Zhen Huang, Zengzhi Wang, Shijie Xia, Xuefeng Li, Haoyang Zou, Ruijie Xu, Run-Ze Fan, Lyumanshan Ye, Ethan Chern, Yixin Ye, Yikai Zhang, Yuqing Yang, Ting Wu, Binjie Wang, Shichao Sun, Yang Xiao, Yiyuan Li, Fan Zhou, Steffi Chern, Yiwei Qin, Yan Ma, Jiadi Su, Yixiu Liu, Yuxiang Zheng, Shaoting Zhang, Dahua Lin, Yu Qiao, PengFei Liu

We delve into the models' cognitive reasoning abilities, their performance across different modalities, and their outcomes in process-level evaluations, which are vital for tasks requiring complex reasoning with lengthy solutions.

Benchmarking · Scientific Discovery
