Search Results for author: Zengzhi Wang

Found 9 papers, 7 papers with code

OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far?

1 code implementation · 24 Jun 2024 · Zhen Huang, Zengzhi Wang, Shijie Xia, PengFei Liu

In this report, we pose the following question: Who is the most intelligent AI model to date, as measured by the OlympicArena (an Olympic-level, multi-discipline, multi-modal benchmark for superintelligent AI)?

OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI

1 code implementation · 18 Jun 2024 · Zhen Huang, Zengzhi Wang, Shijie Xia, Xuefeng Li, Haoyang Zou, Ruijie Xu, Run-Ze Fan, Lyumanshan Ye, Ethan Chern, Yixin Ye, Yikai Zhang, Yuqing Yang, Ting Wu, Binjie Wang, Shichao Sun, Yang Xiao, Yiyuan Li, Fan Zhou, Steffi Chern, Yiwei Qin, Yan Ma, Jiadi Su, Yixiu Liu, Yuxiang Zheng, Shaoting Zhang, Dahua Lin, Yu Qiao, PengFei Liu

We delve into the models' cognitive reasoning abilities, their performance across different modalities, and their outcomes in process-level evaluations, which are vital for tasks requiring complex reasoning with lengthy solutions.

Benchmarking scientific discovery

Benchmarking Benchmark Leakage in Large Language Models

1 code implementation · 29 Apr 2024 · Ruijie Xu, Zengzhi Wang, Run-Ze Fan, PengFei Liu

By analyzing 31 LLMs in the context of mathematical reasoning, we reveal substantial instances of training-set and even test-set misuse, resulting in potentially unfair comparisons.

Benchmarking Mathematical Reasoning

Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale Pretraining Corpus for Math

1 code implementation · 28 Dec 2023 · Zengzhi Wang, Rui Xia, PengFei Liu

Our meticulous data collection and processing pipeline included preprocessing, prefiltering, language identification, cleaning, filtering, and deduplication, ensuring the high quality of our corpus.

Language Identification, Math

Ask Again, Then Fail: Large Language Models' Vacillations in Judgment

2 code implementations · 3 Oct 2023 · Qiming Xie, Zengzhi Wang, Yi Feng, Rui Xia

We observe that current conversational language models often waver in their judgments when faced with follow-up questions, even if the original judgment was correct.

Negation

MEMD-ABSA: A Multi-Element Multi-Domain Dataset for Aspect-Based Sentiment Analysis

1 code implementation · 29 Jun 2023 · Hongjie Cai, Nan Song, Zengzhi Wang, Qiming Xie, Qiankun Zhao, Ke Li, Siwei Wu, Shijie Liu, Jianfei Yu, Rui Xia

Aspect-based sentiment analysis is a long-standing research interest in the field of opinion mining, and in recent years, researchers have gradually shifted their focus from simple ABSA subtasks to end-to-end multi-element ABSA tasks.

Aspect-Based Sentiment Analysis, Opinion Mining
