Search Results for author: Xinyu Zhu

Found 20 papers, 12 papers with code

Do LLM Evaluators Prefer Themselves for a Reason?

No code implementations · 4 Apr 2025 · Wei-Lin Chen, Zhepei Wei, Xinyu Zhu, Shi Feng, Yu Meng

Large language models (LLMs) are increasingly used as automatic evaluators in applications such as benchmarking, reward modeling, and self-refinement.

Benchmarking, Code Generation

Self-Evolving Multi-Agent Collaboration Networks for Software Development

No code implementations · 22 Oct 2024 · Yue Hu, Yuzhu Cai, Yaxin Du, Xinyu Zhu, Xiangrui Liu, Zijie Yu, Yuchen Hou, Shuo Tang, Siheng Chen

To extend coding capabilities beyond function-level tasks to more challenging software-level development, we further propose rSDE-Bench, a requirement-oriented software development benchmark, which features complex and diverse software requirements along with automatic evaluation of requirement correctness.

HumanEval

A Survey on the Honesty of Large Language Models

2 code implementations · 27 Sep 2024 · Siheng Li, Cheng Yang, Taiqiang Wu, Chufan Shi, Yuji Zhang, Xinyu Zhu, Zesen Cheng, Deng Cai, Mo Yu, Lemao Liu, Jie Zhou, Yujiu Yang, Ngai Wong, Xixin Wu, Wai Lam

Honesty is a fundamental principle for aligning large language models (LLMs) with human values, requiring these models to recognize what they know and don't know and be able to faithfully express their knowledge.

Survey

Lifelong Histopathology Whole Slide Image Retrieval via Distance Consistency Rehearsal

1 code implementation · 11 Jul 2024 · Xinyu Zhu, Zhiguo Jiang, Kun Wu, Jun Shi, Yushan Zheng

Content-based histopathological image retrieval (CBHIR) has gained attention in recent years, offering the capability to return histopathology images whose content is similar to the query image from an established database.

Image Retrieval, Retrieval
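
Content-based retrieval, as described above, amounts to ranking database entries by their similarity to a query. The sketch below assumes each whole slide image has already been encoded as a fixed-length feature vector (the vectors are made-up toy values, and `retrieve` is a hypothetical helper) and ranks entries by cosine similarity. It does not implement the paper's distance consistency rehearsal or its lifelong-learning setup.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: list[float], database: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Hypothetical helper: return the k database entries most similar to the query, most similar first."""
    scored = [(image_id, cosine_similarity(query, feature)) for image_id, feature in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-d vectors standing in for encoder features of whole slide images (hypothetical values).
database = {
    "slide_a": [1.0, 0.0, 0.0],
    "slide_b": [0.9, 0.1, 0.0],
    "slide_c": [0.0, 1.0, 0.0],
}
top = retrieve([1.0, 0.05, 0.0], database, k=2)
print([image_id for image_id, _ in top])  # ['slide_a', 'slide_b']
```

In a real system the linear scan would typically be replaced by an approximate nearest-neighbour index once the database grows large.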

HoLLMwood: Unleashing the Creativity of Large Language Models in Screenwriting via Role Playing

No code implementations · 17 Jun 2024 · Jing Chen, Xinyu Zhu, Cheng Yang, Chufan Shi, Yadong Xi, Yuxiang Zhang, Junjie Wang, Jiashu Pu, Rongsheng Zhang, Yujiu Yang, Tian Feng

Generative AI has demonstrated unprecedented creativity in the field of computer vision, yet such phenomena have not been observed in natural language processing.

ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation

1 code implementation · 14 Jun 2024 · Cheng Yang, Chufan Shi, Yaxin Liu, Bo Shui, Junjie Wang, Mohan Jing, Linran Xu, Xinyu Zhu, Siheng Li, Yuxiang Zhang, Gongye Liu, Xiaomei Nie, Deng Cai, Yujiu Yang

We introduce a new benchmark, ChartMimic, aimed at assessing the visually-grounded code generation capabilities of large multimodal models (LMMs).

Code Generation

FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models

3 code implementations · 7 Jun 2024 · Rui Ye, Rui Ge, Xinyu Zhu, Jingyi Chai, Yaxin Du, Yang Liu, Yanfeng Wang, Siheng Chen

Addressing this, we propose FedLLM-Bench, which involves 8 training methods, 4 training datasets, and 6 evaluation metrics, to offer a comprehensive testbed for the FedLLM community.

Federated Learning

Adaptive Fair Representation Learning for Personalized Fairness in Recommendations via Information Alignment

1 code implementation · 11 Apr 2024 · Xinyu Zhu, Lilin Zhang, Ning Yang

Existing works often treat a fairness requirement, represented as a collection of sensitive attributes, as a hyperparameter and pursue extreme fairness by completely removing sensitive-attribute information from the learned fair embedding. This approach suffers from two challenges: huge training cost incurred by the explosion of attribute combinations, and a suboptimal trade-off between fairness and accuracy.

Attribute, Fairness
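
The "explosion of attribute combinations" can be made concrete with a counting argument. Assuming a fairness requirement may be any non-empty subset of n sensitive attributes, there are 2^n - 1 distinct requirements, so an approach that trains one model per requirement needs that many training runs. The attribute names and the `fairness_requirements` helper below are hypothetical, and the sketch illustrates only the cost described above, not the paper's proposed method.

```python
from itertools import combinations


def fairness_requirements(attributes: list[str]) -> list[tuple[str, ...]]:
    """Hypothetical helper: enumerate every non-empty subset of sensitive attributes."""
    requirements: list[tuple[str, ...]] = []
    for size in range(1, len(attributes) + 1):
        requirements.extend(combinations(attributes, size))
    return requirements


# Hypothetical sensitive attributes, chosen only for illustration.
print(fairness_requirements(["gender", "age", "occupation"]))

# If one model is trained per requirement, the number of training runs is 2**n - 1.
for n in (3, 5, 10, 20):
    print(n, 2 ** n - 1)  # 3 7, 5 31, 10 1023, 20 1048575
```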

Federated Learning Empowered by Generative Content

No code implementations · 10 Dec 2023 · Rui Ye, Xinyu Zhu, Jingyi Chai, Siheng Chen, Yanfeng Wang

In this paper, we propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content.

Diversity, Federated Learning

AutoConv: Automatically Generating Information-seeking Conversations with Large Language Models

No code implementations · 12 Aug 2023 · Siheng Li, Cheng Yang, Yichun Yin, Xinyu Zhu, Zesen Cheng, Lifeng Shang, Xin Jiang, Qun Liu, Yujiu Yang

Information-seeking conversation, which aims to help users gather information through conversation, has achieved great progress in recent years.

Few-Shot Learning, Language Modeling

Question Answering as Programming for Solving Time-Sensitive Questions

1 code implementation · 23 May 2023 · Xinyu Zhu, Cheng Yang, Bei Chen, Siheng Li, Jian-Guang Lou, Yujiu Yang

Question answering plays a pivotal role in human daily life because it involves our acquisition of knowledge about the world.

Ranked #1 on Question Answering on TempQuestions (F1 metric)

Natural Language Understanding, Question Answering

NER-to-MRC: Named-Entity Recognition Completely Solving as Machine Reading Comprehension

No code implementations · 6 May 2023 · Yuxiang Zhang, Junjie Wang, Xinyu Zhu, Tetsuya Sakai, Hayato Yamana

Named-entity recognition (NER) detects texts with predefined semantic labels and is an essential building block for natural language processing (NLP).

Machine Reading Comprehension, Named-Entity Recognition

Solving Math Word Problems via Cooperative Reasoning induced Language Models

1 code implementation · 28 Oct 2022 · Xinyu Zhu, Junjie Wang, Lin Zhang, Yuxiang Zhang, Ruyi Gan, Jiaxing Zhang, Yujiu Yang

This inspires us to develop a cooperative reasoning-induced PLM for solving MWPs, called Cooperative Reasoning (CoRe), resulting in a human-like reasoning architecture with system 1 as the generator and system 2 as the verifier.

Arithmetic Reasoning, Math
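
The generator/verifier split described above follows a generic generate-then-verify pattern: a fast "system 1" proposes several candidate solutions and a slower "system 2" scores them. The sketch below shows only that pattern, assuming a simple keep-the-highest-score rule. `toy_generator` and `toy_verifier` are hypothetical stand-ins for trained language models, and this is not CoRe's actual training or inference procedure.

```python
from typing import Callable


def generate_then_verify(
    problem: str,
    generator: Callable[[str], list[str]],
    verifier: Callable[[str, str], float],
) -> str:
    """System 1 proposes candidates; system 2 scores each one; keep the best (first wins ties)."""
    candidates = generator(problem)
    return max(candidates, key=lambda candidate: verifier(problem, candidate))


# Hypothetical stand-in for a generator model: always proposes the same three expressions.
def toy_generator(problem: str) -> list[str]:
    return ["3 + 4", "3 * 4", "4 - 3"]


# Hypothetical stand-in for a verifier model: rewards the operator matching a cue word.
def toy_verifier(problem: str, candidate: str) -> float:
    cues = {"more": "+", "times": "*", "fewer": "-"}
    return sum(1.0 for word, op in cues.items() if word in problem and op in candidate)


problem = "Tom has 3 apples and gets 4 more. How many apples does he have?"
print(generate_then_verify(problem, toy_generator, toy_verifier))  # 3 + 4
```

Keeping the generator and verifier as separate callables makes it easy to swap either side for a real model without touching the selection logic.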

Zero-Shot Learners for Natural Language Understanding via a Unified Multiple Choice Perspective

1 code implementation · 16 Oct 2022 · Ping Yang, Junjie Wang, Ruyi Gan, Xinyu Zhu, Lin Zhang, Ziwei Wu, Xinyu Gao, Jiaxing Zhang, Tetsuya Sakai

We propose a new paradigm for zero-shot learners that is format agnostic, i.e., it is compatible with any format and applicable to a range of language tasks, such as text classification, commonsense reasoning, coreference resolution, and sentiment analysis.

Multiple-choice, Natural Language Inference
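
One way to read the "unified multiple-choice perspective" is that each task instance is rewritten so its label set becomes a list of options. The sketch below shows that conversion with a hypothetical plain-text template (`to_multiple_choice` and the `Passage:`/`Question:` layout are assumptions, not the paper's input format); the model that scores the options is out of scope.

```python
def to_multiple_choice(passage: str, question: str, options: list[str]) -> str:
    """Hypothetical template: render one task instance as a single multiple-choice prompt."""
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    if not 1 <= len(options) <= len(letters):
        raise ValueError("need between 1 and 26 options")
    lines = [f"Passage: {passage}", f"Question: {question}"]
    lines += [f"({letter}) {option}" for letter, option in zip(letters, options)]
    return "\n".join(lines)


# Sentiment analysis: the label set becomes the option list.
print(to_multiple_choice(
    "The film was a delight from start to finish.",
    "What is the sentiment of the passage?",
    ["positive", "negative"],
))

# Coreference resolution: the candidate antecedents become the options.
print(to_multiple_choice(
    "The trophy doesn't fit in the suitcase because it is too big.",
    'What does "it" refer to?',
    ["the trophy", "the suitcase"],
))
```

Because every task ends up in the same shape, a single model trained to pick among lettered options can, in principle, be applied to tasks whose original formats differ.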

Molecular Substructure-Aware Network for Drug-Drug Interaction Prediction

1 code implementation · 24 Aug 2022 · Xinyu Zhu, Yongliang Shen, Weiming Lu

Concomitant administration of drugs can cause drug-drug interactions (DDIs).
