Search Results for author: Xuanming Zhang

Found 11 papers, 6 papers with code

EvoCodeBench: An Evolving Code Generation Benchmark with Domain-Specific Evaluations

no code implementations | 30 Oct 2024 | Jia Li, Ge Li, Xuanming Zhang, YunFei Zhao, Yihong Dong, Zhi Jin, Binhua Li, Fei Huang, Yongbin Li

These evaluations help practitioners select superior LLMs in specific domains and discover the shortcomings of existing LLMs.

Code Generation Fairness

Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach

1 code implementation | 9 Oct 2024 | Xuanming Zhang, Yuxuan Chen, Yuan Yuan, Minlie Huang

In real-world software development, improper or missing exception handling can severely impact the robustness and reliability of code.

DECOR: Improving Coherence in L2 English Writing with a Novel Benchmark for Incoherence Detection, Reasoning, and Rewriting

1 code implementation | 28 Jun 2024 | Xuanming Zhang, Anthony Diaz, Zixun Chen, Qingyang Wu, Kun Qian, Erik Voss, Zhou Yu

To bridge this gap, we introduce DECOR, a novel benchmark that includes expert annotations for detecting incoherence in L2 English writing, identifying the underlying reasons, and rewriting the incoherent sentences.

Automated Writing Evaluation

VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation

1 code implementation | 25 Jun 2024 | Kun Qian, Shunji Wan, Claudia Tang, Youzhi Wang, Xuanming Zhang, Maximillian Chen, Zhou Yu

As large language models achieve impressive scores on traditional benchmarks, an increasing number of researchers are becoming concerned about benchmark data leakage during pre-training, commonly known as the data contamination problem.

ARC, Benchmarking +4

EvoCodeBench: An Evolving Code Generation Benchmark Aligned with Real-World Code Repositories

1 code implementation | 31 Mar 2024 | Jia Li, Ge Li, Xuanming Zhang, Yihong Dong, Zhi Jin

Existing benchmarks demonstrate poor alignment with real-world code repositories and are insufficient to evaluate the coding abilities of LLMs.

Code Generation

ProLex: A Benchmark for Language Proficiency-oriented Lexical Substitution

1 code implementation | 21 Jan 2024 | Xuanming Zhang, Zixun Chen, Zhou Yu

To bridge this gap, we propose a new task, language proficiency-oriented lexical substitution.

Sentence

DevEval: Evaluating Code Generation in Practical Software Projects

no code implementations | 12 Jan 2024 | Jia Li, Ge Li, YunFei Zhao, Yongmin Li, Zhi Jin, Hao Zhu, Huanyu Liu, Kaibo Liu, Lecheng Wang, Zheng Fang, Lanshen Wang, Jiazheng Ding, Xuanming Zhang, Yihong Dong, Yuqi Zhu, Bin Gu, Mengfei Yang

Compared to previous benchmarks, DevEval aligns with practical projects in multiple dimensions, e.g., real program distributions, sufficient dependencies, and enough-scale project contexts.

Code Generation

Multicollinearity Resolution Based on Machine Learning: A Case Study of Carbon Emissions in Sichuan Province

no code implementations | 3 Sep 2023 | Xuanming Zhang, Xiaoxue Wang, Yonghang Chen

Penalized regression models were then applied for their advantages in overfitting control, high-dimensional data processing, and feature selection, which make them well suited to the complex energy data.

Decision Making, Feature Selection +2

Aspect-Based Sentiment Analysis as Fine-Grained Opinion Mining

no code implementations | LREC 2020 | Gerardo Ocampo Diaz, Xuanming Zhang, Vincent Ng

We show how the general fine-grained opinion mining concepts of opinion target and opinion expression are related to aspect-based sentiment analysis (ABSA) and discuss their benefits for resource creation over popular ABSA annotation schemes.

Aspect-Based Sentiment Analysis, Aspect-Based Sentiment Analysis (ABSA) +1
