no code implementations • 27 May 2025 • Alfin Wijaya Rahardja, Junwei Liu, Weitong Chen, Zhenpeng Chen, Yiling Lou
These results underscore the unique challenges of maintaining agent systems compared to traditional software, highlighting the need for further research to develop advanced SE agents for resolving agent issues.
1 code implementation • 26 May 2025 • Ying Xiao, Jie Huang, Ruijuan He, Jing Xiao, Mohammad Reza Mousavi, Yepang Liu, Kezhi Li, Zhenpeng Chen, Jie M. Zhang
Large language models (LLMs) are reaching expert-level accuracy on medical diagnosis questions, yet their mistakes and the biases behind them pose life-critical risks.
no code implementations • 20 Feb 2025 • Weisong Sun, Yuchen Chen, Mengzhe Yuan, Chunrong Fang, Zhenpeng Chen, Chong Wang, Yang Liu, Baowen Xu, Zhenyu Chen
Finally, KillBadCode purifies the poisoned data by removing all poisoned samples containing the identified trigger tokens.
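The purification step lends itself to a compact illustration. Below is a minimal Python sketch of removing every sample that contains an already-identified trigger token; the data layout, field names, and tokenizer are assumptions for illustration, not KillBadCode's actual interface.

```python
# Minimal sketch of the purification stage described above: drop every training
# sample whose code contains any of the already-identified trigger tokens.
# The sample format and the tokenizer are placeholders, not KillBadCode's API.

def purify(samples, trigger_tokens, tokenize=str.split):
    """Return only the samples that contain none of the trigger tokens."""
    triggers = set(trigger_tokens)
    return [s for s in samples if triggers.isdisjoint(tokenize(s["code"]))]

if __name__ == "__main__":
    data = [
        {"id": 1, "code": "def add ( a , b ) : return a + b"},
        {"id": 2, "code": "def add ( a , b ) : ret_badcode return a + b"},
    ]
    clean = purify(data, trigger_tokens={"ret_badcode"})
    print([s["id"] for s in clean])  # -> [1]; the poisoned sample is removed
```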
no code implementations • 11 Dec 2024 • Zhenpeng Chen, Xinyue Li, Jie M. Zhang, Federica Sarro, Yang Liu
Intersectional fairness is a critical requirement for Machine Learning (ML) software, demanding fairness across subgroups defined by multiple protected attributes.
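A short sketch can make the notion of an intersectional subgroup concrete: the metric below measures the largest gap in positive-prediction rate across subgroups formed by crossing several protected attributes. The attribute names, data, and metric choice are illustrative assumptions, not taken from the paper.

```python
# Illustrative only: measure the largest gap in positive-prediction rate across
# intersectional subgroups (e.g., sex x race). Attribute names and data are
# hypothetical, not drawn from the paper.
from collections import defaultdict

def intersectional_gap(records, protected_attrs, pred_key="pred"):
    """Max difference in positive-prediction rate across intersectional subgroups."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[a] for a in protected_attrs)].append(r[pred_key])
    rates = [sum(preds) / len(preds) for preds in groups.values()]
    return max(rates) - min(rates)

if __name__ == "__main__":
    data = [
        {"sex": "F", "race": "A", "pred": 1},
        {"sex": "F", "race": "B", "pred": 0},
        {"sex": "M", "race": "A", "pred": 1},
        {"sex": "M", "race": "B", "pred": 1},
    ]
    # 0.0 would be perfectly fair; here the (F, B) subgroup never receives a positive prediction.
    print(intersectional_gap(data, ["sex", "race"]))  # -> 1.0
```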
no code implementations • 1 Nov 2024 • Xinyue Li, Zhenpeng Chen, Jie M. Zhang, Yiling Lou, Tianlin Li, Weisong Sun, Yang Liu, Xuanzhe Liu
Our benchmark reveals 72,716 biased responses across the studied LLMs, with individual models yielding between 7,754 and 16,963 biased responses, underscoring the prevalence of bias in role-playing contexts.
1 code implementation • 16 Oct 2024 • Yaoqi Guo, Zhenpeng Chen, Jie M. Zhang, Yang Liu, Yun Ma
Code generation, the automatic creation of source code from natural language descriptions, has garnered significant attention due to its potential to streamline software development.
1 code implementation • 4 Sep 2024 • Junwei Liu, Kaixin Wang, Yixuan Chen, Xin Peng, Zhenpeng Chen, Lingming Zhang, Yiling Lou
Recent advances in Large Language Models (LLMs) have shaped a new paradigm of AI agents, i.e., LLM-based agents.
1 code implementation • 16 Apr 2024 • Kaibo Liu, Zhenpeng Chen, Yiyang Liu, Jie M. Zhang, Mark Harman, Yudong Han, Yun Ma, Yihong Dong, Ge Li, Gang Huang
To address this problem, we propose TrickCatcher, an LLM-powered approach to generating test cases for uncovering bugs in plausible programs.
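To illustrate the general idea of using generated test cases to expose bugs in plausible programs, the hedged sketch below runs stub-generated inputs against a buggy-but-plausible sort and a trusted oracle; the generator stands in for an LLM call, and none of the names reflect TrickCatcher's actual interface.

```python
# Hedged sketch of LLM-powered test generation for "plausible" programs:
# obtain candidate inputs (here from a stub standing in for an LLM), run them
# against the program under test, and flag inputs whose output disagrees with
# a reference oracle. Function names are illustrative, not TrickCatcher's API.

def generate_candidate_inputs(task_description):
    # Stub standing in for an LLM call; returns edge-case-style inputs.
    return [[], [1], [2, 1], [3, 3, 3], list(range(10, 0, -1))]

def plausible_sort(xs):   # "plausible" program under test (buggy on duplicates)
    return sorted(set(xs))

def reference_sort(xs):   # trusted oracle
    return sorted(xs)

def find_bug_revealing_inputs(task_description):
    revealing = []
    for xs in generate_candidate_inputs(task_description):
        if plausible_sort(xs) != reference_sort(xs):
            revealing.append(xs)
    return revealing

if __name__ == "__main__":
    print(find_bug_revealing_inputs("sort a list of integers"))  # -> [[3, 3, 3]]
```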
no code implementations • 8 Feb 2024 • Qipeng Wang, Shiqi Jiang, Zhenpeng Chen, Xu Cao, Yuanchun Li, Aoyu Li, Yun Ma, Ting Cao, Xuanzhe Liu
The gap on mobile CPU and mobile GPU is 15.8 times and 7.8 times, respectively.
1 code implementation • 26 Dec 2023 • Tingting Xu, Yun Miao, Chunrong Fang, Hanwei Qian, Xia Feng, Zhenpeng Chen, Chong Wang, Jian Zhang, Weisong Sun, Zhenyu Chen, Yang Liu
Our comprehensive experimental results show that PromptCS significantly outperforms instruction prompting schemes (including zero-shot learning and few-shot learning) on all four widely used metrics, and is comparable to the task-oriented fine-tuning scheme.
2 code implementations • 5 Aug 2023 • Xinyue Li, Zhenpeng Chen, Jie M. Zhang, Federica Sarro, Ying Zhang, Xuanzhe Liu
This paper conducts fairness testing of automated pedestrian detection, a crucial but under-explored issue in autonomous driving systems.
1 code implementation • 25 Jul 2023 • Zhenpeng Chen, Jie M. Zhang, Federica Sarro, Mark Harman
Existing research mostly improves the fairness of Machine Learning (ML) software regarding a single protected attribute at a time, but this is unrealistic given that many users have multiple protected attributes.
no code implementations • 14 Jul 2022 • Max Hort, Zhenpeng Chen, Jie M. Zhang, Mark Harman, Federica Sarro
How many datasets are used for evaluating bias mitigation methods?
2 code implementations • 7 Jul 2022 • Zhenpeng Chen, Jie M. Zhang, Federica Sarro, Mark Harman
We find that (1) the bias mitigation methods significantly decrease ML performance in 53% of the studied scenarios (ranging from 42% to 66% depending on the ML performance metric); (2) the bias mitigation methods significantly improve fairness, as measured by the four metrics used, in 46% of all scenarios (ranging from 24% to 59% depending on the fairness metric); (3) the bias mitigation methods even lead to a decrease in both fairness and ML performance in 25% of the scenarios; (4) the effectiveness of the bias mitigation methods depends on tasks, models, the choice of protected attributes, and the set of metrics used to assess fairness and ML performance; and (5) no bias mitigation method achieves the best trade-off in all scenarios.
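As a rough illustration of how a single scenario in such a study can be measured, the sketch below computes one performance metric (accuracy) and one fairness metric (statistical parity difference) before and after a hypothetical mitigation; the data are toy values and this metric pair is only one of the combinations the study considers.

```python
# Toy illustration of a scenario-level measurement: one ML-performance metric
# (accuracy) and one fairness metric (statistical parity difference) are
# computed before and after a hypothetical mitigation. Values are made up.

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def statistical_parity_difference(y_pred, group):
    """|P(pred=1 | group=0) - P(pred=1 | group=1)|; lower is fairer."""
    rate = lambda g: sum(p for p, a in zip(y_pred, group) if a == g) / group.count(g)
    return abs(rate(0) - rate(1))

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
group  = [0, 0, 0, 0, 1, 1, 1, 1]
before = [1, 0, 1, 1, 0, 0, 0, 0]   # original model's predictions
after  = [1, 0, 1, 0, 0, 1, 0, 0]   # predictions after a hypothetical mitigation

for name, pred in [("before", before), ("after", after)]:
    print(name, "accuracy:", accuracy(y_true, pred),
          "SPD:", statistical_parity_difference(pred, group))
```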
no code implementations • 19 Jul 2021 • Zhenpeng Chen, Yuan Li
Among 2D convolutional networks on point clouds, point-based approaches consume point clouds of fixed size directly.
no code implementations • 10 Feb 2021 • Xuan Lu, Wei Ai, Zhenpeng Chen, Yanbin Cao, Qiaozhu Mei
This paper studies how emojis, as non-verbal cues in online communications, can be used for such purposes and how the emotional signals in emoji usage can be used to predict future behavior of workers.
1 code implementation • 13 Jan 2021 • Zhenpeng Chen, Huihan Yao, Yiling Lou, Yanbin Cao, Yuanqiang Liu, Haoyu Wang, Xuanzhe Liu
In contrast, faults related to the deployment of DL models on mobile devices (referred to as deployment faults of mobile DL apps) have not been well studied.
no code implementations • 12 Jun 2020 • Chengxu Yang, Qipeng Wang, Mengwei Xu, Zhenpeng Chen, Kaigui Bian, Yunxin Liu, Xuanzhe Liu
Based on the data and the platform, we conduct extensive experiments to compare the performance of state-of-the-art FL algorithms under heterogeneity-aware and heterogeneity-unaware settings.
no code implementations • 2 May 2020 • Zhenpeng Chen, Yanbin Cao, Yuanqiang Liu, Haoyu Wang, Tao Xie, Xuanzhe Liu
Deep learning (DL) is becoming increasingly pervasive, being used in a wide range of software applications.
Software Engineering
1 code implementation • 4 Jul 2019 • Zhenpeng Chen, Yanbin Cao, Xuan Lu, Qiaozhu Mei, Xuanzhe Liu
However, commonly used out-of-the-box sentiment analysis tools cannot produce reliable results on SE tasks, and misunderstanding of technical jargon has been shown to be the main reason.
1 code implementation • 12 Dec 2018 • Xuan Lu, Yanbin Cao, Zhenpeng Chen, Xuanzhe Liu
We find that emojis are used by a considerable proportion of GitHub users.
Computers and Society • Software Engineering
1 code implementation • 7 Jun 2018 • Zhenpeng Chen, Sheng Shen, Ziniu Hu, Xuan Lu, Qiaozhu Mei, Xuanzhe Liu
To tackle this problem, cross-lingual sentiment classification approaches aim to transfer knowledge learned from one language that has abundant labeled examples (i.e., the source language, usually English) to another language with fewer labels (i.e., the target language).
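The problem setting can be sketched in a few lines: train a sentiment classifier on labeled source-language text and apply it to target-language text via some source-target bridge. The translation stub below is a placeholder assumption; it illustrates the setting only, not the paper's own transfer method.

```python
# Minimal sketch of the cross-lingual transfer setting: learn a sentiment
# classifier from labeled source-language (English) data and apply it to
# target-language text after mapping it into the source language. The
# translate_to_english() stub is a placeholder for any source-target bridge.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

source_texts  = ["great product, love it", "terrible, waste of money",
                 "works perfectly", "broke after one day"]
source_labels = [1, 0, 1, 0]

def translate_to_english(text):
    # Placeholder bridge (could be an MT system, bilingual embeddings, etc.).
    lookup = {"producto excelente, me encanta": "excellent product, love it"}
    return lookup.get(text, text)

clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(source_texts, source_labels)

target_text = "producto excelente, me encanta"
print(clf.predict([translate_to_english(target_text)]))  # -> [1] (positive)
```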