no code implementations • 13 Apr 2025 • Yuxuan Lu, Bingsheng Yao, Hansu Gu, Jing Huang, Jessie Wang, Yang Li, Jiri Gesi, Qi He, Toby Jia-Jun Li, Dakuo Wang
Usability testing is a fundamental research method that user experience (UX) researchers use to evaluate and iterate a web design, but how to evaluate and iterate the usability testing study design itself?
no code implementations • 13 Apr 2025 • Dakuo Wang, Ting-Yao Hsu, Yuxuan Lu, Limeng Cui, Yaochen Xie, William Headean, Bingsheng Yao, Akash Veeragouni, Jiapeng Liu, Sreyashi Nag, Jessie Wang
A/B testing is a widely adopted method for evaluating UI/UX design decisions in modern web applications.
no code implementations • 26 Mar 2025 • Yuxuan Lu, Jing Huang, Yan Han, Bennet Bei, Yaochen Xie, Dakuo Wang, Jessie Wang, Qi He
In this work, we focus on evaluating and improving LLM's objective "accuracy" rather than the subjective "believability" in the web action generation task, leveraging a large-scale, real-world dataset collected from online shopping human actions.
1 code implementation • 18 Feb 2025 • Yuxuan Lu, Bingsheng Yao, Hansu Gu, Jing Huang, Jessie Wang, Yang Li, Jiri Gesi, Qi He, Toby Jia-Jun Li, Dakuo Wang
Usability testing is a fundamental yet challenging (e.g., inflexible to iterate the study design flaws and hard to recruit study participants) research method for user experience (UX) researchers to evaluate a web design.
no code implementations • 9 Feb 2025 • Ziqi Yang, Yuxuan Lu, Jennifer Bagdasarian, Vedant Das Swain, Ritu Agarwal, Collin Campbell, Waddah Al-Refaire, Jehan El-Bayoumi, Guodong Gao, Dakuo Wang, Bingsheng Yao, Nawar Shara
Cancer surgery is a key treatment for gastrointestinal (GI) cancers, a group of cancers that account for more than 35% of cancer-related deaths worldwide, but postoperative complications are unpredictable and can be life-threatening.
1 code implementation • 11 Nov 2024 • Shengwei Xu, Yuxuan Lu, Grant Schoenebeck, Yuqing Kong
We also present GRE-bench (Generating Review Evaluation Benchmark) which evaluates LLMs based on how well they can generate high-quality peer reviews for academic research papers.
no code implementations • 5 Aug 2024 • Yuxuan Lu, Jiahao Nie, Zhiwei He, Hongjie Gu, Xudong Lv
Current LiDAR point cloud-based 3D single object tracking (SOT) methods typically rely on a point-based representation network.
1 code implementation • 23 May 2024 • Yuxuan Lu, Shengwei Xu, Yichi Zhang, Yuqing Kong, Grant Schoenebeck
We highlight the results that on the ICLR dataset, our mechanisms can differentiate three quality levels -- human-written reviews, GPT-4-generated reviews, and GPT-3.5-generated reviews in terms of expected scores.
no code implementations • 19 Dec 2023 • Hao Chen, Lun Du, Yuxuan Lu, Qiang Fu, Xu Chen, Shi Han, Yanbin Kang, Guangming Lu, Zi Li
Online recruitment platforms typically employ Person-Job Fit models in the core service that automatically match suitable job seekers with appropriate job positions.
no code implementations • 16 Nov 2023 • Bingsheng Yao, Guiming Chen, Ruishi Zou, Yuxuan Lu, Jiachen Li, Shao Zhang, Yisi Sang, Sijia Liu, James Hendler, Dakuo Wang
While most existing works on LLM prompting techniques focus only on how to select a better set of data samples inside one single prompt input (In-Context Learning or ICL), why can we not design and leverage multiple prompts together to further improve the LLM's performance?
no code implementations • 16 Nov 2023 • Yuxuan Lu, Bingsheng Yao, Shao Zhang, Yun Wang, Peng Zhang, Tun Lu, Toby Jia-Jun Li, Dakuo Wang
Large Language Models (LLMs) have demonstrated considerable advances, and several claims have been made about their exceeding human performance.
1 code implementation • 16 Nov 2023 • Jiaju Chen, Yuxuan Lu, Shao Zhang, Bingsheng Yao, Yuanzhe Dong, Ying Xu, Yunyao Li, Qianwen Wang, Dakuo Wang, Yuling Sun
Interactive story reading is a common parent-child activity, where parents expect to teach both language skills and real-world knowledge beyond the story.
no code implementations • 17 Sep 2023 • Shao Zhang, Jianing Yu, Xuhai Xu, Changchang Yin, Yuxuan Lu, Bingsheng Yao, Melanie Tory, Lace M. Padilla, Jeffrey Caterino, Ping Zhang, Dakuo Wang
Today's AI systems for medical decision support often succeed on benchmark datasets in research papers but fail in real-world deployment.
1 code implementation • 22 May 2023 • Bingsheng Yao, Ishan Jindal, Lucian Popa, Yannis Katsis, Sayan Ghosh, Lihong He, Yuxuan Lu, Shashank Srivastava, Yunyao Li, James Hendler, Dakuo Wang
Our AL architecture leverages an explanation-generation model to produce explanations guided by human explanations, a prediction model that utilizes generated explanations toward prediction faithfully, and a novel data diversity-based AL sampling strategy that benefits from the explanation annotations.
no code implementations • 26 Jan 2023 • Yuxuan Lu, Qian Qi, Xi Chen
We develop a model of coordination and allocation of decentralized multi-sided markets, in which our theoretical analysis is promisingly optimizing the decentralized transaction packaging process at high-throughput blockchains or Web 3.0 platforms.
1 code implementation • 26 Jun 2022 • Yuxuan Lu, Jingya Yan, Zhixuan Qi, Zhongzheng Ge, Yongping Du
Biomedical Question Answering aims to obtain an answer to the given question from the biomedical domain.
Ranked #1 on Machine Reading Comprehension on BIOMRC
2 code implementations • 14 Jan 2022 • Yulin Liu, Yuxuan Lu, Kartik Nayak, Fan Zhang, Luyao Zhang, Yinhong Zhao
However, a systematic evaluation of the real-world impact of TFMs is still absent.
no code implementations • 24 Feb 2021 • Jiale Chen, Yuqing Kong, Yuxuan Lu
With this assumption, we propose a new definition for uninformative feedback and correspondingly design a family of evaluation metrics, called f-variety, for group-level feedback which can 1) distinguish informative feedback and uninformative feedback (separation) even if their statistics are both uniform and 2) decrease as the ratio of uninformative respondents increases (monotonicity).
Computer Science and Game Theory