no code implementations • 21 Oct 2024 • Xi Xu, Wenda Xu, Siqi Ouyang, Lei LI
Simultaneous speech translation (SimulST) systems must balance translation quality with response time, making latency measurement crucial for evaluating their real-world performance.
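For context, a minimal sketch of one standard SimulST latency metric, Average Lagging (Ma et al., 2019); this illustrates the kind of measurement the paper studies, not necessarily its proposed metric:

```python
def average_lagging(delays, src_len, tgt_len):
    """Average Lagging (AL): a common SimulST latency metric.

    delays[i] = number of source tokens read before emitting target token i
    (1-indexed target positions map to 0-indexed list entries here).
    """
    gamma = tgt_len / src_len  # ideal target-per-source emission rate
    # tau: first target position emitted after the full source has been read
    tau = next((i + 1 for i, d in enumerate(delays) if d >= src_len), tgt_len)
    return sum(delays[i] - i / gamma for i in range(tau)) / tau
```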
no code implementations • 15 Oct 2024 • Wenda Xu, Rujun Han, Zifeng Wang, Long T. Le, Dhruv Madeka, Lei LI, William Yang Wang, Rishabh Agarwal, Chen-Yu Lee, Tomas Pfister
To address these limitations, we introduce Speculative Knowledge Distillation (SKD), a novel approach that leverages cooperation between student and teacher models to generate high-quality training data on-the-fly while aligning with the student's inference-time distribution.
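As a rough illustration of the on-the-fly sampling SKD describes, here is a hedged sketch in which the student proposes each token and the teacher re-samples proposals that fall outside its own top-k; the model interfaces and the exact acceptance rule are assumptions, not the paper's precise recipe:

```python
import torch

def skd_sample(student, teacher, ids, max_new_tokens=64, top_k=25):
    # Hedged sketch: the student proposes a token; if it is outside the
    # teacher's top-k, the teacher re-samples it. Both models are assumed
    # to be callables returning next-token logits of shape [1, vocab].
    for _ in range(max_new_tokens):
        proposal = torch.multinomial(torch.softmax(student(ids), -1), 1)
        t_logits = teacher(ids)
        topk_ids = torch.topk(t_logits, top_k, dim=-1).indices
        if proposal.item() not in topk_ids[0].tolist():
            proposal = torch.multinomial(torch.softmax(t_logits, -1), 1)
        ids = torch.cat([ids, proposal], dim=-1)
    return ids  # generated sequence, usable as distillation training data
```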
no code implementations • 9 Oct 2024 • Juhyun Oh, Eunsu Kim, Jiseon Kim, Wenda Xu, Inha Cha, William Yang Wang, Alice Oh
Our factor-level analysis reveals a substantial discrepancy between human and LLM preferences in generation tasks, whereas LLMs show strong alignment with human preferences in evaluation tasks.

no code implementations • 7 Oct 2024 • Chinmay Dandekar, Wenda Xu, Xi Xu, Siqi Ouyang, Lei LI
With the rapid advancement of machine translation research, evaluation toolkits have become essential for benchmarking system progress.
1 code implementation • 18 Jun 2024 • Wenda Xu, Jiachen Li, William Yang Wang, Lei LI
Direct alignment from preferences (DAP) has emerged as a promising paradigm for aligning large language models (LLMs) to human desiderata from pre-collected, offline preference datasets.
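For reference, a minimal sketch of one widely used DAP objective, the DPO loss (Rafailov et al., 2023); the paper studies the DAP family broadly rather than this specific loss:

```python
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO: maximize the policy's preference margin between the chosen (w)
    and rejected (l) responses, measured relative to a frozen reference
    model. Inputs are summed token log-probabilities (tensors)."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -F.logsigmoid(margin).mean()
```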
1 code implementation • 18 Feb 2024 • Wenda Xu, Guanglei Zhu, Xuandong Zhao, Liangming Pan, Lei LI, William Yang Wang
We discovered that this contradiction is due to LLMs' bias in evaluating their own outputs.
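As a toy illustration of how such self-bias can be quantified (the paper's actual statistic is more involved), one can compare an LLM's scores on its own outputs against human judgments of the same outputs:

```python
def self_bias(llm_scores, human_scores):
    """Toy estimate of self-bias: the mean amount by which an LLM's
    self-evaluation exceeds human judgment on the same outputs.
    Illustrative only; not the paper's exact definition."""
    diffs = [s - h for s, h in zip(llm_scores, human_scores)]
    return sum(diffs) / len(diffs)
```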
no code implementations • 15 Nov 2023 • Wenda Xu, Daniel Deutsch, Mara Finkelstein, Juraj Juraska, Biao Zhang, Zhongtao Liu, William Yang Wang, Lei LI, Markus Freitag
Recent large language models (LLMs) leverage human feedback to improve their generation quality.
1 code implementation • 6 Aug 2023 • Liangming Pan, Michael Saxon, Wenda Xu, Deepak Nathani, Xinyi Wang, William Yang Wang
Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks.
2 code implementations • 23 May 2023 • Wenda Xu, Danqing Wang, Liangming Pan, Zhenqiao Song, Markus Freitag, William Yang Wang, Lei LI
By harnessing both explicit human instruction and the implicit knowledge of GPT-4, we fine-tune a text evaluation metric based on LLaMA, producing both a score for generated text and a human-readable diagnostic report.
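A hedged sketch of how such an evaluator might be queried with Hugging Face transformers; the checkpoint path and prompt template below are placeholders, not the paper's exact ones:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/llama-evaluator")      # placeholder
model = AutoModelForCausalLM.from_pretrained("path/to/llama-evaluator")

# Assumed prompt template: ask for error diagnoses plus a final score.
prompt = (
    "Source: {src}\nCandidate: {hyp}\n"
    "List each error with its location, type, and severity, then give a score."
)
inputs = tok(prompt.format(src="...", hyp="..."), return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(out[0], skip_special_tokens=True))  # score + diagnostic report
```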
1 code implementation • 20 Dec 2022 • Yi-Lin Tuan, Alon Albalak, Wenda Xu, Michael Saxon, Connor Pryor, Lise Getoor, William Yang Wang
Despite their widespread adoption, neural conversation models have yet to exhibit natural chat capabilities with humans.
1 code implementation • 19 Dec 2022 • Wenda Xu, Xian Qian, Mingxuan Wang, Lei LI, William Yang Wang
In this paper, we propose SESCORE2, a self-supervised approach for training a model-based metric for text generation evaluation.
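A toy sketch of the self-supervised idea: perturb a reference sentence to manufacture a degraded candidate and a proxy quality label. SESCORE2 itself uses retrieval-augmented perturbations; the edit operations below are simplified stand-ins:

```python
import random

def synthesize_pair(reference, error_rate=0.15):
    """Perturb a reference sentence into a degraded candidate, returning
    the candidate and a proxy quality label (negated edit count).
    Simplified stand-in for SESCORE2's retrieval-augmented perturbations."""
    tokens, n_edits = reference.split(), 0
    for i in range(len(tokens)):
        if random.random() < error_rate:
            if random.random() < 0.5 or i + 1 >= len(tokens):
                tokens[i] = ""                                       # deletion
            else:
                tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]  # swap
            n_edits += 1
    return " ".join(t for t in tokens if t), -n_edits
```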
1 code implementation • 10 Oct 2022 • Wenda Xu, Yi-Lin Tuan, Yujie Lu, Michael Saxon, Lei LI, William Yang Wang
Is it possible to build a general and automatic natural language generation (NLG) evaluation metric?
1 code implementation • 7 Oct 2022 • Wanrong Zhu, An Yan, Yujie Lu, Wenda Xu, Xin Eric Wang, Miguel Eckstein, William Yang Wang
Recent advances in text-to-image synthesis make it possible to visualize machine imaginations for a given context.
no code implementations • 10 Sep 2022 • Yujie Lu, Huiliang Zhang, Ping Nie, Weixi Feng, Wenda Xu, Xin Eric Wang, William Yang Wang
In this paper, we propose DAVIS, an unseen-discrepancy-anticipating agent for Vision-and-Language Navigation that learns to generalize to unseen environments by encouraging test-time visual consistency.
no code implementations • 6 Jun 2022 • Yujie Lu, Weixi Feng, Wanrong Zhu, Wenda Xu, Xin Eric Wang, Miguel Eckstein, William Yang Wang
Procedural planning aims to implement complex high-level goals by decomposing them into sequences of simpler low-level steps.
no code implementations • 16 Dec 2021 • Michael Saxon, Xinyi Wang, Wenda Xu, William Yang Wang
Building natural language inference (NLI) benchmarks that are both challenging for modern techniques and free from shortcut biases is difficult.
1 code implementation • 6 Oct 2021 • Wenda Xu, Michael Saxon, Misha Sra, William Yang Wang
This is a particularly notable issue in the medical domain, where lay readers are often confused by medical text online.
no code implementations • 15 Feb 2020 • Sairamvinay Vijayaraghavan, Ye Wang, Zhiyuan Guo, John Voong, Wenda Xu, Armand Nasseri, Jiaru Cai, Linda Li, Kevin Vuong, Eshan Wadhwa
This paper explores a variety of models for fake news detection, applying several machine learning algorithms and representing the textual data with features such as TF-IDF, CountVectorizer (CV), and Word2Vec (W2V).
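For illustration, a minimal scikit-learn pipeline of the kind the paper compares (TF-IDF features plus a linear classifier); the toy data below is invented:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["breaking: celebrity endorses miracle cure",
         "senate passes annual budget bill"]           # toy data, invented
labels = [1, 0]                                        # 1 = fake, 0 = real

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["miracle cure shocks doctors"]))
```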