no code implementations • 11 Sep 2023 • Xiang Yue, Xingwei Qu, Ge Zhang, Yao Fu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen
The MAmmoTH models are trained on MathInstruct, our meticulously curated instruction tuning dataset.
1 code implementation • 29 Jul 2023 • Lingbo Mo, Shijie Chen, Ziru Chen, Xiang Deng, Ashley Lewis, Sunit Singh, Samuel Stevens, Chang-You Tai, Zhen Wang, Xiang Yue, Tianshu Zhang, Yu Su, Huan Sun
We introduce TacoBot, a user-centered task-oriented digital assistant designed to guide users through complex real-world tasks with multiple steps.
no code implementations • 22 May 2023 • Boshi Wang, Xiang Yue, Huan Sun
We explore testing the reasoning ability of large language models (LLMs), such as ChatGPT, by engaging with them in a debate-like conversation that probes deeper into their understanding of the subject.
1 code implementation • 10 May 2023 • Xiang Yue, Boshi Wang, Kai Zhang, Ziru Chen, Yu Su, Huan Sun
To facilitate the evaluation, we manually curate a set of test examples covering 12 domains from a generative search engine, New Bing.
1 code implementation • 25 Oct 2022 • Xiang Yue, Huseyin A. Inan, Xuechen Li, Girish Kumar, Julia McAnallen, Hoda Shajari, Huan Sun, David Levitan, Robert Sim
Privacy concerns have attracted increasing attention in data-driven products due to the tendency of machine learning models to memorize sensitive training data.
no code implementations • 11 Jul 2022 • Shijie Chen, Ziru Chen, Xiang Deng, Ashley Lewis, Lingbo Mo, Samuel Stevens, Zhen Wang, Xiang Yue, Tianshu Zhang, Yu Su, Huan Sun
We present TacoBot, a task-oriented dialogue system built for the inaugural Alexa Prize TaskBot Challenge, which assists users in completing multi-step cooking and home improvement tasks.
1 code implementation • ACL 2022 • Xiang Yue, Xiaoman Pan, Wenlin Yao, Dian Yu, Dong Yu, Jianshu Chen
And with our pretrained reader, the entire system improves by up to 4% in exact match.
1 code implementation • ACL 2022 • Xiang Yue, Ziyu Yao, Huan Sun
Synthesizing QA pairs with a question generator (QG) on the target domain has become a popular approach for domain adaptation of question answering (QA) models.
1 code implementation • Findings (ACL) 2021 • Xiang Yue, Minxin Du, Tianhao Wang, Yaliang Li, Huan Sun, Sherman S. M. Chow
The sanitized texts also contribute to our sanitization-aware pretraining and fine-tuning, enabling privacy-preserving natural language processing over the BERT language model with promising utility.
2 code implementations • 30 Oct 2020 • Xiang Yue, Xinliang Frederick Zhang, Ziyu Yao, Simon Lin, Huan Sun
Clinical question answering (QA) aims to automatically answer questions from medical professionals based on clinical texts.
1 code implementation • EMNLP 2021 • Xinliang Frederick Zhang, Heming Sun, Xiang Yue, Simon Lin, Huan Sun
For evaluation, we introduce Query Bank and Relevance Set, where the former contains 1,236 human-paraphrased queries while the latter contains ~32 human-annotated FAQ items for each query.
1 code implementation • EMNLP (ClinicalNLP) 2020 • Xiang Yue, Shuang Zhou
De-identification is the task of identifying protected health information (PHI) in clinical text.
1 code implementation • ACL 2020 • Xiang Yue, Bernal Jimenez Gutierrez, Huan Sun
In this paper, we provide an in-depth analysis of this dataset and the clinical reading comprehension (CliniRC) task.
no code implementations • 6 Mar 2020 • Bernhard Kratzwald, Xiang Yue, Huan Sun, Stefan Feuerriegel
Here, remarkably, annotating a stratified subset with only 1.2% of the original training set achieves 97.7% of the performance as if the complete dataset was annotated.
1 code implementation • 19 Feb 2020 • Zaixiang Zheng, Xiang Yue, Shu-Jian Huang, Jia-Jun Chen, Alexandra Birch
Document-level machine translation manages to outperform sentence-level models by a small margin, but has failed to be widely adopted.
1 code implementation • 13 Nov 2019 • Feng Huang, Xiang Yue, Zhankun Xiong, Zhouxin Yu, Wen Zhang
To this end, we innovatively represent miRNA-disease-type triplets as a tensor and introduce Tensor Decomposition methods to solve the prediction task.
1 code implementation • 21 Jun 2019 • Zhen Wang, Xiang Yue, Soheil Moosavinasab, Yungui Huang, Simon Lin, Huan Sun
To solve the problem, we propose a new framework, SurfCon, that leverages two important types of information in the privacy-aware clinical data, i.e., the surface form information and the global context information, for synonym discovery.
4 code implementations • 12 Jun 2019 • Xiang Yue, Zhen Wang, Jingong Huang, Srinivasan Parthasarathy, Soheil Moosavinasab, Yungui Huang, Simon M. Lin, Wen Zhang, Ping Zhang, Huan Sun
Our experimental results demonstrate that recent graph embedding methods achieve promising results and deserve more attention in future biomedical graph analysis.