1 code implementation • 23 May 2024 • Boshi Wang, Xiang Yue, Yu Su, Huan Sun
The levels of generalization also vary across reasoning types: when faced with out-of-distribution examples, transformers fail to systematically generalize for composition but succeed for comparison.
no code implementations • 7 May 2024 • Wenhao Wu, Yizhong Wang, Yao Fu, Xiang Yue, Dawei Zhu, Sujian Li
Effectively handling instructions with extremely long context remains a challenge for Large Language Models (LLMs), typically necessitating high-quality long data and substantial computational resources.
no code implementations • 6 May 2024 • Xiang Yue, Tuney Zheng, Ge Zhang, Wenhu Chen
Notably, MAmmoTH2-7B's (Mistral) performance increases from 11% to 36.7% on MATH and from 36% to 68.4% on GSM8K without training on any in-domain data.
no code implementations • 9 Apr 2024 • Junpeng Liu, YiFan Song, Bill Yuchen Lin, Wai Lam, Graham Neubig, Yuanzhi Li, Xiang Yue
Multimodal Large Language Models (MLLMs) have shown promise in web-related tasks, but evaluating their performance in the web domain remains a challenge due to the lack of comprehensive benchmarks.
no code implementations • 9 Apr 2024 • Xingwei Qu, Yuelin Bai, Yinghao Ma, Ziya Zhou, Ka Man Lo, Jiaheng Liu, Ruibin Yuan, Lejun Min, Xueling Liu, Tianyu Zhang, Xinrun Du, Shuyue Guo, Yiming Liang, Yizhi Li, Shangda Wu, Junting Zhou, Tianyu Zheng, Ziyang Ma, Fengze Han, Wei Xue, Gus Xia, Emmanouil Benetos, Xiang Yue, Chenghua Lin, Xu Tan, Stephen W. Huang, Wenhu Chen, Jie Fu, Ge Zhang
In this paper, we explore the application of Large Language Models (LLMs) to the pre-training of music.
no code implementations • 4 Apr 2024 • Jiawei Guo, Ziming Li, Xueling Liu, Kaijing Ma, Tianyu Zheng, Zhouliang Yu, Ding Pan, Yizhi Li, Ruibo Liu, Yue Wang, Shuyue Guo, Xingwei Qu, Xiang Yue, Ge Zhang, Wenhu Chen, Jie Fu
Large Language Models (LLMs) for code are rapidly evolving, with code editing emerging as a critical capability.
1 code implementation • 2 Apr 2024 • Tianle Li, Ge Zhang, Quy Duc Do, Xiang Yue, Wenhu Chen
Our study reveals that long-context understanding and reasoning are still challenging tasks for existing LLMs.
1 code implementation • 4 Mar 2024 • YiFan Song, Da Yin, Xiang Yue, Jie Huang, Sujian Li, Bill Yuchen Lin
This iterative cycle of exploration and training fosters continued improvement in the agents.
no code implementations • 26 Feb 2024 • Alex Zhuang, Ge Zhang, Tianyu Zheng, Xinrun Du, Junjie Wang, Weiming Ren, Stephen W. Huang, Jie Fu, Xiang Yue, Wenhu Chen
Utilizing this dataset, we train a series of models, referred to as StructLM, based on the Mistral and CodeLlama model families, ranging from 7B to 34B parameters.
1 code implementation • 23 Feb 2024 • Jin Yao, Eli Chien, Minxin Du, Xinyao Niu, Tianhao Wang, Zezhou Cheng, Xiang Yue
This study investigates the concept of the 'right to be forgotten' within the context of large language models (LLMs).
1 code implementation • 23 Feb 2024 • Yifei Li, Xiang Yue, Zeyi Liao, Huan Sun
Modern generative search engines enhance the reliability of large language model (LLM) responses by providing cited evidence.
1 code implementation • 22 Feb 2024 • Tianyu Zheng, Ge Zhang, Tianhao Shen, Xueling Liu, Bill Yuchen Lin, Jie Fu, Wenhu Chen, Xiang Yue
However, open-source models often lack the execution capabilities and iterative refinement of advanced systems like the GPT-4 Code Interpreter.
2 code implementations • 15 Feb 2024 • Yao Fu, Rameswar Panda, Xinyao Niu, Xiang Yue, Hannaneh Hajishirzi, Yoon Kim, Hao Peng
We demonstrate that continual pretraining of the full model on 1B-5B tokens of such data is an effective and affordable strategy for scaling the context length of language models to 128K.
no code implementations • 22 Dec 2023 • Max Ku, Dongfu Jiang, Cong Wei, Xiang Yue, Wenhu Chen
We evaluate VIESCORE on seven prominent conditional image generation tasks and find: (1) VIESCORE (GPT4-v) achieves a high Spearman correlation of 0.3 with human evaluations, while the human-to-human correlation is 0.45.
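The Spearman correlation quoted above measures rank agreement between automatic scores and human ratings. A minimal sketch, using made-up scores rather than the paper's data (and no tie handling, for simplicity):

```python
def ranks(xs):
    # Rank values 1..n (this simple sketch assumes no ties).
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)
    return r

def spearman(xs, ys):
    # Spearman correlation = Pearson correlation of the two rank vectors.
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mean = (n + 1) / 2  # mean rank
    num = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    den = (sum((a - mean) ** 2 for a in rx)
           * sum((b - mean) ** 2 for b in ry)) ** 0.5
    return num / den

# Hypothetical model scores vs. human ratings for five generated images.
model = [0.9, 0.2, 0.6, 0.4, 0.8]
human = [5, 1, 4, 2, 3]
print(round(spearman(model, human), 2))  # → 0.9
```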
2 code implementations • 27 Nov 2023 • Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, Renliang Sun, Ming Yin, Boyuan Zheng, Zhenzhu Yang, Yibo Liu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen
We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning.
no code implementations • 15 Nov 2023 • Tianshu Zhang, Xiang Yue, Yifei Li, Huan Sun
Towards that end, we construct TableInstruct, a new dataset with a variety of realistic tables and tasks, for instruction tuning and evaluating LLMs.
1 code implementation • 11 Sep 2023 • Xiang Yue, Xingwei Qu, Ge Zhang, Yao Fu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen
The MAmmoTH models are trained on MathInstruct, our meticulously curated instruction tuning dataset.
1 code implementation • 29 Jul 2023 • Lingbo Mo, Shijie Chen, Ziru Chen, Xiang Deng, Ashley Lewis, Sunit Singh, Samuel Stevens, Chang-You Tai, Zhen Wang, Xiang Yue, Tianshu Zhang, Yu Su, Huan Sun
We introduce TacoBot, a user-centered task-oriented digital assistant designed to guide users through complex real-world tasks with multiple steps.
no code implementations • 22 May 2023 • Boshi Wang, Xiang Yue, Huan Sun
Large language models (LLMs) such as ChatGPT and GPT-4 have shown impressive performance in complex reasoning tasks.
1 code implementation • 10 May 2023 • Xiang Yue, Boshi Wang, Ziru Chen, Kai Zhang, Yu Su, Huan Sun
We manually curate a set of test examples covering 12 domains from a generative search engine, New Bing.
1 code implementation • 25 Oct 2022 • Xiang Yue, Huseyin A. Inan, Xuechen Li, Girish Kumar, Julia McAnallen, Hoda Shajari, Huan Sun, David Levitan, Robert Sim
Privacy concerns have attracted increasing attention in data-driven products due to the tendency of machine learning models to memorize sensitive training data.
no code implementations • 11 Jul 2022 • Shijie Chen, Ziru Chen, Xiang Deng, Ashley Lewis, Lingbo Mo, Samuel Stevens, Zhen Wang, Xiang Yue, Tianshu Zhang, Yu Su, Huan Sun
We present TacoBot, a task-oriented dialogue system built for the inaugural Alexa Prize TaskBot Challenge, which assists users in completing multi-step cooking and home improvement tasks.
1 code implementation • ACL 2022 • Xiang Yue, Ziyu Yao, Huan Sun
Synthesizing QA pairs with a question generator (QG) on the target domain has become a popular approach for domain adaptation of question answering (QA) models.
1 code implementation • ACL 2022 • Xiang Yue, Xiaoman Pan, Wenlin Yao, Dian Yu, Dong Yu, Jianshu Chen
And with our pretrained reader, the entire system improves by up to 4% in exact match.
1 code implementation • Findings (ACL) 2021 • Xiang Yue, Minxin Du, Tianhao Wang, Yaliang Li, Huan Sun, Sherman S. M. Chow
The sanitized texts also contribute to our sanitization-aware pretraining and fine-tuning, enabling privacy-preserving natural language processing over the BERT language model with promising utility.
2 code implementations • 30 Oct 2020 • Xiang Yue, Xinliang Frederick Zhang, Ziyu Yao, Simon Lin, Huan Sun
Clinical question answering (QA) aims to automatically answer questions from medical professionals based on clinical texts.
1 code implementation • EMNLP 2021 • Xinliang Frederick Zhang, Heming Sun, Xiang Yue, Simon Lin, Huan Sun
For evaluation, we introduce Query Bank and Relevance Set, where the former contains 1,236 human-paraphrased queries while the latter contains ~32 human-annotated FAQ items for each query.
1 code implementation • EMNLP (ClinicalNLP) 2020 • Xiang Yue, Shuang Zhou
De-identification is the task of identifying protected health information (PHI) in clinical text.
1 code implementation • ACL 2020 • Xiang Yue, Bernal Jimenez Gutierrez, Huan Sun
In this paper, we provide an in-depth analysis of this dataset and the clinical reading comprehension (CliniRC) task.
no code implementations • 6 Mar 2020 • Bernhard Kratzwald, Xiang Yue, Huan Sun, Stefan Feuerriegel
Here, remarkably, annotating a stratified subset covering only 1.2% of the original training set achieves 97.7% of the performance obtained with the fully annotated dataset.
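The stratified subsampling behind that result can be sketched as follows; this is an illustrative implementation with synthetic labels, not the authors' code:

```python
import random
from collections import defaultdict

def stratified_sample(examples, labels, fraction, seed=0):
    # Sample the same fraction from each label group (stratum), so the
    # small annotation budget preserves the full set's label distribution.
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex, y in zip(examples, labels):
        by_label[y].append(ex)
    subset = []
    for group in by_label.values():
        k = max(1, round(fraction * len(group)))  # at least one per stratum
        subset.extend(rng.sample(group, k))
    return subset

# Synthetic example: 100 training items, two hypothetical strata.
examples = list(range(100))
labels = ["easy"] * 80 + ["hard"] * 20
subset = stratified_sample(examples, labels, 0.1)
print(len(subset))  # → 10 (8 easy + 2 hard)
```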
1 code implementation • 19 Feb 2020 • Zaixiang Zheng, Xiang Yue, Shu-Jian Huang, Jia-Jun Chen, Alexandra Birch
Document-level machine translation outperforms sentence-level models by a small margin but has failed to be widely adopted.
1 code implementation • 13 Nov 2019 • Feng Huang, Xiang Yue, Zhankun Xiong, Zhouxin Yu, Wen Zhang
To this end, we innovatively represent miRNA-disease-type triplets as a tensor and introduce Tensor Decomposition methods to solve the prediction task.
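The triplet-as-tensor idea can be sketched with a toy CP (CANDECOMP/PARAFAC) model; sizes, names, and random factors here are illustrative assumptions, whereas the paper learns factors from observed associations:

```python
import numpy as np

# Toy dimensions: a (miRNA, disease, type) triplet becomes entry T[i, j, k].
n_mirna, n_disease, n_type, rank = 4, 3, 2, 2
rng = np.random.default_rng(0)

# One factor matrix per tensor mode (randomly initialized for this sketch).
A = rng.random((n_mirna, rank))
B = rng.random((n_disease, rank))
C = rng.random((n_type, rank))

def cp_score(i, j, k):
    # CP model: T[i, j, k] ≈ sum_r A[i, r] * B[j, r] * C[k, r],
    # which scores how plausible an unseen triplet is.
    return float(np.sum(A[i] * B[j] * C[k]))

# The full reconstructed tensor via einsum matches the elementwise formula.
T_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
print(np.isclose(T_hat[1, 2, 0], cp_score(1, 2, 0)))  # → True
```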
1 code implementation • 21 Jun 2019 • Zhen Wang, Xiang Yue, Soheil Moosavinasab, Yungui Huang, Simon Lin, Huan Sun
To solve the problem, we propose a new framework SurfCon that leverages two important types of information in the privacy-aware clinical data, i.e., the surface form information and the global context information, for synonym discovery.
4 code implementations • 12 Jun 2019 • Xiang Yue, Zhen Wang, Jingong Huang, Srinivasan Parthasarathy, Soheil Moosavinasab, Yungui Huang, Simon M. Lin, Wen Zhang, Ping Zhang, Huan Sun
Our experimental results demonstrate that the recent graph embedding methods achieve promising results and deserve more attention in the future biomedical graph analysis.