1 code implementation • 23 May 2024 • Boshi Wang, Xiang Yue, Yu Su, Huan Sun
The levels of generalization also vary across reasoning types: when faced with out-of-distribution examples, transformers fail to systematically generalize for composition but succeed for comparison.
no code implementations • 7 May 2024 • Wenhao Wu, Yizhong Wang, Yao Fu, Xiang Yue, Dawei Zhu, Sujian Li
Effectively handling instructions with extremely long context remains a challenge for Large Language Models (LLMs), typically necessitating high-quality long data and substantial computational resources.
no code implementations • 6 May 2024 • Xiang Yue, Tuney Zheng, Ge Zhang, Wenhu Chen
Notably, MAmmoTH2-7B's (Mistral) performance increases from 11% to 36.7% on MATH and from 36% to 68.4% on GSM8K without training on any in-domain data.
no code implementations • 9 Apr 2024 • Junpeng Liu, YiFan Song, Bill Yuchen Lin, Wai Lam, Graham Neubig, Yuanzhi Li, Xiang Yue
Multimodal Large Language Models (MLLMs) have shown promise in web-related tasks, but evaluating their performance in the web domain remains a challenge due to the lack of comprehensive benchmarks.
no code implementations • 9 Apr 2024 • Xingwei Qu, Yuelin Bai, Yinghao Ma, Ziya Zhou, Ka Man Lo, Jiaheng Liu, Ruibin Yuan, Lejun Min, Xueling Liu, Tianyu Zhang, Xinrun Du, Shuyue Guo, Yiming Liang, Yizhi Li, Shangda Wu, Junting Zhou, Tianyu Zheng, Ziyang Ma, Fengze Han, Wei Xue, Gus Xia, Emmanouil Benetos, Xiang Yue, Chenghua Lin, Xu Tan, Stephen W. Huang, Wenhu Chen, Jie Fu, Ge Zhang
In this paper, we explore the application of Large Language Models (LLMs) to the pre-training of music.
no code implementations • 4 Apr 2024 • Jiawei Guo, Ziming Li, Xueling Liu, Kaijing Ma, Tianyu Zheng, Zhouliang Yu, Ding Pan, Yizhi Li, Ruibo Liu, Yue Wang, Shuyue Guo, Xingwei Qu, Xiang Yue, Ge Zhang, Wenhu Chen, Jie Fu
Large Language Models (LLMs) for code are rapidly evolving, with code editing emerging as a critical capability.
1 code implementation • 2 Apr 2024 • Tianle Li, Ge Zhang, Quy Duc Do, Xiang Yue, Wenhu Chen
Our study reveals that long-context understanding and reasoning are still challenging tasks for existing LLMs.
1 code implementation • 4 Mar 2024 • YiFan Song, Da Yin, Xiang Yue, Jie Huang, Sujian Li, Bill Yuchen Lin
This iterative cycle of exploration and training fosters continued improvement in the agents.
no code implementations • 26 Feb 2024 • Alex Zhuang, Ge Zhang, Tianyu Zheng, Xinrun Du, Junjie Wang, Weiming Ren, Stephen W. Huang, Jie Fu, Xiang Yue, Wenhu Chen
Utilizing this dataset, we train a series of models, referred to as StructLM, based on the Mistral and CodeLlama model families, ranging from 7B to 34B parameters.
1 code implementation • 23 Feb 2024 • Jin Yao, Eli Chien, Minxin Du, Xinyao Niu, Tianhao Wang, Zezhou Cheng, Xiang Yue
This study investigates the concept of the 'right to be forgotten' within the context of large language models (LLMs).
1 code implementation • 23 Feb 2024 • Yifei Li, Xiang Yue, Zeyi Liao, Huan Sun
Modern generative search engines enhance the reliability of large language model (LLM) responses by providing cited evidence.
1 code implementation • 22 Feb 2024 • Tianyu Zheng, Ge Zhang, Tianhao Shen, Xueling Liu, Bill Yuchen Lin, Jie Fu, Wenhu Chen, Xiang Yue
However, open-source models often lack the execution capabilities and iterative refinement of advanced systems like the GPT-4 Code Interpreter.
2 code implementations • 15 Feb 2024 • Yao Fu, Rameswar Panda, Xinyao Niu, Xiang Yue, Hannaneh Hajishirzi, Yoon Kim, Hao Peng
We demonstrate that continual pretraining of the full model on 1B-5B tokens of such data is an effective and affordable strategy for scaling the context length of language models to 128K.
no code implementations • 22 Dec 2023 • Max Ku, Dongfu Jiang, Cong Wei, Xiang Yue, Wenhu Chen
We evaluate VIESCORE on seven prominent conditional image generation tasks and find: (1) VIESCORE (GPT4-v) achieves a high Spearman correlation of 0.3 with human evaluations, while the human-to-human correlation is 0.45.
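The Spearman correlation quoted above measures rank agreement between automatic scores and human ratings. A minimal sketch, using made-up scores rather than the paper's data (and no tie handling, for simplicity):

```python
def ranks(xs):
    # Rank values 1..n (this simple sketch assumes no ties).
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)
    return r

def spearman(xs, ys):
    # Spearman correlation = Pearson correlation of the two rank vectors.
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mean = (n + 1) / 2  # mean rank
    num = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    den = (sum((a - mean) ** 2 for a in rx)
           * sum((b - mean) ** 2 for b in ry)) ** 0.5
    return num / den

# Hypothetical model scores vs. human ratings for five generated images.
model = [0.9, 0.2, 0.6, 0.4, 0.8]
human = [5, 1, 4, 2, 3]
print(round(spearman(model, human), 2))  # → 0.9
```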
2 code implementations • 27 Nov 2023 • Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, Renliang Sun, Ming Yin, Boyuan Zheng, Zhenzhu Yang, Yibo Liu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen
We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning.
no code implementations • 15 Nov 2023 • Tianshu Zhang, Xiang Yue, Yifei Li, Huan Sun
Towards that end, we construct TableInstruct, a new dataset with a variety of realistic tables and tasks, for instruction tuning and evaluating LLMs.
1 code implementation • 11 Sep 2023 • Xiang Yue, Xingwei Qu, Ge Zhang, Yao Fu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen
The MAmmoTH models are trained on MathInstruct, our meticulously curated instruction tuning dataset.
1 code implementation • 29 Jul 2023 • Lingbo Mo, Shijie Chen, Ziru Chen, Xiang Deng, Ashley Lewis, Sunit Singh, Samuel Stevens, Chang-You Tai, Zhen Wang, Xiang Yue, Tianshu Zhang, Yu Su, Huan Sun
We introduce TacoBot, a user-centered task-oriented digital assistant designed to guide users through complex real-world tasks with multiple steps.
no code implementations • 22 May 2023 • Boshi Wang, Xiang Yue, Huan Sun
Large language models (LLMs) such as ChatGPT and GPT-4 have shown impressive performance in complex reasoning tasks.
1 code implementation • 10 May 2023 • Xiang Yue, Boshi Wang, Ziru Chen, Kai Zhang, Yu Su, Huan Sun
We manually curate a set of test examples covering 12 domains from a generative search engine, New Bing.
1 code implementation • 25 Oct 2022 • Xiang Yue, Huseyin A. Inan, Xuechen Li, Girish Kumar, Julia McAnallen, Hoda Shajari, Huan Sun, David Levitan, Robert Sim
Privacy concerns have attracted increasing attention in data-driven products due to the tendency of machine learning models to memorize sensitive training data.
no code implementations • 11 Jul 2022 • Shijie Chen, Ziru Chen, Xiang Deng, Ashley Lewis, Lingbo Mo, Samuel Stevens, Zhen Wang, Xiang Yue, Tianshu Zhang, Yu Su, Huan Sun
We present TacoBot, a task-oriented dialogue system built for the inaugural Alexa Prize TaskBot Challenge, which assists users in completing multi-step cooking and home improvement tasks.
1 code implementation • ACL 2022 • Xiang Yue, Ziyu Yao, Huan Sun
Synthesizing QA pairs with a question generator (QG) on the target domain has become a popular approach for domain adaptation of question answering (QA) models.
1 code implementation • ACL 2022 • Xiang Yue, Xiaoman Pan, Wenlin Yao, Dian Yu, Dong Yu, Jianshu Chen
And with our pretrained reader, the entire system improves by up to 4% in exact match.
1 code implementation • Findings (ACL) 2021 • Xiang Yue, Minxin Du, Tianhao Wang, Yaliang Li, Huan Sun, Sherman S. M. Chow
The sanitized texts also contribute to our sanitization-aware pretraining and fine-tuning, enabling privacy-preserving natural language processing over the BERT language model with promising utility.
2 code implementations • 30 Oct 2020 • Xiang Yue, Xinliang Frederick Zhang, Ziyu Yao, Simon Lin, Huan Sun
Clinical question answering (QA) aims to automatically answer questions from medical professionals based on clinical texts.
1 code implementation • EMNLP 2021 • Xinliang Frederick Zhang, Heming Sun, Xiang Yue, Simon Lin, Huan Sun
For evaluation, we introduce Query Bank and Relevance Set, where the former contains 1,236 human-paraphrased queries while the latter contains ~32 human-annotated FAQ items for each query.
1 code implementation • EMNLP (ClinicalNLP) 2020 • Xiang Yue, Shuang Zhou
De-identification is the task of identifying protected health information (PHI) in clinical text.
1 code implementation • ACL 2020 • Xiang Yue, Bernal Jimenez Gutierrez, Huan Sun
In this paper, we provide an in-depth analysis of this dataset and the clinical reading comprehension (CliniRC) task.
no code implementations • 6 Mar 2020 • Bernhard Kratzwald, Xiang Yue, Huan Sun, Stefan Feuerriegel
Here, remarkably, annotating a stratified subset covering only 1.2% of the original training set achieves 97.7% of the performance obtained with the fully annotated dataset.
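The stratified subsampling behind that result can be sketched as follows; this is an illustrative implementation with synthetic labels, not the authors' code:

```python
import random
from collections import defaultdict

def stratified_sample(examples, labels, fraction, seed=0):
    # Sample the same fraction from each label group (stratum), so the
    # small annotation budget preserves the full set's label distribution.
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex, y in zip(examples, labels):
        by_label[y].append(ex)
    subset = []
    for group in by_label.values():
        k = max(1, round(fraction * len(group)))  # at least one per stratum
        subset.extend(rng.sample(group, k))
    return subset

# Synthetic example: 100 training items, two hypothetical strata.
examples = list(range(100))
labels = ["easy"] * 80 + ["hard"] * 20
subset = stratified_sample(examples, labels, 0.1)
print(len(subset))  # → 10 (8 easy + 2 hard)
```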
1 code implementation • 19 Feb 2020 • Zaixiang Zheng, Xiang Yue, Shu-Jian Huang, Jia-Jun Chen, Alexandra Birch
Document-level machine translation outperforms sentence-level models by a small margin but has failed to be widely adopted.
1 code implementation • 13 Nov 2019 • Feng Huang, Xiang Yue, Zhankun Xiong, Zhouxin Yu, Wen Zhang
To this end, we innovatively represent miRNA-disease-type triplets as a tensor and introduce Tensor Decomposition methods to solve the prediction task.
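The triplet-as-tensor idea can be sketched with a toy CP (CANDECOMP/PARAFAC) model; sizes, names, and random factors here are illustrative assumptions, whereas the paper learns factors from observed associations:

```python
import numpy as np

# Toy dimensions: a (miRNA, disease, type) triplet becomes entry T[i, j, k].
n_mirna, n_disease, n_type, rank = 4, 3, 2, 2
rng = np.random.default_rng(0)

# One factor matrix per tensor mode (randomly initialized for this sketch).
A = rng.random((n_mirna, rank))
B = rng.random((n_disease, rank))
C = rng.random((n_type, rank))

def cp_score(i, j, k):
    # CP model: T[i, j, k] ≈ sum_r A[i, r] * B[j, r] * C[k, r],
    # which scores how plausible an unseen triplet is.
    return float(np.sum(A[i] * B[j] * C[k]))

# The full reconstructed tensor via einsum matches the elementwise formula.
T_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
print(np.isclose(T_hat[1, 2, 0], cp_score(1, 2, 0)))  # → True
```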
1 code implementation • 21 Jun 2019 • Zhen Wang, Xiang Yue, Soheil Moosavinasab, Yungui Huang, Simon Lin, Huan Sun
To solve the problem, we propose a new framework SurfCon that leverages two important types of information in the privacy-aware clinical data, i.e., the surface form information and the global context information, for synonym discovery.
4 code implementations • 12 Jun 2019 • Xiang Yue, Zhen Wang, Jingong Huang, Srinivasan Parthasarathy, Soheil Moosavinasab, Yungui Huang, Simon M. Lin, Wen Zhang, Ping Zhang, Huan Sun
Our experimental results demonstrate that the recent graph embedding methods achieve promising results and deserve more attention in the future biomedical graph analysis.