Search Results for author: Yilun Zhao

Found 28 papers, 17 papers with code

FinMath: Injecting a Tree-structured Solver for Question Answering over Financial Reports

no code implementations LREC 2022 Chenying Li, Wenbo Ye, Yilun Zhao

This paper proposes a new framework named FinMath, which improves the model’s numerical reasoning capacity by injecting a tree-structured neural model to perform multi-step numerical reasoning.

Question Answering

MIMIR: A Streamlined Platform for Personalized Agent Tuning in Domain Expertise

no code implementations 3 Apr 2024 Chunyuan Deng, Xiangru Tang, Yilun Zhao, Hanming Wang, Haoran Wang, Wangchunshu Zhou, Arman Cohan, Mark Gerstein

Recently, large language models (LLMs) have evolved into interactive agents, proficient in planning, tool use, and task execution across a wide variety of tasks.

Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science

no code implementations 6 Feb 2024 Xiangru Tang, Qiao Jin, Kunlun Zhu, Tongxin Yuan, Yichi Zhang, Wangchunshu Zhou, Meng Qu, Yilun Zhao, Jian Tang, Zhuosheng Zhang, Arman Cohan, Zhiyong Lu, Mark Gerstein

Intelligent agents powered by large language models (LLMs) have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines.

Investigating Data Contamination in Modern Benchmarks for Large Language Models

no code implementations 16 Nov 2023 Chunyuan Deng, Yilun Zhao, Xiangru Tang, Mark Gerstein, Arman Cohan

Recent observations have underscored a disparity between the inflated benchmark scores and the actual performance of LLMs, raising concerns about potential contamination of evaluation benchmarks.

Common Sense Reasoning Multiple-choice +1

On Evaluating the Integration of Reasoning and Action in LLM Agents with Database Question Answering

no code implementations 16 Nov 2023 Linyong Nan, Ellen Zhang, Weijin Zou, Yilun Zhao, Wenfei Zhou, Arman Cohan

A key discovery is the identification of two primary bottlenecks hindering effective interaction: the capacity for planning and the ability to generate multiple SQL queries.

Question Answering Retrieval

KnowledgeMath: Knowledge-Intensive Math Word Problem Solving in Finance Domains

1 code implementation 16 Nov 2023 Yilun Zhao, Hongjun Liu, Yitao Long, Rui Zhang, Chen Zhao, Arman Cohan

We introduce KnowledgeMath, a novel benchmark designed to evaluate LLMs' capabilities in applying financial knowledge to solve complex math word problems.

Math Math Word Problem Solving +1

DocMath-Eval: Evaluating Numerical Reasoning Capabilities of LLMs in Understanding Long Documents with Tabular Data

no code implementations 16 Nov 2023 Yilun Zhao, Yitao Long, Hongjun Liu, Linyong Nan, Lyuhao Chen, Ryo Kamoi, Yixin Liu, Xiangru Tang, Rui Zhang, Arman Cohan

This paper introduces DocMath-Eval, a comprehensive benchmark specifically designed to evaluate the numerical reasoning and problem-solving capabilities of LLMs in the context of understanding and analyzing financial documents containing both text and tables.

Math

MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning

1 code implementation 16 Nov 2023 Xiangru Tang, Anni Zou, Zhuosheng Zhang, Ziming Li, Yilun Zhao, Xingyao Zhang, Arman Cohan, Mark Gerstein

Large language models (LLMs), despite their remarkable progress across various general domains, encounter significant barriers in medicine and healthcare.

Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization

1 code implementation 15 Nov 2023 Yixin Liu, Alexander R. Fabbri, Jiawen Chen, Yilun Zhao, Simeng Han, Shafiq Joty, PengFei Liu, Dragomir Radev, Chien-Sheng Wu, Arman Cohan

Our study reveals that instruction controllable text summarization remains a challenging task for LLMs, since (1) all LLMs evaluated still make factual and other types of errors in their summaries; (2) all LLM-based evaluation methods cannot achieve a strong alignment with human annotators when judging the quality of candidate summaries; (3) different LLMs show large performance gaps in summary generation and evaluation.

Benchmarking Text Summarization

L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models

no code implementations 29 Sep 2023 Ansong Ni, Pengcheng Yin, Yilun Zhao, Martin Riddell, Troy Feng, Rui Shen, Stephen Yin, Ye Liu, Semih Yavuz, Caiming Xiong, Shafiq Joty, Yingbo Zhou, Dragomir Radev, Arman Cohan

Recently, large language models (LLMs), especially those that are pretrained on code, have demonstrated strong capabilities in generating programs from natural language inputs in a few-shot or even zero-shot manner.

Code Generation Math +1

ODSum: New Benchmarks for Open Domain Multi-Document Summarization

1 code implementation 16 Sep 2023 Yijie Zhou, Kejian Shi, Wencai Zhang, Yixin Liu, Yilun Zhao, Arman Cohan

Open-domain Multi-Document Summarization (ODMDS) is a critical tool for condensing vast arrays of documents into coherent, concise summaries.

Document Summarization Multi-Document Summarization +1

Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?

1 code implementation 16 Sep 2023 Xiangru Tang, Yiming Zong, Jason Phang, Yilun Zhao, Wangchunshu Zhou, Arman Cohan, Mark Gerstein

Despite the remarkable capabilities of Large Language Models (LLMs) like GPT-4, producing complex, structured tabular data remains challenging.

Hallucination

RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations

1 code implementation 25 Jun 2023 Yilun Zhao, Chen Zhao, Linyong Nan, Zhenting Qi, Wenlin Zhang, Xiangru Tang, Boyu Mi, Dragomir Radev

Despite significant progress having been made in question answering on tabular data (Table QA), it's unclear whether, and to what extent, existing Table QA models are robust to task-specific perturbations, e.g., replacing key question entities or shuffling table columns.

Few-Shot Learning Question Answering

Investigating Table-to-Text Generation Capabilities of LLMs in Real-World Information Seeking Scenarios

2 code implementations 24 May 2023 Yilun Zhao, Haowei Zhang, Shengyun Si, Linyong Nan, Xiangru Tang, Arman Cohan

These include the LogicNLG and our newly-constructed LoTNLG datasets for data insight generation, along with the FeTaQA and our newly-constructed F2WTQ datasets for query-based generation.

Table-to-Text Generation

QTSumm: Query-Focused Summarization over Tabular Data

2 code implementations 23 May 2023 Yilun Zhao, Zhenting Qi, Linyong Nan, Boyu Mi, Yixin Liu, Weijin Zou, Simeng Han, Ruizhe Chen, Xiangru Tang, Yumo Xu, Dragomir Radev, Arman Cohan

Motivated by this, we define a new query-focused table summarization task, where text generation models have to perform human-like reasoning and analysis over the given table to generate a tailored summary.

Query-focused Summarization Table-to-Text Generation

Enhancing Few-shot Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies

no code implementations 21 May 2023 Linyong Nan, Yilun Zhao, Weijin Zou, Narutatsu Ri, Jaesung Tae, Ellen Zhang, Arman Cohan, Dragomir Radev

In-context learning (ICL) has emerged as a new approach to various natural language processing tasks, utilizing large language models (LLMs) to make predictions based on context that has been supplemented with a few examples or task-specific instructions.

In-Context Learning Question Answering +1

Towards Interpretable and Efficient Automatic Reference-Based Summarization Evaluation

1 code implementation 7 Mar 2023 Yixin Liu, Alexander R. Fabbri, Yilun Zhao, PengFei Liu, Shafiq Joty, Chien-Sheng Wu, Caiming Xiong, Dragomir Radev

Interpretability and efficiency are two important considerations for the adoption of neural automatic metrics.

ReasTAP: Injecting Table Reasoning Skills During Pre-training via Synthetic Reasoning Examples

1 code implementation 22 Oct 2022 Yilun Zhao, Linyong Nan, Zhenting Qi, Rui Zhang, Dragomir Radev

Reasoning over tabular data requires both table structure understanding and a broad set of table reasoning skills.

Ranked #3 on Semantic Parsing on WikiSQL (Denotation accuracy (test) metric)

Fact Verification Question Answering +3

MultiHiertt: Numerical Reasoning over Multi Hierarchical Tabular and Textual Data

1 code implementation ACL 2022 Yilun Zhao, Yunxiang Li, Chenying Li, Rui Zhang

Numerical reasoning over hybrid data containing both textual and tabular content (e.g., financial reports) has recently attracted much attention in the NLP community.

Question Answering

Apparel-invariant Feature Learning for Apparel-changed Person Re-identification

no code implementations 14 Aug 2020 Zhengxu Yu, Yilun Zhao, Bin Hong, Zhongming Jin, Jianqiang Huang, Deng Cai, Xiaofei He, Xian-Sheng Hua

Therefore, it is critical to learn an apparel-invariant person representation under cases like cloth changing or several persons wearing similar clothes.

Person Re-Identification Representation Learning
