no code implementations • EMNLP 2020 • Wenlin Yao, Zeyu Dai, Maitreyi Ramaswamy, Bonan Min, Ruihong Huang
We first obtain an initial set of event pairs likely to stand in the subevent relation by exploiting two observations: 1) subevents are temporally contained by the parent event, and 2) definitions of the parent event can further guide the identification of subevents.
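Observation 1 lends itself to a simple containment heuristic. Below is a minimal sketch, assuming hypothetical event records with normalized (start, end) spans; the paper's weakly supervised pipeline is considerably richer than this.

```python
from dataclasses import dataclass

@dataclass
class Event:
    trigger: str
    start: float  # hypothetical normalized timeline position
    end: float

def candidate_subevent_pairs(events):
    """Pair (parent, sub) when sub's time span lies inside parent's.

    A rough proxy for observation 1: subevents are temporally
    contained by the parent event.
    """
    pairs = []
    for parent in events:
        for sub in events:
            if sub is parent:
                continue
            if (parent.start <= sub.start and sub.end <= parent.end
                    and (parent.end - parent.start) > (sub.end - sub.start)):
                pairs.append((parent.trigger, sub.trigger))
    return pairs

events = [Event("hurricane", 0.0, 10.0), Event("evacuation", 2.0, 4.0)]
print(candidate_subevent_pairs(events))  # [('hurricane', 'evacuation')]
```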
no code implementations • 21 Feb 2025 • Xuansheng Wu, Jiayi Yuan, Wenlin Yao, Xiaoming Zhai, Ninghao Liu
Large language models (LLMs) excel at handling human queries, but they can occasionally generate flawed or unexpected responses.
1 code implementation • 25 Oct 2024 • Hongliang He, Wenlin Yao, Kaixin Ma, Wenhao Yu, Hongming Zhang, Tianqing Fang, Zhenzhong Lan, Dong Yu
In this paper, we introduce an open-source framework designed to facilitate the development of multimodal web agents that can autonomously conduct real-world exploration and improve themselves.
1 code implementation • 7 Oct 2024 • Daoan Zhang, Guangchen Lan, Dong-Jun Han, Wenlin Yao, Xiaoman Pan, Hongming Zhang, Mingxiao Li, Pengcheng Chen, Yu Dong, Christopher Brinton, Jiebo Luo
To address the limitations of both on- and off-policy RLHF, we propose a preference optimization method that aligns diffusion models (DMs) with preferences without relying on reward models or paired human-annotated data.
no code implementations • 4 Oct 2024 • Murong Yue, Wenlin Yao, Haitao Mi, Dian Yu, Ziyu Yao, Dong Yu
In this paper, we propose DOTS, an approach enabling LLMs to reason dynamically via optimal reasoning trajectory search, tailored to the specific characteristics of each question and the inherent capability of the task-solving LLM.
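As a rough illustration of per-question trajectory search, the sketch below enumerates combinations of hypothetical atomic reasoning actions, shortest first, and keeps the first that succeeds. The `solve` stub is a toy stand-in for actual LLM calls; DOTS's real action space and search procedure are more sophisticated.

```python
import itertools

# Hypothetical atomic reasoning actions (not DOTS's actual action set).
ACTIONS = ["rewrite", "decompose", "verify"]

def solve(question_difficulty, trajectory):
    """Toy stand-in for an LLM attempt: pretend harder questions need
    more reasoning actions before they are answered correctly."""
    return len(trajectory) >= question_difficulty

def cheapest_trajectory(question_difficulty):
    """Search action combinations, shortest first, and return the first
    one that solves the question -- per-question dynamic reasoning."""
    for r in range(len(ACTIONS) + 1):
        for combo in itertools.combinations(ACTIONS, r):
            if solve(question_difficulty, combo):
                return combo
    return None

print(cheapest_trajectory(0))  # () -- easy question, answer directly
print(cheapest_trajectory(2))  # ('rewrite', 'decompose')
```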
no code implementations • 2 Oct 2024 • Yebowen Hu, Xiaoyang Wang, Wenlin Yao, Yiming Lu, Daoan Zhang, Hassan Foroosh, Dong Yu, Fei Liu
In this paper, we introduce DeFine, a new framework that constructs probabilistic factor profiles from complex scenarios.
1 code implementation • 27 Sep 2024 • Fan Lin, Shuyi Xie, Yong Dai, Wenlin Yao, Tianjiao Lang, Zishan Xu, Zhichao Hu, Xiao Xiao, Yuhong Liu, Yu Zhang
To produce high-quality data, we incorporate a self-correction mechanism into our generalization framework and develop two models that predict prompt discrimination and difficulty scores, contributing valuable tools to evaluation data synthesis research.
1 code implementation • 25 Sep 2024 • Wenlin Yao, Haitao Mi, Dong Yu
Despite recent advancements in large language models (LLMs), their performance on complex reasoning problems requiring multi-step thinking and combining various skills is still limited.
1 code implementation • 12 Sep 2024 • Liqiang Jing, Zhehui Huang, Xiaoyang Wang, Wenlin Yao, Wenhao Yu, Kaixin Ma, Hongming Zhang, Xinya Du, Dong Yu
To bridge this gap, we introduce DSBench, a comprehensive benchmark designed to evaluate data science agents with realistic tasks.
1 code implementation • 17 Jun 2024 • Yebowen Hu, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang, Wenlin Yao, Hassan Foroosh, Dong Yu, Fei Liu
Finally, the effectiveness of reasoning is influenced by narrative complexity, information density, and domain-specific terms, highlighting the challenges in analytical reasoning tasks.
1 code implementation • 29 May 2024 • Zhenwen Liang, Dian Yu, Wenhao Yu, Wenlin Yao, Zhihan Zhang, Xiangliang Zhang, Dong Yu
We evaluate the performance of various SOTA LLMs on the MathChat benchmark, and we observe that while these models excel in single-turn question answering, they significantly underperform in more complex scenarios that require sustained reasoning and dialogue understanding.
1 code implementation • 13 Mar 2024 • Xuansheng Wu, Haiyan Zhao, Yaochen Zhu, Yucheng Shi, Fan Yang, Tianming Liu, Xiaoming Zhai, Wenlin Yao, Jundong Li, Mengnan Du, Ninghao Liu
Therefore, in this paper, we introduce Usable XAI in the context of LLMs by analyzing (1) how XAI can benefit LLMs and AI systems, and (2) how LLMs can contribute to the advancement of XAI.
1 code implementation • 27 Feb 2024 • Xinran Zhao, Hongming Zhang, Xiaoman Pan, Wenlin Yao, Dong Yu, Tongshuang Wu, Jianshu Chen
For an LLM to be trustworthy, its confidence level should be well calibrated with its actual performance.
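Calibration here is the standard notion: confidence should track accuracy. A minimal sketch of expected calibration error (ECE), the textbook measure such a setting can be evaluated against (pure NumPy, no assumptions beyond the usual definition):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE: bin predictions by confidence and average the
    gap between mean confidence and accuracy, weighted by bin size."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return ece

# A model that is 90% confident but only 60% accurate is overconfident.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9, 0.9],
                                 [1, 1, 1, 0, 0]))  # 0.3
```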
2 code implementations • 25 Jan 2024 • Hongliang He, Wenlin Yao, Kaixin Ma, Wenhao Yu, Yong Dai, Hongming Zhang, Zhenzhong Lan, Dong Yu
The rapid advancement of large language models (LLMs) has led to a new era marked by the development of autonomous applications in real-world scenarios, which drives innovation in creating advanced web agents.
1 code implementation • 7 Jan 2024 • Yiwei Qin, Kaiqiang Song, Yebowen Hu, Wenlin Yao, Sangwoo Cho, Xiaoyang Wang, Xuansheng Wu, Fei Liu, PengFei Liu, Dong Yu
This paper introduces the Decomposed Requirements Following Ratio (DRFR), a new metric for evaluating Large Language Models' (LLMs) ability to follow instructions.
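Reading the metric off its name: decompose each instruction into atomic requirements and report the fraction a response satisfies. A toy sketch, with placeholder string checks standing in for the paper's actual decomposition and judging procedure:

```python
def drfr(responses, requirement_checks):
    """Decomposed Requirements Following Ratio, sketched: the fraction
    of atomic requirements a model's responses satisfy.

    `requirement_checks` maps each response to one boolean predicate
    per decomposed requirement. How requirements are decomposed and
    judged is the paper's contribution; these checks are toys.
    """
    satisfied = total = 0
    for response, checks in zip(responses, requirement_checks):
        for check in checks:
            satisfied += bool(check(response))
            total += 1
    return satisfied / total if total else 0.0

# Instruction: "Answer in one sentence and mention Paris."
resp = "Paris is the capital of France."
checks = [lambda r: r.count(".") == 1, lambda r: "Paris" in r]
print(drfr([resp], [checks]))  # 1.0
```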
6 code implementations • 15 Nov 2023 • Fuxiao Liu, Xiaoyang Wang, Wenlin Yao, Jianshu Chen, Kaiqiang Song, Sangwoo Cho, Yaser Yacoob, Dong Yu
Recognizing the need for a comprehensive evaluation of LMM chart understanding, we also propose a MultiModal Chart Benchmark (MMC-Benchmark), a comprehensive human-annotated benchmark with nine distinct tasks evaluating reasoning capabilities over charts.
1 code implementation • 9 Nov 2023 • Shuyi Xie, Wenlin Yao, Yong Dai, Shaobo Wang, Donlin Zhou, Lifeng Jin, Xinhua Feng, Pengzhi Wei, Yujie Lin, Zhichao Hu, Dong Yu, Zhengyou Zhang, Jing Nie, Yuhong Liu
We construct a hierarchical task tree spanning 7 major areas, over 200 categories, and over 800 tasks, covering diverse capabilities such as question answering, reasoning, multi-turn dialogue, and text generation, to evaluate LLMs in a comprehensive and in-depth manner.
1 code implementation • 30 Sep 2023 • Xuansheng Wu, Wenlin Yao, Jianshu Chen, Xiaoman Pan, Xiaoyang Wang, Ninghao Liu, Dong Yu
In this work, we investigate how instruction tuning adjusts pre-trained models, with a focus on intrinsic changes.
no code implementations • 16 Jul 2023 • Zhenwen Liang, Dian Yu, Xiaoman Pan, Wenlin Yao, Qingkai Zeng, Xiangliang Zhang, Dong Yu
Our approach uniquely considers the various annotation formats as different "views" and leverages them in training the model.
no code implementations • 8 Jul 2023 • Neeraj Varshney, Wenlin Yao, Hongming Zhang, Jianshu Chen, Dong Yu
Specifically, the detection technique achieves a recall of ~88% and the mitigation technique successfully mitigates 57.6% of the correctly detected hallucinations.
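For concreteness, the two reported numbers are just a recall and a conditional success rate; a minimal sketch of how they would be computed over labeled examples:

```python
def detection_recall(hallucinated, flagged):
    """Recall of the detector over truly hallucinated statements."""
    true_pos = sum(h and f for h, f in zip(hallucinated, flagged))
    return true_pos / sum(hallucinated)

def mitigation_rate(detected, fixed):
    """Share of correctly detected hallucinations the mitigation fixes."""
    return sum(fixed) / sum(detected)

# Toy label vectors mirroring the reported ~88% recall and 57.6% rate.
print(detection_recall([1] * 100, [1] * 88 + [0] * 12))    # 0.88
print(mitigation_rate([1] * 1000, [1] * 576 + [0] * 424))  # 0.576
```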
1 code implementation • 29 Jun 2023 • Xuansheng Wu, Huachi Zhou, Yucheng Shi, Wenlin Yao, Xiao Huang, Ninghao Liu
To evaluate our approach, we introduce a cold-start recommendation benchmark, and the results demonstrate that the enhanced small language models achieve cold-start recommendation performance comparable to that of large models with only 17% of the inference time.
1 code implementation • 24 May 2023 • James Y. Huang, Wenlin Yao, Kaiqiang Song, Hongming Zhang, Muhao Chen, Dong Yu
It is unclear whether the compositional semantics of sentences can be directly reflected as compositional operations in the embedding space.
1 code implementation • 6 Dec 2022 • Pei Chen, Wenlin Yao, Hongming Zhang, Xiaoman Pan, Dian Yu, Dong Yu, Jianshu Chen
However, there has been limited research on the zero-shot KBC setting, where we must handle unseen entities and relations that emerge in a constantly growing knowledge base.
1 code implementation • 2 Dec 2022 • Chao Zhao, Faeze Brahman, Kaiqiang Song, Wenlin Yao, Dian Yu, Snigdha Chaturvedi
To encourage research in this direction, we propose NarraSum, a large-scale narrative summarization dataset.
1 code implementation • 9 Nov 2022 • Hongming Zhang, Wenlin Yao, Dong Yu
We argue that using the static embedding of the event type name might not be enough because a single word could be ambiguous, and we need a sentence to define the type semantics accurately.
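A minimal sketch of the idea, with a hypothetical `embed` function standing in for the paper's encoder: score a mention's context against full definition sentences rather than bare, possibly ambiguous type names.

```python
import numpy as np

def embed(text):
    """Hypothetical sentence encoder; a hash-seeded random vector keeps
    this sketch self-contained (the paper trains a real encoder)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=128)
    return v / np.linalg.norm(v)

def best_event_type(mention_context, type_definitions):
    """Pick the type whose definition sentence is most similar to the
    mention's sentence context (cosine of unit vectors = dot product)."""
    m = embed(mention_context)
    return max(type_definitions,
               key=lambda t: float(m @ embed(type_definitions[t])))

types = {
    "Attack": "An attack event is a violent act intended to cause harm.",
    "Transport": "A transport event is the movement of people or goods.",
}
print(best_event_type("Rebels bombed the convoy overnight.", types))
```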
no code implementations • 28 Oct 2022 • Xiaoman Pan, Wenlin Yao, Hongming Zhang, Dian Yu, Dong Yu, Jianshu Chen
In this paper, we develop a novel semi-parametric language model architecture, Knowledge-in-Context (KiC), which empowers a parametric text-to-text language model with a knowledge-rich external memory.
Ranked #5 on Question Answering on StoryCloze
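The knowledge-in-context pattern can be sketched as retrieve-then-prepend. The word-overlap retriever and prompt template below are toy placeholders, not KiC's actual external memory or retriever.

```python
def retrieve(query, memory, k=2):
    """Toy retriever: rank memory entries by word overlap with the
    query. KiC's knowledge-rich memory is far richer; this only
    illustrates the knowledge-in-context pattern."""
    def overlap(entry):
        return len(set(query.lower().split()) & set(entry.lower().split()))
    return sorted(memory, key=overlap, reverse=True)[:k]

def knowledge_in_context_prompt(query, memory):
    """Prepend retrieved knowledge to the input of a text-to-text LM."""
    knowledge = "\n".join(retrieve(query, memory))
    return f"Knowledge:\n{knowledge}\n\nQuestion: {query}\nAnswer:"

memory = [
    "Paris is the capital of France.",
    "The Seine flows through Paris.",
    "Mount Everest is the tallest mountain.",
]
print(knowledge_in_context_prompt("What is the capital of France?", memory))
```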
1 code implementation • 22 Oct 2022 • Fei Wang, Kaiqiang Song, Hongming Zhang, Lifeng Jin, Sangwoo Cho, Wenlin Yao, Xiaoyang Wang, Muhao Chen, Dong Yu
Recent literature adds extractive summaries as guidance for abstractive summarization models to provide hints of salient content and achieves better performance.
Ranked #7 on Abstractive Text Summarization on CNN / Daily Mail
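The extract-then-guide pipeline can be sketched as follows; the frequency-based salience heuristic is a stand-in for the learned extractor, and the string template stands in for the abstractive summarizer's guided input.

```python
def extractive_guidance(document, k=2):
    """Pick the k sentences with the highest total word frequency as
    guidance -- a toy proxy for a trained extractive model."""
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    words = document.lower().split()
    freq = {w: words.count(w) for w in set(words)}
    def salience(sentence):
        return sum(freq.get(w, 0) for w in sentence.lower().split())
    return sorted(sentences, key=salience, reverse=True)[:k]

def guided_input(document):
    """Concatenate extractive guidance and source, hinting at salient
    content for the abstractive model."""
    guidance = " ".join(extractive_guidance(document))
    return f"Guidance: {guidance} Document: {document}"

doc = ("The new model sets a new record on the benchmark. "
       "Critics praised the release. It rained in London.")
print(guided_input(doc))
```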
1 code implementation • 21 Oct 2022 • Yue Yang, Wenlin Yao, Hongming Zhang, Xiaoyang Wang, Dong Yu, Jianshu Chen
Large-scale pretrained language models have made significant advances in solving downstream language understanding tasks.
Ranked #2 on Visual Commonsense Tests on ViComTe-color
1 code implementation • ACL 2022 • Chao Zhao, Wenlin Yao, Dian Yu, Kaiqiang Song, Dong Yu, Jianshu Chen
Comprehending a dialogue requires a model to capture diverse kinds of key information in the utterances, which are either scattered across or only implied in different turns of the conversation.
1 code implementation • ACL 2022 • Xiang Yue, Xiaoman Pan, Wenlin Yao, Dian Yu, Dong Yu, Jianshu Chen
With our pretrained reader, the entire system improves by up to 4% in exact match.
1 code implementation • EMNLP 2021 • Wenlin Yao, Xiaoman Pan, Lifeng Jin, Jianshu Chen, Dian Yu, Dong Yu
We then train a model to identify semantic equivalence between a target word in context and one of its glosses using these aligned inventories, which exhibits strong transfer capability to many WSD tasks.
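A minimal sketch of the gloss-matching step, with a toy word-overlap scorer in place of the trained semantic-equivalence model:

```python
def disambiguate(target_in_context, glosses, score):
    """Pick the gloss most semantically equivalent to the target word
    in its sentence context; `score` stands in for the trained model."""
    return max(glosses, key=lambda g: score(target_in_context, g))

def score(context, gloss):
    # Toy scorer: word overlap. The paper instead trains a semantic-
    # equivalence model over aligned sense inventories.
    return len(set(context.lower().split()) & set(gloss.lower().split()))

glosses = [
    "bank: the land alongside a river",
    "bank: a financial institution that accepts deposits",
]
print(disambiguate("he sat on the bank of the river", glosses, score))
```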
1 code implementation • 4 Oct 2020 • Wenlin Yao, Cheng Zhang, Shiva Saravanan, Ruihong Huang, Ali Mostafavi
People increasingly use social media to report emergencies, seek help or share information during disasters, which makes social networks an important tool for disaster management.
no code implementations • ACL 2018 • Wenlin Yao, Ruihong Huang
Inspired by the double temporality characteristic of narrative texts, we propose a novel approach for acquiring rich temporal "before/after" event knowledge across sentences in narrative stories.
no code implementations • IJCNLP 2017 • Zeyu Dai, Wenlin Yao, Ruihong Huang
Focusing on the task of identifying event temporal status, we find that events directly or indirectly governing the target event in a dependency tree are most important contexts.
no code implementations • RANLP 2017 • Wenlin Yao, Zeyu Dai, Ruihong Huang, James Caverlee
The lack of large realistic datasets presents a bottleneck in online deception detection studies.
no code implementations • RANLP 2017 • Wenlin Yao, Saipravallika Nettyam, Ruihong Huang
Capabilities of detecting temporal relations between two events can benefit many applications.