no code implementations • 12 Apr 2025 • Zixuan Ke, Fangkai Jiao, Yifei Ming, Xuan-Phi Nguyen, Austin Xu, Do Xuan Long, Minzhi Li, Chengwei Qin, Peifeng Wang, Silvio Savarese, Caiming Xiong, Shafiq Joty
In this survey, we categorize existing methods along two orthogonal dimensions: (1) Regimes, which define the stage at which reasoning is achieved (either at inference time or through dedicated training); and (2) Architectures, which determine the components involved in the reasoning process, distinguishing between standalone LLMs and agentic compound systems that incorporate external tools, and multi-agent collaborations.
no code implementations • 23 Dec 2024 • Hailin Chen, Fangkai Jiao, Mathieu Ravaut, Nawshad Farruque, Xuan Phi Nguyen, Chengwei Qin, Manan Dey, Bosheng Ding, Caiming Xiong, Shafiq Joty, Yingbo Zhou
The rapid development of large language models (LLMs) necessitates robust, unbiased, and scalable methods for evaluating their capabilities.
1 code implementation • 25 Nov 2024 • Fangkai Jiao, Geyang Guo, Xingxing Zhang, Nancy F. Chen, Shafiq Joty, Furu Wei
Specifically, using Mathstral-7B as our base model, we improve MATH results from 58. 3 to 68. 6, surpassing both NuminaMath-72B and GPT-4-Turbo-1106-preview.
no code implementations • 13 Nov 2024 • Yangyang Guo, Fangkai Jiao, Liqiang Nie, Mohan Kankanhalli
The problem causes these defense methods to exhibit unintended abstention, even in the presence of benign inputs, thereby undermining their reliability in faithfully defending against attacks.
no code implementations • 2 Oct 2024 • Xingxuan Li, Weiwen Xu, Ruochen Zhao, Fangkai Jiao, Shafiq Joty, Lidong Bing
We validate CR-Planner on challenging domain-knowledge-intensive and reasoning-heavy tasks, including competitive programming, theorem-driven math reasoning, and complex domain retrieval problems.
no code implementations • 22 Apr 2024 • Mengzhao Jia, Zhihan Zhang, Wenhao Yu, Fangkai Jiao, Meng Jiang
Open-source multimodal large language models (MLLMs) excel in various tasks involving textual and visual inputs but still struggle with complex multimodal mathematical reasoning, lagging behind proprietary models like GPT-4V(ision) and Gemini-Pro.
no code implementations • 19 Apr 2024 • Chengwei Qin, Wenhan Xia, Tan Wang, Fangkai Jiao, Yuchen Hu, Bosheng Ding, Ruirui Chen, Shafiq Joty
One key finding in psychology is that compared with irrelevant past experiences, recalling relevant ones can help humans better handle new tasks.
1 code implementation • 31 Mar 2024 • Mathieu Ravaut, Bosheng Ding, Fangkai Jiao, Hailin Chen, Xingxuan Li, Ruochen Zhao, Chengwei Qin, Caiming Xiong, Shafiq Joty
With the rise of Large Language Models (LLMs) in recent years, abundant new opportunities are emerging, but also new challenges, among which contamination is quickly becoming critical.
no code implementations • 1 Feb 2024 • Fangkai Jiao, Chengwei Qin, Zhengyuan Liu, Nancy F. Chen, Shafiq Joty
Large Language Models (LLMs) have demonstrated significant potential in handling complex reasoning tasks through step-by-step rationale generation.
no code implementations • 28 Dec 2023 • Chengwei Qin, Wenhan Xia, Fangkai Jiao, Chen Chen, Yuchen Hu, Bosheng Ding, Shafiq Joty
Large language models (LLMs) have shown impressive few-shot generalization on many tasks via in-context learning (ICL).
1 code implementation • 28 Nov 2023 • Hailin Chen, Fangkai Jiao, Xingxuan Li, Chengwei Qin, Mathieu Ravaut, Ruochen Zhao, Caiming Xiong, Shafiq Joty
Upon its release in late 2022, ChatGPT has brought a seismic shift in the entire landscape of AI, both in research and commerce.
1 code implementation • 17 Oct 2023 • Yangyang Guo, Fangkai Jiao, Zhiqi Shen, Liqiang Nie, Mohan Kankanhalli
Teaching Visual Question Answering (VQA) models to refrain from answering unanswerable questions is necessary for building a trustworthy AI system.
1 code implementation • 9 Sep 2023 • Bin Wang, Zhengyuan Liu, Xin Huang, Fangkai Jiao, Yang Ding, AiTi Aw, Nancy F. Chen
We present SeaEval, a benchmark for multilingual foundation models.
2 code implementations • 23 May 2023 • Fangkai Jiao, Zhiyang Teng, Bosheng Ding, Zhengyuan Liu, Nancy F. Chen, Shafiq Joty
Existing efforts to improve logical reasoning ability of language models have predominantly relied on supervised fine-tuning, hindering generalization to new domains and/or tasks.
1 code implementation • 4 May 2023 • Fangkai Jiao, Bosheng Ding, Tianze Luo, Zhanfeng Mo
This project focuses on enhancing open-source large language models through instruction-tuning and providing comprehensive evaluations of their performance.
no code implementations • 20 Mar 2023 • Ruochen Zhao, Hailin Chen, Weishi Wang, Fangkai Jiao, Xuan Long Do, Chengwei Qin, Bosheng Ding, Xiaobao Guo, Minzhi Li, Xingxuan Li, Shafiq Joty
As Large Language Models (LLMs) become popular, there emerged an important trend of using multimodality to augment the LLMs' generation ability, which enables LLMs to better interact with the world.
1 code implementation • Findings (ACL) 2022 • Fangkai Jiao, Yangyang Guo, Xuemeng Song, Liqiang Nie
Logical reasoning is of vital importance to natural language understanding.
Ranked #3 on
Reading Comprehension
on ReClor
1 code implementation • Findings (ACL) 2021 • Fangkai Jiao, Yangyang Guo, Yilin Niu, Feng Ji, Feng-Lin Li, Liqiang Nie
Pre-trained Language Models (PLMs) have achieved great success on Machine Reading Comprehension (MRC) over the past few years.
1 code implementation • ACL 2020 • Yilin Niu, Fangkai Jiao, Mantong Zhou, Ting Yao, Jingfang Xu, Minlie Huang
Neural models have achieved great success on machine reading comprehension (MRC), many of which typically consist of two components: an evidence extractor and an answer predictor.