1 code implementation • 11 Jan 2025 • Yifan Zhang, Yifeng Liu, Huizhuo Yuan, Zhen Qin, Yang Yuan, Quanquan Gu, Andrew Chi-Chih Yao
Scaling language models to handle longer input sequences typically necessitates large key-value (KV) caches, resulting in substantial memory overhead during inference.
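As a rough illustration of that overhead, the sketch below estimates KV cache size for a decoder-only transformer. The configuration numbers are illustrative (roughly Llama-2-7B-scale at fp16), not taken from the paper.

```python
# Back-of-envelope KV cache size for a decoder-only transformer.
# Config values below are illustrative (roughly Llama-2-7B-like); swap in
# an actual model's numbers to reproduce its footprint.

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1, bytes_per_elem: int = 2) -> int:
    """Memory for keys *and* values across all layers (fp16 => 2 bytes/elem)."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem

gib = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128,
                     seq_len=128_000, batch_size=1) / 2**30
print(f"{gib:.1f} GiB")  # ~62.5 GiB for a single 128k-token sequence
```

At these (assumed) settings a single long sequence already dwarfs the activation memory of a forward pass, which is the motivation for compressing the KV cache.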
1 code implementation • 16 Sep 2024 • Yifan Zhang, Yang Yuan, Andrew Chi-Chih Yao
We introduce Diagram of Thought (DoT), a framework that models iterative reasoning in large language models (LLMs) as the construction of a directed acyclic graph (DAG) within a single model.
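Below is a minimal sketch of the kind of reasoning DAG such a framework implies. The role names follow the paper's proposer/critic/summarizer split, but the data structure and field names are our own illustration, not an official API.

```python
# Minimal sketch of a reasoning DAG in the spirit of DoT. Role names follow
# the paper's proposer/critic/summarizer split; the class and fields here
# are illustrative.
from dataclasses import dataclass
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

@dataclass
class Node:
    text: str
    role: str                      # "proposer" | "critic" | "summarizer"
    parents: tuple[int, ...] = ()  # ids of nodes this one builds on / critiques

dag: dict[int, Node] = {
    0: Node("Claim: n^2 - n is even for all n.", "proposer"),
    1: Node("Critique: show it, don't assert it.", "critic", parents=(0,)),
    2: Node("Proof: n^2 - n = n(n-1), a product of consecutive integers.",
            "proposer", parents=(1,)),
    3: Node("Summary: claim holds; proof at node 2.", "summarizer", parents=(2,)),
}

# A valid linearization of the reasoning (acyclic by construction).
order = TopologicalSorter({i: n.parents for i, n in dag.items()}).static_order()
for i in order:
    print(f"[{dag[i].role}] {dag[i].text}")
```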
2 code implementations • 12 Feb 2024 • Yifan Zhang, Yifan Luo, Yang Yuan, Andrew Chi-Chih Yao
Our method achieves a twofold increase in pretraining token efficiency compared with state-of-the-art baselines, underscoring its potential to enhance models' mathematical reasoning capabilities.
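The sketch below illustrates one style of model-based data scoring in the spirit of this work, assuming a Hugging Face causal LM as the scorer. The prompt wording, token choice, and threshold are illustrative assumptions rather than the paper's exact setup.

```python
# Sketch of LM-based data scoring for pretraining selection: a language model
# judges each document zero-shot, and only high-scoring documents are kept.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")   # stand-in scorer model
lm = AutoModelForCausalLM.from_pretrained("gpt2")

def math_value_score(text: str) -> float:
    """P('YES') vs P('NO') as the next token after a zero-shot quality prompt."""
    prompt = (f"Text:\n{text}\n\n"
              "Is this text useful for learning mathematics? Answer YES or NO:")
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(ids).logits[0, -1]        # next-token logits
    yes = logits[tok.encode(" YES")[0]]
    no = logits[tok.encode(" NO")[0]]
    return torch.softmax(torch.stack([yes, no]), dim=0)[0].item()

docs = ["Let f(x) = x^2. Then f'(x) = 2x by the power rule.",
        "Buy cheap followers now!!!"]
selected = [d for d in docs if math_value_score(d) > 0.5]  # keep high-scoring docs
```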
1 code implementation • 17 Jan 2024 • Haoxiong Liu, Yifan Zhang, Yifan Luo, Andrew Chi-Chih Yao
Despite advances in large language models (LLMs) for mathematical reasoning, solving competition-level math problems remains a significant challenge, especially for open-source LLMs without external tools.
Ranked #66 on Math Word Problem Solving on MATH (using extra training data)
1 code implementation • 20 Nov 2023 • Yifan Zhang, Yang Yuan, Andrew Chi-Chih Yao
In this work, we present a comprehensive study of Meta Prompting (MP), a technique that reshapes how language models (LMs) and AI systems are used in problem solving and data interaction by prioritizing the structure and syntax of prompts over content-specific examples.
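As a hedged illustration, the template below shows what a structure-oriented (meta) prompt can look like: it fixes the form of the solution rather than supplying content-specific few-shot examples. The exact wording is ours, not the paper's.

```python
# A structure-oriented (meta) prompt template: the scaffold constrains the
# *shape* of the answer, independent of the problem's content.
META_PROMPT = """\
You are an expert problem solver. For the problem below, respond in this structure:
1. Restate the problem in your own words.
2. List the known quantities and the goal.
3. Solve step by step, one deduction per line.
4. End with: "Final Answer: <answer>".

Problem: {problem}
"""

def build_prompt(problem: str) -> str:
    return META_PROMPT.format(problem=problem)

print(build_prompt("If 3x + 5 = 20, what is x?"))
```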
1 code implementation • 8 Aug 2023 • Yifan Zhang, Jingqin Yang, Yang Yuan, Andrew Chi-Chih Yao
We demonstrate the superiority of Cumulative Reasoning (CR) through several complex reasoning tasks: it outperforms existing methods on logical inference tasks with up to a 9.3% improvement, achieving 98.04% accuracy on the curated FOLIO wiki dataset (a skeletal sketch of the CR loop follows this entry).
Ranked #14 on Math Word Problem Solving on MATH
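The loop below sketches CR's proposer/verifier/reporter interaction: candidate deductions accumulate only after verification. The role names follow the paper; `llm` stands for a hypothetical text-completion callable you would supply, and the prompt wording is ours.

```python
# Skeletal control loop in the spirit of Cumulative Reasoning:
# propose a step, verify it, accumulate it, and report when done.
def cumulative_reasoning(question: str, llm, max_steps: int = 8) -> str:
    accepted: list[str] = []                      # verified intermediate propositions
    for _ in range(max_steps):
        context = "\n".join(accepted)
        step = llm(f"Question: {question}\nKnown facts:\n{context}\n"
                   "Propose ONE new deduction:")
        verdict = llm(f"Question: {question}\nFacts:\n{context}\n"
                      f"Candidate: {step}\nValid? yes/no:")
        if verdict.strip().lower().startswith("yes"):
            accepted.append(step)                 # accumulate only verified steps
        done = llm(f"Question: {question}\nFacts:\n{chr(10).join(accepted)}\n"
                   "Enough to answer? yes/no:")
        if done.strip().lower().startswith("yes"):
            break
    return llm(f"Question: {question}\nFacts:\n{chr(10).join(accepted)}\nFinal answer:")
```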
2 code implementations • 21 Jun 2021 • Jing Xu, Sen Wang, Liwei Wang, Andrew Chi-Chih Yao
Federated Learning is a distributed machine learning approach that enables collaborative model training without sharing raw data.
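The minimal round below illustrates the generic federated-averaging pattern behind that idea, where only model weights leave each client. This is a plain FedAvg sketch on a toy least-squares task, not the specific algorithm contributed by the paper above.

```python
# One FedAvg-style round: clients train locally; the server averages weights.
import numpy as np

def local_update(weights: np.ndarray, X: np.ndarray, y: np.ndarray,
                 lr: float = 0.1, epochs: int = 5) -> np.ndarray:
    w = weights.copy()
    for _ in range(epochs):                 # plain least-squares gradient steps
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w                                # only weights leave the client

def fedavg_round(global_w: np.ndarray, client_data) -> np.ndarray:
    updates = [local_update(global_w, X, y) for X, y in client_data]
    sizes = np.array([len(y) for _, y in client_data], dtype=float)
    return np.average(updates, axis=0, weights=sizes)   # size-weighted average

rng = np.random.default_rng(0)
clients = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(4)]
w = np.zeros(3)
for _ in range(10):                         # several communication rounds
    w = fedavg_round(w, clients)
```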