Search Results for author: Yao Fu

Found 35 papers, 24 papers with code

Toward Inference-optimal Mixture-of-Expert Large Language Models

no code implementations • 3 Apr 2024 • Longfei Yun, Yonghao Zhuang, Yao Fu, Eric P. Xing, Hao Zhang

Like dense models, training MoEs requires answering the same question: given a training budget, what is the optimal allocation between model size and number of tokens?
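
As a rough illustration of that allocation question (a minimal sketch assuming a Chinchilla-style parametric loss; the coefficients below are illustrative placeholders, not values fitted in this paper, and MoE-specific dimensions such as the number of experts are not modeled):

```python
# Minimal sketch: grid-search model size N and token count D under a fixed
# compute budget C ~ 6*N*D, assuming a Chinchilla-style parametric loss.
# The coefficients below are illustrative placeholders, not from the paper.
import numpy as np

E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28  # hypothetical fit

def parametric_loss(N, D):
    return E + A / N**alpha + B / D**beta

def optimal_allocation(compute_budget):
    Ns = np.logspace(8, 12, 1000)             # 100M .. 1T parameters
    Ds = compute_budget / (6.0 * Ns)          # tokens implied by the budget
    losses = parametric_loss(Ns, Ds)
    i = int(np.argmin(losses))
    return Ns[i], Ds[i], losses[i]

N_opt, D_opt, L_opt = optimal_allocation(1e23)
print(f"N* = {N_opt:.3g} params, D* = {D_opt:.3g} tokens, loss = {L_opt:.3f}")
```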

Data Engineering for Scaling Language Models to 128K Context

2 code implementations • 15 Feb 2024 • Yao Fu, Rameswar Panda, Xinyao Niu, Xiang Yue, Hannaneh Hajishirzi, Yoon Kim, Hao Peng

We demonstrate that continual pretraining of the full model on 1B-5B tokens of such data is an effective and affordable strategy for scaling the context length of language models to 128K.

4k Continual Pretraining
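
As a loose, hypothetical sketch of one ingredient such data engineering can involve (oversampling long documents within each source; the length threshold and repetition factor below are made up for illustration and are not the paper's recipe):

```python
# Hypothetical sketch: repeat documents above a length threshold within each
# source. Per-source mixing weights are assumed to be applied separately.
import random
from typing import Dict, List

def upsample_long_docs(corpus: Dict[str, List[str]],
                       min_chars: int = 20_000,
                       factor: int = 4,
                       seed: int = 0) -> Dict[str, List[str]]:
    rng = random.Random(seed)
    out: Dict[str, List[str]] = {}
    for source, docs in corpus.items():
        resampled: List[str] = []
        for doc in docs:
            resampled.extend([doc] * (factor if len(doc) >= min_chars else 1))
        rng.shuffle(resampled)
        out[source] = resampled
    return out
```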

OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models

1 code implementation • 29 Jan 2024 • Fuzhao Xue, Zian Zheng, Yao Fu, Jinjie Ni, Zangwei Zheng, Wangchunshu Zhou, Yang You

To help the open-source community have a better understanding of Mixture-of-Experts (MoE) based large language models (LLMs), we train and release OpenMoE, a series of fully open-sourced and reproducible decoder-only MoE LLMs, ranging from 650M to 34B parameters and trained on up to over 1T tokens.

ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models

no code implementations • 25 Jan 2024 • Yao Fu, Leyang Xue, Yeqi Huang, Andrei-Octavian Brabete, Dmitrii Ustiugov, Yuvraj Patel, Luo Mai

This paper presents ServerlessLLM, a locality-enhanced serverless inference system for Large Language Models (LLMs).

MoE-Infinity: Activation-Aware Expert Offloading for Efficient MoE Serving

1 code implementation • 25 Jan 2024 • Leyang Xue, Yao Fu, Zhan Lu, Luo Mai, Mahesh Marina

This paper presents MoE-Infinity, a cost-efficient mixture-of-expert (MoE) serving system that realizes activation-aware expert offloading.

Critical Data Size of Language Models from a Grokking Perspective

no code implementations • 19 Jan 2024 • Xuekai Zhu, Yao Fu, Bowen Zhou, Zhouhan Lin

We formalize the phase transition under the grokking configuration into the Data Efficiency Hypothesis and identify data insufficiency, sufficiency, and surplus regimes in language model training dynamics.

Language Modelling Memorization

MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning

1 code implementation • 11 Sep 2023 • Xiang Yue, Xingwei Qu, Ge Zhang, Yao Fu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen

The MAmmoTH models are trained on MathInstruct, our meticulously curated instruction tuning dataset.

Math Mathematical Reasoning

Go Beyond Imagination: Maximizing Episodic Reachability with World Models

1 code implementation • 25 Aug 2023 • Yao Fu, Run Peng, Honglak Lee

Efficient exploration is a challenging topic in reinforcement learning, especially for sparse reward tasks.

Efficient Exploration

Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance

1 code implementation • 26 May 2023 • Yao Fu, Litu Ou, Mingyu Chen, Yuhao Wan, Hao Peng, Tushar Khot

As large language models (LLMs) are continuously being developed, their evaluation becomes increasingly important yet challenging.

Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback

1 code implementation • 17 May 2023 • Yao Fu, Hao Peng, Tushar Khot, Mirella Lapata

We study whether multiple large language models (LLMs) can autonomously improve each other in a negotiation game by playing, reflecting, and criticizing.

In-Context Learning Language Modelling +1

C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models

1 code implementation • NeurIPS 2023 • Yuzhen Huang, Yuzhuo Bai, Zhihao Zhu, Junlei Zhang, Jinghan Zhang, Tangjun Su, Junteng Liu, Chuancheng Lv, Yikai Zhang, Jiayi Lei, Yao Fu, Maosong Sun, Junxian He

We present C-Eval, the first comprehensive Chinese evaluation suite designed to assess advanced knowledge and reasoning abilities of foundation models in a Chinese context.

Multiple-choice

Specializing Smaller Language Models towards Multi-Step Reasoning

2 code implementations • 30 Jan 2023 • Yao Fu, Hao Peng, Litu Ou, Ashish Sabharwal, Tushar Khot

By paying the price of decreased generic ability, we can clearly lift the scaling curve of models smaller than 10B towards specialized multi-step math reasoning ability.

Math Model Selection

TorchOpt: An Efficient Library for Differentiable Optimization

1 code implementation • 13 Nov 2022 • Jie Ren, Xidong Feng, Bo Liu, Xuehai Pan, Yao Fu, Luo Mai, Yaodong Yang

TorchOpt further provides a high-performance distributed execution runtime.

Just-DREAM-about-it: Figurative Language Understanding with DREAM-FLUTE

1 code implementation • 28 Oct 2022 • Yuling Gu, Yao Fu, Valentina Pyatkin, Ian Magnusson, Bhavana Dalvi Mishra, Peter Clark

We hypothesize that to perform this task well, the reader needs to mentally elaborate the scene being described to identify a sensible meaning of the language.

Decomposed Prompting: A Modular Approach for Solving Complex Tasks

1 code implementation • 5 Oct 2022 • Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, Ashish Sabharwal

On symbolic reasoning tasks, we can further decompose sub-tasks that are hard for LLMs into even simpler solvable sub-tasks.

Information Retrieval Retrieval
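
A minimal, hypothetical sketch of the general pattern (the decomposer prompt, handler names, and the `llm` callable below are placeholders, not the paper's prompts or interface):

```python
# Hypothetical sketch: a decomposer prompt emits "handler: sub-question"
# lines; each sub-question is dispatched to a handler (another prompt, a
# retriever, or a symbolic routine), threading earlier answers forward.
from typing import Callable, Dict, List, Tuple

def decompose(question: str, llm: Callable[[str], str]) -> List[Tuple[str, str]]:
    raw = llm(f"Decompose into steps, one per line as 'handler: sub-question':\n{question}")
    steps = []
    for line in raw.splitlines():
        if ":" in line:
            handler, sub_q = line.split(":", 1)
            steps.append((handler.strip(), sub_q.strip()))
    return steps

def solve(question: str, llm: Callable[[str], str],
          handlers: Dict[str, Callable[[str], str]]) -> str:
    context, answer = "", ""
    for handler_name, sub_q in decompose(question, llm):
        handler = handlers.get(handler_name, llm)   # fall back to the base LLM
        answer = handler(context + sub_q)
        context += f"{sub_q} -> {answer}\n"
    return answer
```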

Complexity-Based Prompting for Multi-Step Reasoning

no code implementations • 3 Oct 2022 • Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, Tushar Khot

In this work, we propose complexity-based prompting, a simple and effective example selection scheme for multi-step reasoning.

Date Understanding GSM8K +2
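
A simplified sketch of the selection idea (step counting and prompt formatting here are illustrative approximations, not the paper's exact procedure): prefer the chain-of-thought exemplars whose annotated rationales contain the most reasoning steps.

```python
# Simplified sketch: pick the k exemplars whose rationales have the most
# reasoning steps (approximated by non-empty lines), then build the prompt.
from typing import Dict, List

def count_steps(rationale: str) -> int:
    return sum(1 for line in rationale.splitlines() if line.strip())

def select_complex_exemplars(pool: List[Dict[str, str]], k: int = 8) -> List[Dict[str, str]]:
    # Each item: {"question": ..., "rationale": ..., "answer": ...}
    return sorted(pool, key=lambda ex: count_steps(ex["rationale"]), reverse=True)[:k]

def build_prompt(exemplars: List[Dict[str, str]], test_question: str) -> str:
    blocks = [f"Q: {ex['question']}\nA: {ex['rationale']}\nThe answer is {ex['answer']}."
              for ex in exemplars]
    blocks.append(f"Q: {test_question}\nA:")
    return "\n\n".join(blocks)
```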

Data-to-text Generation with Variational Sequential Planning

1 code implementation • 28 Feb 2022 • Ratish Puduppully, Yao Fu, Mirella Lapata

We consider the task of data-to-text generation, which aims to create textual output from non-linguistic input.

Data-to-Text Generation

Scaling Structured Inference with Randomization

1 code implementation • 7 Dec 2021 • Yao Fu, John P. Cunningham, Mirella Lapata

Here, we propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.

Analyzing the Confidentiality of Undistillable Teachers in Knowledge Distillation

no code implementations • NeurIPS 2021 • Souvik Kundu, Qirui Sun, Yao Fu, Massoud Pedram, Peter Beerel

Knowledge distillation (KD) has recently been identified as a method that can unintentionally leak private information regarding the details of a teacher model to an unauthorized student.

Knowledge Distillation

Discovering Latent Network Topology in Contextualized Representations with Randomized Dynamic Programming

no code implementations • 29 Sep 2021 • Yao Fu, Mirella Lapata

We use RDP to analyze the representation space of pretrained language models, discovering a large-scale latent network in a fully unsupervised way.

Paraphrase Generation

Optimizing the Numbers of Queries and Replies in Federated Learning with Differential Privacy

1 code implementation • 5 Jul 2021 • Yipeng Zhou, Xuezheng Liu, Yao Fu, Di Wu, Chao Li, Shui Yu

In this work, we study a crucial question that has been largely overlooked by existing works: what are the optimal numbers of queries and replies in FL with DP such that the final model accuracy is maximized?

Federated Learning

Probing BERT in Hyperbolic Spaces

1 code implementation • ICLR 2021 • Boli Chen, Yao Fu, Guangwei Xu, Pengjun Xie, Chuanqi Tan, Mosha Chen, Liping Jing

We introduce a Poincaré probe, a structural probe projecting these embeddings into a Poincaré subspace with explicitly defined hierarchies.

Word Embeddings
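
For reference, a generic Poincaré-ball distance between two probed embedding vectors (a standard formula; the probe's learned projection and training objective are described in the paper and are not reproduced here):

```python
# Generic Poincaré-ball distance between two vectors strictly inside the
# unit ball (||x|| < 1). The probe's learned projection is not shown here.
import numpy as np

def poincare_distance(u: np.ndarray, v: np.ndarray, eps: float = 1e-9) -> float:
    sq_dist = float(np.sum((u - v) ** 2))
    denom = (1.0 - float(np.sum(u ** 2))) * (1.0 - float(np.sum(v ** 2)))
    return float(np.arccosh(1.0 + 2.0 * sq_dist / max(denom, eps)))

print(poincare_distance(np.array([0.1, 0.2]), np.array([-0.3, 0.4])))
```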

On the Practicality of Differential Privacy in Federated Learning by Tuning Iteration Times

no code implementations • 11 Jan 2021 • Yao Fu, Yipeng Zhou, Di Wu, Shui Yu, Yonggang Wen, Chao Li

Then, we theoretically derive: 1) the conditions for DP-based FedAvg to converge as the number of global iterations (GI) approaches infinity; 2) the method to set the number of local iterations (LI) to minimize the negative influence of DP noise.

Federated Learning
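
For context, a generic sketch of the mechanism such analyses concern (standard clip-and-noise DP-FedAvg aggregation; the clipping bound and noise multiplier are illustrative, and the paper's convergence conditions and iteration-tuning method are not reproduced):

```python
# Generic DP-FedAvg aggregation: clip each client update to an L2 bound,
# average, and add Gaussian noise. Values below are illustrative only.
import numpy as np

def dp_fedavg_aggregate(client_updates, clip_norm=1.0, noise_multiplier=1.1, seed=0):
    rng = np.random.default_rng(seed)
    clipped = [u * min(1.0, clip_norm / max(np.linalg.norm(u), 1e-12))
               for u in client_updates]
    mean_update = np.mean(clipped, axis=0)
    sigma = noise_multiplier * clip_norm / len(client_updates)
    return mean_update + rng.normal(0.0, sigma, size=mean_update.shape)

updates = [np.random.default_rng(i).normal(size=10) for i in range(5)]
print(dp_fedavg_aggregate(updates))
```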

Paraphrase Generation with Latent Bag of Words

2 code implementations • NeurIPS 2019 • Yao Fu, Yansong Feng, John P. Cunningham

Inspired by variational autoencoders with discrete latent structures, in this work, we propose a latent bag of words (BOW) model for paraphrase generation.

Paraphrase Generation Word Embeddings

Rethinking Text Attribute Transfer: A Lexical Analysis

1 code implementation • WS 2019 • Yao Fu, Hao Zhou, Jiaze Chen, Lei Li

We apply this framework to existing datasets and models and show that: (1) the pivot words are strong features for the classification of sentence attributes; (2) to change the attribute of a sentence, many datasets only require changing certain pivot words; (3) consequently, many transfer models only perform lexical-level modification, while leaving higher-level sentence structures unchanged.

Attribute General Classification +3
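
As a rough illustration of how such pivot words can be surfaced (a simple smoothed log-odds ratio between two attribute corpora; this is a stand-in, not the paper's exact statistic):

```python
# Rough illustration: rank words by how strongly their frequencies differ
# between two attribute corpora, using a smoothed log-odds ratio.
from collections import Counter
from math import log
from typing import List, Tuple

def pivot_words(corpus_a: List[str], corpus_b: List[str], top_k: int = 20) -> List[Tuple[str, float]]:
    counts_a = Counter(w for sent in corpus_a for w in sent.lower().split())
    counts_b = Counter(w for sent in corpus_b for w in sent.lower().split())
    vocab = set(counts_a) | set(counts_b)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    def score(w: str) -> float:
        return (log((counts_a[w] + 1) / (total_a + len(vocab)))
                - log((counts_b[w] + 1) / (total_b + len(vocab))))
    return sorted(((w, score(w)) for w in vocab), key=lambda kv: abs(kv[1]), reverse=True)[:top_k]

print(pivot_words(["the food was great", "great service"],
                  ["the food was awful", "terrible service"]))
```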

Natural Answer Generation with Heterogeneous Memory

no code implementations • NAACL 2018 • Yao Fu, Yansong Feng

Memory-augmented encoder-decoder frameworks have achieved promising progress on natural language generation tasks.

Answer Generation Question Answering +2
