Search Results for author: Runji Lin

Found 12 papers, 7 papers with code

LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback

1 code implementation20 Jun 2024 Bofei Gao, Zefan Cai, Runxin Xu, Peiyi Wang, Ce Zheng, Runji Lin, Keming Lu, Dayiheng Liu, Chang Zhou, Wen Xiao, Junjie Hu, Tianyu Liu, Baobao Chang

To mitigate the aforementioned insufficiency of binary labels, we introduce step-wise natural language feedbacks as rationale labels (i. e., the correctness of the current step and the explanations).

Binary Classification GSM8K +2

Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment

1 code implementation28 May 2024 Keming Lu, Bowen Yu, Fei Huang, Yang Fan, Runji Lin, Chang Zhou

Effectively aligning Large Language Models (LLMs) with human-centric values while preventing the degradation of abilities acquired through Pre-training and Supervised Fine-tuning (SFT) poses a central challenge in Reinforcement Learning from Human Feedback (RLHF).

Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach

1 code implementation19 Dec 2023 Weiyu Ma, Qirui Mi, Yongcheng Zeng, Xue Yan, Yuqiao Wu, Runji Lin, Haifeng Zhang, Jun Wang

StarCraft II is a challenging benchmark for AI agents due to the necessity of both precise micro level operations and strategic macro awareness.

Language Modelling Large Language Model +2

Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models

no code implementations15 Nov 2023 Keming Lu, Hongyi Yuan, Runji Lin, Junyang Lin, Zheng Yuan, Chang Zhou, Jingren Zhou

Zooter shows computation efficiency in inference as it introduces only a minor computation overhead of a routing function compared with reward model ranking methods.

TAG

#InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models

1 code implementation14 Aug 2023 Keming Lu, Hongyi Yuan, Zheng Yuan, Runji Lin, Junyang Lin, Chuanqi Tan, Chang Zhou, Jingren Zhou

Based on this observation, we propose a data selector based on InsTag to select 6K diverse and complex samples from open-source datasets and fine-tune models on InsTag-selected data.

Diversity Instruction Following +1

Large Sequence Models for Sequential Decision-Making: A Survey

no code implementations24 Jun 2023 Muning Wen, Runji Lin, Hanjing Wang, Yaodong Yang, Ying Wen, Luo Mai, Jun Wang, Haifeng Zhang, Weinan Zhang

Transformer architectures have facilitated the development of large-scale and general-purpose sequence models for prediction tasks in natural language processing and computer vision, e. g., GPT-3 and Swin Transformer.

Decision Making

Contextual Transformer for Offline Meta Reinforcement Learning

no code implementations15 Nov 2022 Runji Lin, Ye Li, Xidong Feng, Zhaowei Zhang, Xian Hong Wu Fung, Haifeng Zhang, Jun Wang, Yali Du, Yaodong Yang

Firstly, we propose prompt tuning for offline RL, where a context vector sequence is concatenated with the input to guide the conditional policy generation.

D4RL Meta Reinforcement Learning +4

Scalable Model-based Policy Optimization for Decentralized Networked Systems

no code implementations13 Jul 2022 Yali Du, Chengdong Ma, Yuchen Liu, Runji Lin, Hao Dong, Jun Wang, Yaodong Yang

Reinforcement learning algorithms require a large amount of samples; this often limits their real-world applications on even simple tasks.

Multi-Agent Reinforcement Learning is a Sequence Modeling Problem

1 code implementation30 May 2022 Muning Wen, Jakub Grudzien Kuba, Runji Lin, Weinan Zhang, Ying Wen, Jun Wang, Yaodong Yang

In this paper, we introduce a novel architecture named Multi-Agent Transformer (MAT) that effectively casts cooperative multi-agent reinforcement learning (MARL) into SM problems wherein the task is to map agents' observation sequence to agents' optimal action sequence.

Decision Making Multi-agent Reinforcement Learning +2

Cannot find the paper you are looking for? You can Submit a new open access paper.