Search Results for author: Qucheng Gong

Found 11 papers, 7 papers with code

Kimi k1.5: Scaling Reinforcement Learning with LLMs

2 code implementations 22 Jan 2025 Kimi Team, Angang Du, Bofei Gao, Bowei Xing, Changjiu Jiang, Cheng Chen, Cheng Li, Chenjun Xiao, Chenzhuang Du, Chonghua Liao, Chuning Tang, Congcong Wang, Dehao Zhang, Enming Yuan, Enzhe Lu, Fengxiang Tang, Flood Sung, Guangda Wei, Guokun Lai, Haiqing Guo, Han Zhu, Hao Ding, Hao Hu, Hao Yang, Hao Zhang, Haotian Yao, Haotian Zhao, Haoyu Lu, Haoze Li, Haozhen Yu, Hongcheng Gao, Huabin Zheng, Huan Yuan, Jia Chen, Jianhang Guo, Jianlin Su, Jianzhou Wang, Jie Zhao, Jin Zhang, Jingyuan Liu, Junjie Yan, Junyan Wu, Lidong Shi, Ling Ye, Longhui Yu, Mengnan Dong, Neo Zhang, Ningchen Ma, Qiwei Pan, Qucheng Gong, Shaowei Liu, Shengling Ma, Shupeng Wei, Sihan Cao, Siying Huang, Tao Jiang, Weihao Gao, Weimin Xiong, Weiran He, Weixiao Huang, Wenhao Wu, Wenyang He, Xianghui Wei, Xianqing Jia, Xingzhe Wu, Xinran Xu, Xinxing Zu, Xinyu Zhou, Xuehai Pan, Y. Charles, Yang Li, Yangyang Hu, Yangyang Liu, Yanru Chen, Yejie Wang, Yibo Liu, Yidao Qin, Yifeng Liu, Ying Yang, Yiping Bao, Yulun Du, Yuxin Wu, Yuzhi Wang, Zaida Zhou, Zhaoji Wang, Zhaowei Li, Zhen Zhu, Zheng Zhang, Zhexu Wang, Zhilin Yang, Zhiqi Huang, Zihao Huang, Ziyao Xu, Zonghan Yang

Moreover, we present effective long2short methods that use long-CoT techniques to improve short-CoT models, yielding state-of-the-art short-CoT reasoning results -- e.g., 60.8 on AIME, 94.6 on MATH500, 47.3 on LiveCodeBench -- outperforming existing short-CoT models such as GPT-4o and Claude Sonnet 3.5 by a large margin (up to +550%).

Math reinforcement-learning +2

Joint Policy Search for Multi-agent Collaboration with Imperfect Information

1 code implementation NeurIPS 2020 Yuandong Tian, Qucheng Gong, Tina Jiang

Based on this, we propose Joint Policy Search (JPS) that iteratively improves joint policies of collaborative agents in imperfect information games, without re-evaluating the entire game.

Combining Deep Reinforcement Learning and Search for Imperfect-Information Games

1 code implementation NeurIPS 2020 Noam Brown, Anton Bakhtin, Adam Lerer, Qucheng Gong

This paper presents ReBeL, a general framework for self-play reinforcement learning and search that provably converges to a Nash equilibrium in any two-player zero-sum game.

Deep Reinforcement Learning reinforcement-learning +1

All Simulations Are Not Equal: Simulation Reweighing for Imperfect Information Games

no code implementations 25 Sep 2019 Qucheng Gong, Yuandong Tian

We use simulation reweighing in the playing phase of the game contract bridge, and show that it outperforms previous state-of-the-art Monte Carlo simulation based methods, and achieves better play per decision.

All

Simple is Better: Training an End-to-end Contract Bridge Bidding Agent without Human Knowledge

no code implementations 25 Sep 2019 Qucheng Gong, Yu Jiang, Yuandong Tian

While playing is relatively easy for modern software, bidding is challenging and requires agents to learn a communication protocol to reach the optimal contract jointly, with their own private information.

Hierarchical Decision Making by Generating and Following Natural Language Instructions

1 code implementation NeurIPS 2019 Hengyuan Hu, Denis Yarats, Qucheng Gong, Yuandong Tian, Mike Lewis

We explore using latent natural language instructions as an expressive and compositional representation of complex actions for hierarchical decision making.

Decision Making

Luck Matters: Understanding Training Dynamics of Deep ReLU Networks

1 code implementation 31 May 2019 Yuandong Tian, Tina Jiang, Qucheng Gong, Ari Morcos

We analyze the dynamics of training deep ReLU networks and their implications on generalization capability.

ELF OpenGo: An Analysis and Open Reimplementation of AlphaZero

1 code implementation 12 Feb 2019 Yuandong Tian, Jerry Ma, Qucheng Gong, Shubho Sengupta, Zhuoyuan Chen, James Pinkerton, C. Lawrence Zitnick

The AlphaGo, AlphaGo Zero, and AlphaZero series of algorithms are remarkable demonstrations of deep reinforcement learning's capabilities, achieving superhuman performance in the complex game of Go with progressively increasing autonomy.

Game of Go

Latent forward model for Real-time Strategy game planning with incomplete information

no code implementations ICLR 2018 Yuandong Tian, Qucheng Gong

Model-free deep reinforcement learning approaches have shown superhuman performance in simulated environments (e.g., Atari games, Go, etc.).

Atari Games Decision Making +3

ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games

2 code implementations NeurIPS 2017 Yuandong Tian, Qucheng Gong, Wenling Shang, Yuxin Wu, C. Lawrence Zitnick

In addition, our platform is flexible in terms of environment-agent communication topologies, choices of RL methods, and changes in game parameters, and can host existing C/C++-based game environments like the Arcade Learning Environment.

Atari Games reinforcement-learning +3
