Search Results for author: Stephen McAleer

Found 36 papers, 15 papers with code

Language Models can Solve Computer Tasks

1 code implementation NeurIPS 2023 Geunwoo Kim, Pierre Baldi, Stephen McAleer

We compare multiple LLMs and find that RCI with the InstructGPT-3+RLHF LLM is state-of-the-art on MiniWoB++, using only a handful of demonstrations per task rather than tens of thousands, and without a task-specific reward function.

Language Modelling Large Language Model +1

Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games

2 code implementations NeurIPS 2020 Stephen McAleer, John Lanier, Roy Fox, Pierre Baldi

We also introduce an open-source environment for Barrage Stratego, a variant of Stratego with an approximate game tree complexity of $10^{50}$.

Reinforcement Learning (RL)

XDO: A Double Oracle Algorithm for Extensive-Form Games

1 code implementation NeurIPS 2021 Stephen McAleer, John Lanier, Kevin Wang, Pierre Baldi, Roy Fox

NXDO is the first deep RL method that can find an approximate Nash equilibrium in high-dimensional continuous-action sequential games.

Reinforcement Learning (RL)

Neural Auto-Curricula

1 code implementation 4 Jun 2021 Xidong Feng, Oliver Slumbers, Ziyu Wan, Bo Liu, Stephen McAleer, Ying Wen, Jun Wang, Yaodong Yang

When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population.

Multi-agent Reinforcement Learning
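The snippet above describes the core PSRO / double-oracle style loop: grow a population by best-responding to a mixture over the opponent population, then re-solve the meta-game over the population. Below is a minimal runnable sketch of that loop on rock-paper-scissors, with a crude fictitious-play meta-solver standing in for the learned meta-solvers the paper studies; the helper names are illustrative, not the authors' code.

    import numpy as np

    PAYOFF = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)  # row player's payoff

    def best_response(opponent_actions, opponent_mix):
        # Expected payoff of each pure action against the opponent's current mixture.
        expected = PAYOFF[:, opponent_actions] @ opponent_mix
        return int(expected.argmax())

    def solve_meta_game(row_actions, col_actions, iters=2000):
        # Crude meta-solver: fictitious play on the payoff matrix restricted to the populations.
        sub = PAYOFF[np.ix_(row_actions, col_actions)]
        row_counts = np.ones(len(row_actions))
        col_counts = np.ones(len(col_actions))
        for _ in range(iters):
            row_counts[(sub @ (col_counts / col_counts.sum())).argmax()] += 1
            col_counts[((row_counts / row_counts.sum()) @ sub).argmin()] += 1
        return row_counts / row_counts.sum(), col_counts / col_counts.sum()

    row_pop, col_pop = [0], [0]                      # both populations start with "rock"
    row_mix, col_mix = np.array([1.0]), np.array([1.0])
    for _ in range(5):
        new_row = best_response(col_pop, col_mix)    # new agent = best response to opponent mixture
        new_col = best_response(row_pop, row_mix)    # symmetric game, so the same payoff matrix works
        row_pop.append(new_row)
        col_pop.append(new_col)
        row_mix, col_mix = solve_meta_game(row_pop, col_pop)
    print(row_pop, np.round(row_mix, 2))             # population expands to cover rock/paper/scissors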

Neural Auto-Curricula in Two-Player Zero-Sum Games

1 code implementation NeurIPS 2021 Xidong Feng, Oliver Slumbers, Ziyu Wan, Bo Liu, Stephen McAleer, Ying Wen, Jun Wang, Yaodong Yang

When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population.

Multi-agent Reinforcement Learning

ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret

1 code implementation 8 Jun 2022 Stephen McAleer, Gabriele Farina, Marc Lanctot, Tuomas Sandholm

DREAM, the only current CFR-based neural method that is model-free and therefore scalable to very large games, trains a neural network on an estimated regret target that can have extremely high variance due to an importance sampling term inherited from Monte Carlo CFR (MCCFR).

counterfactual
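A toy numerical illustration (not from the paper) of the variance issue described above: an importance-sampled estimate is unbiased, but its per-sample targets scale with one over the sampling probability, which is the term ESCHER's history value function is designed to avoid.

    import numpy as np

    rng = np.random.default_rng(0)
    true_value = 1.0     # quantity being estimated
    sample_prob = 0.01   # probability that the sampled trajectory reaches it

    # Unbiased importance-sampled estimator: value divided by the reach probability
    # when the event is sampled, zero otherwise.
    samples = np.where(rng.random(100_000) < sample_prob, true_value / sample_prob, 0.0)
    print(round(samples.mean(), 2), round(samples.std(), 2))   # mean near 1.0, std near 10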

ASP: Learn a Universal Neural Solver!

1 code implementation 1 Mar 2023 Chenguang Wang, Zhouliang Yu, Stephen McAleer, Tianshu Yu, Yaodong Yang

Applying machine learning to combinatorial optimization problems has the potential to improve both efficiency and accuracy.

Combinatorial Optimization Traveling Salesman Problem

Online Double Oracle

1 code implementation 13 Mar 2021 Le Cong Dinh, Yaodong Yang, Stephen McAleer, Zheng Tian, Nicolas Perez Nieves, Oliver Slumbers, David Henry Mguni, Haitham Bou Ammar, Jun Wang

Solving strategic games with huge action spaces is a critical yet under-explored topic in economics, operations research and artificial intelligence.

AgentKit: Flow Engineering with Graphs, not Coding

1 code implementation 17 Apr 2024 Yue Wu, Yewen Fan, So Yeon Min, Shrimai Prabhumoye, Stephen McAleer, Yonatan Bisk, Ruslan Salakhutdinov, Yuanzhi Li, Tom Mitchell

The chains of nodes can be designed to explicitly enforce a naturally structured "thought process".
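A minimal sketch of the flow-engineering idea described above: a graph of prompt nodes evaluated in dependency order, each node's prompt seeing the outputs of the nodes it depends on. The function name, the node fields, and call_llm are illustrative placeholders, not AgentKit's actual API.

    from graphlib import TopologicalSorter

    def run_prompt_graph(nodes, call_llm, task_input):
        # nodes: dict of name -> {"prompt": str, "deps": [names]}; evaluated in dependency order.
        order = TopologicalSorter({name: node["deps"] for name, node in nodes.items()}).static_order()
        outputs = {}
        for name in order:
            node = nodes[name]
            context = "\n".join(f"{dep}: {outputs[dep]}" for dep in node["deps"])
            outputs[name] = call_llm(f"{node['prompt']}\nTask: {task_input}\n{context}")
        return outputs

    # Toy usage with a stand-in "LLM" that just echoes the first line of its prompt.
    nodes = {
        "plan": {"prompt": "Draft a plan for the task.", "deps": []},
        "act": {"prompt": "Choose the next action given the plan.", "deps": ["plan"]},
        "reflect": {"prompt": "Critique the chosen action.", "deps": ["plan", "act"]},
    }
    print(run_prompt_graph(nodes, call_llm=lambda p: p.splitlines()[0], task_input="book a flight"))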

Reducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks

1 code implementation 16 Sep 2022 Litian Liang, Yaosheng Xu, Stephen McAleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox

On a set of 26 benchmark Atari environments, MeanQ outperforms all tested baselines, including the best available baseline, SUNRISE, at 100K interaction steps in 16/26 environments, and by 68% on average.
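A brief sketch, assuming PyTorch, of the ensemble-mean TD target the entry refers to: the bootstrapped target averages next-state Q-values over an ensemble to reduce target variance. This illustrates the idea only and is not the authors' MeanQ implementation.

    import torch
    from torch import nn

    def ensemble_td_targets(target_nets, rewards, next_states, dones, gamma=0.99):
        # TD target built from the mean of the ensemble's next-state Q-values,
        # which lowers the variance of the bootstrapped target.
        with torch.no_grad():
            next_q = torch.stack([net(next_states) for net in target_nets]).mean(dim=0)
            best_next = next_q.max(dim=1).values
            return rewards + gamma * (1.0 - dones) * best_next

    # Toy usage: 5 random linear "Q-networks" over a 4-dim state with 3 actions, batch of 8.
    nets = [nn.Linear(4, 3) for _ in range(5)]
    targets = ensemble_td_targets(nets, rewards=torch.zeros(8),
                                  next_states=torch.randn(8, 4), dones=torch.zeros(8))
    print(targets.shape)   # torch.Size([8])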

Solving the Rubik's Cube Without Human Knowledge

9 code implementations 18 May 2018 Stephen McAleer, Forest Agostinelli, Alexander Shmakov, Pierre Baldi

A generally intelligent agent must be able to teach itself how to solve problems in complex domains with minimal human supervision.

Combinatorial Optimization Reinforcement Learning (RL) +2

Confronting Reward Model Overoptimization with Constrained RLHF

1 code implementation 6 Oct 2023 Ted Moskovitz, Aaditya K. Singh, DJ Strouse, Tuomas Sandholm, Ruslan Salakhutdinov, Anca D. Dragan, Stephen McAleer

Large language models are typically aligned with human preferences by optimizing $\textit{reward models}$ (RMs) fitted to human feedback.
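One generic way to express the constrained-RLHF idea named in the title is a Lagrangian over component reward models, with multipliers updated by dual ascent whenever a component drops below its threshold. The sketch below (assuming PyTorch) is illustrative; the thresholds, constraint directions, and update rule are assumptions, not necessarily the paper's exact formulation.

    import torch

    def lagrangian_objective(component_rewards, thresholds, multipliers):
        # Scalar training signal: mean component rewards plus Lagrangian terms that
        # penalize any component whose average falls below its threshold.
        total = torch.zeros(())
        for r, tau, lam in zip(component_rewards, thresholds, multipliers):
            total = total + r.mean() + lam * (r.mean() - tau)
        return total

    def update_multipliers(component_rewards, thresholds, multipliers, lr=0.01):
        # Dual ascent: a multiplier grows while its constraint E[r] >= tau is violated.
        return [max(0.0, lam - lr * (r.mean().item() - tau))
                for r, tau, lam in zip(component_rewards, thresholds, multipliers)]

    rewards = [torch.tensor([0.2, 0.4]), torch.tensor([0.9, 1.1])]
    lams = update_multipliers(rewards, thresholds=[0.5, 0.5], multipliers=[0.0, 0.0])
    print(lams)   # the violated first constraint gets a positive multiplier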

Solving the Rubik's Cube with Approximate Policy Iteration

no code implementations ICLR 2019 Stephen McAleer, Forest Agostinelli, Alexander Shmakov, Pierre Baldi

Autodidactic Iteration is able to learn how to solve the Rubik’s Cube and the 15-puzzle without relying on human data.

Rubik's Cube

ColosseumRL: A Framework for Multiagent Reinforcement Learning in $N$-Player Games

no code implementations 10 Dec 2019 Alexander Shmakov, John Lanier, Stephen McAleer, Rohan Achar, Cristina Lopes, Pierre Baldi

Much of recent success in multiagent reinforcement learning has been in two-player zero-sum games.

Multiagent Systems

A* Search Without Expansions: Learning Heuristic Functions with Deep Q-Networks

no code implementations 8 Feb 2021 Forest Agostinelli, Alexander Shmakov, Stephen McAleer, Roy Fox, Pierre Baldi

We use Q* search to solve the Rubik's cube when formulated with a large action space that includes 1872 meta-actions and find that this 157-fold increase in the size of the action space incurs less than a 4-fold increase in computation time and less than a 3-fold increase in the number of nodes generated when performing Q* search.

Rubik's Cube
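A toy-sized sketch of the search idea named in the title: children are scored directly from the parent's Q-values (estimated transition cost plus cost-to-go), so they are generated but never separately evaluated. The integer-line domain and the exact-oracle "Q-network" below are stand-ins for the Rubik's cube and the trained DQN.

    import heapq

    def q_star_search(start, is_goal, q_values_fn, apply_action, num_actions):
        # Best-first search: each child is scored with the parent's Q-value for the
        # action that produced it. States must be hashable.
        frontier = [(0.0, 0.0, start)]
        seen = {start}
        while frontier:
            f, g, state = heapq.heappop(frontier)
            if is_goal(state):
                return g
            q = q_values_fn(state)                     # one evaluation scores all actions
            for action in range(num_actions):
                child = apply_action(state, action)
                if child in seen:
                    continue
                seen.add(child)
                heapq.heappush(frontier, (g + q[action], g + 1.0, child))
        return None

    # Toy domain: walk on the integers toward 0; q_values_fn is an exact cost-to-go
    # oracle (1 step cost + remaining distance) standing in for a learned DQN.
    steps = q_star_search(start=5, is_goal=lambda s: s == 0,
                          q_values_fn=lambda s: [1 + abs(s - 1), 1 + abs(s + 1)],
                          apply_action=lambda s, a: s - 1 if a == 0 else s + 1,
                          num_actions=2)
    print(steps)   # 5.0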

Improving Social Welfare While Preserving Autonomy via a Pareto Mediator

no code implementations 7 Jun 2021 Stephen McAleer, John Lanier, Michael Dennis, Pierre Baldi, Roy Fox

Machine learning algorithms often make decisions on behalf of agents with varied and sometimes conflicting interests.

Open-Ended Question Answering

Independent Natural Policy Gradient Always Converges in Markov Potential Games

no code implementations 20 Oct 2021 Roy Fox, Stephen McAleer, Will Overman, Ioannis Panageas

Recent results have shown that independent policy gradient converges in MPGs, but it was not known whether Independent Natural Policy Gradient converges as well.

Multi-agent Reinforcement Learning

Temporal-Difference Value Estimation via Uncertainty-Guided Soft Updates

no code implementations 28 Oct 2021 Litian Liang, Yaosheng Xu, Stephen McAleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox

Under the belief that $\beta$ is closely related to the (state dependent) model uncertainty, Entropy Regularized Q-Learning (EQL) further introduces a principled scheduling of $\beta$ by maintaining a collection of the model parameters that characterizes model uncertainty.

Q-Learning Scheduling
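The snippet above ties the coefficient beta to state-dependent model uncertainty estimated from a collection of model parameters. The sketch below (assuming PyTorch) shows one common way to turn disagreement across such a collection into that kind of signal; it is a generic illustration, not the paper's exact scheduling rule.

    import torch

    def uncertainty_scaled_beta(q_ensemble, base_beta=1.0):
        # q_ensemble: (ensemble_size, batch, num_actions) Q-value estimates.
        disagreement = q_ensemble.std(dim=0).mean(dim=-1)   # per-state uncertainty proxy
        # Higher estimated uncertainty -> smaller beta (softer, more cautious updates).
        return base_beta / (1.0 + disagreement)

    print(uncertainty_scaled_beta(torch.randn(5, 2, 3)).shape)   # torch.Size([2])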

Target Entropy Annealing for Discrete Soft Actor-Critic

no code implementations 6 Dec 2021 Yaosheng Xu, Dailin Hu, Litian Liang, Stephen McAleer, Pieter Abbeel, Roy Fox

Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm in continuous action space settings.

Atari Games Scheduling

Anytime PSRO for Two-Player Zero-Sum Games

no code implementations 19 Jan 2022 Stephen McAleer, Kevin Wang, John Lanier, Marc Lanctot, Pierre Baldi, Tuomas Sandholm, Roy Fox

PSRO is based on the tabular double oracle (DO) method, an algorithm that is guaranteed to converge to a Nash equilibrium, but may increase exploitability from one iteration to the next.

Multi-agent Reinforcement Learning Reinforcement Learning (RL) +2
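The exploitability referred to above can be checked directly in a small zero-sum matrix game: it is the total payoff both players could gain by best-responding to the current strategies, and it is zero exactly at a Nash equilibrium. A small self-contained example:

    import numpy as np

    def exploitability(payoff_row, x, y):
        # payoff_row[i, j] is the row player's payoff; the column player receives its negative.
        row_gain = (payoff_row @ y).max() - x @ payoff_row @ y      # row's gain from best responding
        col_gain = (x @ payoff_row @ y) - (x @ payoff_row).min()    # column's gain from best responding
        return row_gain + col_gain

    rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
    uniform = np.ones(3) / 3
    print(exploitability(rps, uniform, uniform))                    # 0.0: uniform is the Nash equilibrium
    print(exploitability(rps, np.array([1.0, 0.0, 0.0]), uniform))  # > 0: a pure strategy is exploitable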

Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games

no code implementations 13 Jul 2022 Stephen McAleer, JB Lanier, Kevin Wang, Pierre Baldi, Roy Fox, Tuomas Sandholm

Instead of adding only deterministic best responses to the opponent's least exploitable population mixture, SP-PSRO also learns an approximately optimal stochastic policy and adds it to the population as well.

Reinforcement Learning (RL)
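An abstract sketch of the population update described above: each iteration adds both a deterministic best response and a separately learned, approximately optimal stochastic policy before the meta-game over the population is re-solved. The helper functions are hypothetical placeholders, not the authors' implementation.

    def sp_psro_iteration(game, population, meta_strategy,
                          train_best_response, train_stochastic_policy,
                          empirical_payoffs, solve_meta_game):
        # Add the deterministic best response to the opponent's population mixture.
        population.append(train_best_response(game, population, meta_strategy))
        # Also add an approximately optimal stochastic policy learned this iteration.
        population.append(train_stochastic_policy(game, population, meta_strategy))
        # Re-solve the meta-game restricted to the enlarged population.
        return population, solve_meta_game(empirical_payoffs(game, population))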

Feasible Adversarial Robust Reinforcement Learning for Underspecified Environments

no code implementations 19 Jul 2022 JB Lanier, Stephen McAleer, Pierre Baldi, Roy Fox

In this paper, we propose Feasible Adversarial Robust RL (FARR), a novel problem formulation and objective for automatically determining the set of environment parameter values over which to be robust.

Reinforcement Learning (RL)

Game Theoretic Rating in N-player general-sum games with Equilibria

no code implementations 5 Oct 2022 Luke Marris, Marc Lanctot, Ian Gemp, Shayegan Omidshafiei, Stephen McAleer, Jerome Connor, Karl Tuyls, Thore Graepel

Rating strategies in a game is an important area of research in game theory and artificial intelligence, and can be applied to any real-world competitive or cooperative setting.

Ensemble Value Functions for Efficient Exploration in Multi-Agent Reinforcement Learning

no code implementations 7 Feb 2023 Lukas Schäfer, Oliver Slumbers, Stephen McAleer, Yali Du, Stefano V. Albrecht, David Mguni

In this work, we propose ensemble value functions for multi-agent exploration (EMAX), a general framework to seamlessly extend value-based MARL algorithms with ensembles of value functions.

Efficient Exploration Multi-agent Reinforcement Learning +2

Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations

no code implementations 22 Jul 2023 Yongyuan Liang, Yanchao Sun, Ruijie Zheng, Xiangyu Liu, Benjamin Eysenbach, Tuomas Sandholm, Furong Huang, Stephen McAleer

To tackle this challenge, we propose GRAD, a novel game-theoretic approach that treats the temporally-coupled robust RL problem as a partially observable two-player zero-sum game.

Continuous Control Reinforcement Learning (RL) +1

JiangJun: Mastering Xiangqi by Tackling Non-Transitivity in Two-Player Zero-Sum Games

no code implementations 9 Aug 2023 Yang Li, Kun Xiong, Yingping Zhang, Jiangcheng Zhu, Stephen McAleer, Wei Pan, Jun Wang, Zonghong Dai, Yaodong Yang

This paper presents an empirical exploration of non-transitivity in perfect-information games, specifically focusing on Xiangqi, a traditional Chinese board game comparable in game-tree complexity to chess and shogi.

AI Alignment: A Comprehensive Survey

no code implementations 30 Oct 2023 Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Kwan Yee Ng, Juntao Dai, Xuehai Pan, Aidan O'Gara, Yingshan Lei, Hua Xu, Brian Tse, Jie Fu, Stephen McAleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao

The former aims to make AI systems aligned via alignment training, while the latter aims to gain evidence about the systems' alignment and govern them appropriately to avoid exacerbating misalignment risks.

Scalable Mechanism Design for Multi-Agent Path Finding

no code implementations 30 Jan 2024 Paul Friedrich, Yulun Zhang, Michael Curry, Ludwig Dierks, Stephen McAleer, Jiaoyang Li, Tuomas Sandholm, Sven Seuken

In this work, we introduce the problem of scalable mechanism design for MAPF and propose three strategyproof mechanisms, two of which even use approximate MAPF algorithms.

Multi-Agent Path Finding

Policy Space Response Oracles: A Survey

no code implementations 4 Mar 2024 Ariyan Bighashdel, Yongzhao Wang, Stephen McAleer, Rahul Savani, Frans A. Oliehoek

In game theory, a game refers to a model of interaction among rational decision-makers or players, making choices with the goal of achieving their individual objectives.

Position
