1 code implementation • 7 Aug 2023 • Michaël Mathieu, Sherjil Ozair, Srivatsan Srinivasan, Caglar Gulcehre, Shangtong Zhang, Ray Jiang, Tom Le Paine, Richard Powell, Konrad Żołna, Julian Schrittwieser, David Choi, Petko Georgiev, Daniel Toyama, Aja Huang, Roman Ring, Igor Babuschkin, Timo Ewalds, Mahyar Bordbar, Sarah Henderson, Sergio Gómez Colmenarejo, Aäron van den Oord, Wojciech Marian Czarnecki, Nando de Freitas, Oriol Vinyals
StarCraft II is one of the most challenging simulated reinforcement learning environments: it is partially observable, stochastic, and multi-agent, and mastering it requires strategic planning over long time horizons combined with real-time, low-level execution.
1 code implementation • 9 Feb 2023 • Akhil Bagaria, Ray Jiang, Ramana Kumar, Tom Schaul
One of the gnarliest challenges in reinforcement learning (RL) is exploration that scales to vast domains, where novelty- or coverage-seeking behaviour falls short.
1 code implementation • 15 Sep 2022 • Steven Kapturowski, Víctor Campos, Ray Jiang, Nemanja Rakićević, Hado van Hasselt, Charles Blundell, Adrià Puigdomènech Badia
The task of building general agents that perform well over a wide range of tasks has been an important goal in reinforcement learning since its inception.
no code implementations • 12 Jul 2021 • Ray Jiang, Shangtong Zhang, Veronica Chelu, Adam White, Hado van Hasselt
We develop a multi-step emphatic weighting that can be combined with replay, and a time-reversed $n$-step TD learning algorithm to learn the required emphatic weighting.
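For intuition, here is a minimal sketch of the classic followon/emphatic trace from emphatic TD (Sutton, Mahmood & White, 2016), which is the quantity the learned multi-step weighting stands in for; this is not the paper's time-reversed algorithm, and the function and variable names are illustrative.

```python
import numpy as np

def followon_trace(interest, discounts, rhos):
    """Classic followon trace F_t = i_t + gamma_t * rho_{t-1} * F_{t-1}.

    interest:  per-step interest i_t (often all ones)
    discounts: per-step discount gamma_t
    rhos:      per-step importance-sampling ratios pi(a|s) / mu(a|s)
    """
    F = np.zeros(len(interest))
    prev_F, prev_rho = 0.0, 1.0
    for t in range(len(interest)):
        F[t] = interest[t] + discounts[t] * prev_rho * prev_F
        prev_F, prev_rho = F[t], rhos[t]
    return F
```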
no code implementations • 21 Jun 2021 • Ray Jiang, Tom Zahavy, Zhongwen Xu, Adam White, Matteo Hessel, Charles Blundell, Hado van Hasselt
In this paper, we extend the use of emphatic methods to deep reinforcement learning agents.
no code implementations • 7 Feb 2020 • Danilo J. Rezende, Ivo Danihelka, George Papamakarios, Nan Rosemary Ke, Ray Jiang, Theophane Weber, Karol Gregor, Hamza Merzic, Fabio Viola, Jane Wang, Jovana Mitrovic, Frederic Besse, Ioannis Antonoglou, Lars Buesing
In reinforcement learning, we can learn a model of future observations and rewards, and use it to plan the agent's next actions.
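As a generic illustration of planning with a learned model, the sketch below is a simple random-shooting planner, not the method of the paper; `model`, `plan_by_rollouts`, and all parameters are hypothetical names, and `model(state, action)` is assumed to return a predicted next state and reward.

```python
import numpy as np

def plan_by_rollouts(model, state, action_space, horizon=5, n_candidates=64, rng=None):
    """Sample random action sequences, roll the learned model forward to
    accumulate predicted reward, and return the first action of the best sequence."""
    rng = rng or np.random.default_rng()
    best_return, best_action = -np.inf, None
    for _ in range(n_candidates):
        s, total = state, 0.0
        actions = rng.choice(action_space, size=horizon)
        for a in actions:
            s, r = model(s, a)  # assumed to return (next_state, reward)
            total += r
        if total > best_return:
            best_return, best_action = total, actions[0]
    return best_action
```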
no code implementations • Findings of the Association for Computational Linguistics 2020 • Po-Sen Huang, Huan Zhang, Ray Jiang, Robert Stanforth, Johannes Welbl, Jack Rae, Vishal Maini, Dani Yogatama, Pushmeet Kohli
This paper aims to quantify and reduce a particular type of bias exhibited by language models: bias in the sentiment of generated text.
1 code implementation • 28 Jul 2019 • Ray Jiang, Aldo Pacchiano, Tom Stepleton, Heinrich Jiang, Silvia Chiappa
We propose an approach to fair classification that enforces independence between the classifier outputs and sensitive information by minimizing Wasserstein-1 distances.
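As a rough illustration, the Wasserstein-1 distance between two one-dimensional score distributions can be estimated from matched quantiles. The snippet below only measures the dependence between classifier outputs and a binary sensitive attribute; it is not the paper's training procedure, and `wasserstein1_1d` and its arguments are illustrative names.

```python
import numpy as np

def wasserstein1_1d(scores_a, scores_b, n_quantiles=100):
    """Empirical Wasserstein-1 distance between two 1-D score samples,
    approximated as the mean absolute difference of matched quantiles."""
    qs = np.linspace(0.0, 1.0, n_quantiles)
    qa = np.quantile(scores_a, qs)
    qb = np.quantile(scores_b, qs)
    return float(np.mean(np.abs(qa - qb)))
```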
no code implementations • 27 Feb 2019 • Ray Jiang, Silvia Chiappa, Tor Lattimore, András György, Pushmeet Kohli
Machine learning is used extensively in recommender systems deployed in products.
no code implementations • 24 Jul 2018 • Timothy A. Mann, Sven Gowal, András György, Ray Jiang, Huiyi Hu, Balaji Lakshminarayanan, Prav Srinivasan
Predicting delayed outcomes is an important problem in recommender systems (e.g., whether customers will finish reading an ebook).
1 code implementation • ICLR 2019 • Ray Jiang, Sven Gowal, Timothy A. Mann, Danilo J. Rezende
The conventional solution to the recommendation problem greedily ranks individual document candidates by prediction scores.
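A minimal sketch of that greedy baseline (names are illustrative): score each candidate independently and keep the top-k, which ignores interactions between the items placed on the slate.

```python
import numpy as np

def greedy_slate(candidate_scores, slate_size):
    """Conventional baseline: sort candidates by predicted score, take the top-k."""
    order = np.argsort(candidate_scores)[::-1]  # highest score first
    return order[:slate_size]

# e.g. greedy_slate(np.array([0.1, 0.9, 0.4, 0.7]), slate_size=2) -> [1, 3]
```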