Search Results for author: Dale Schuurmans

Found 107 papers, 26 papers with code

Trust-PCL: An Off-Policy Trust Region Method for Continuous Control

1 code implementation ICLR 2018 Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans

When evaluated on a number of continuous control tasks, Trust-PCL improves the solution quality and sample efficiency of TRPO.

Continuous Control Reinforcement Learning (RL)

Bridging the Gap Between Value and Policy Based Reinforcement Learning

1 code implementation NeurIPS 2017 Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans

We establish a new connection between value and policy based reinforcement learning (RL) based on a relationship between softmax temporal value consistency and policy optimality under entropy regularization.

Q-Learning reinforcement-learning

Energy-Based Processes for Exchangeable Data

1 code implementation ICML 2020 Mengjiao Yang, Bo Dai, Hanjun Dai, Dale Schuurmans

Recently there has been growing interest in modeling sets with exchangeability such as point clouds.

Denoising Point Cloud Generation

Scalable Deep Generative Modeling for Sparse Graphs

1 code implementation ICML 2020 Hanjun Dai, Azade Nazi, Yujia Li, Bo Dai, Dale Schuurmans

Based on this, we develop a novel autoregressive model, named BiGG, that utilizes this sparsity to avoid generating the full adjacency matrix, and importantly reduces the graph generation time complexity to $O((n + m)\log n)$.

Graph Generation

Combiner: Full Attention Transformer with Sparse Computation Cost

2 code implementations NeurIPS 2021 Hongyu Ren, Hanjun Dai, Zihang Dai, Mengjiao Yang, Jure Leskovec, Dale Schuurmans, Bo Dai

However, the key limitation of transformers is their quadratic memory and time complexity $\mathcal{O}(L^2)$ with respect to the sequence length in attention layers, which restricts application to extremely long sequences.

Image Generation Language Modelling

Chain of Thought Imitation with Procedure Cloning

1 code implementation 22 May 2022 Mengjiao Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum

Imitation learning aims to extract high-performance policies from logged demonstrations of expert behavior.

Imitation Learning Robot Manipulation

Dichotomy of Control: Separating What You Can Control from What You Cannot

1 code implementation 24 Oct 2022 Mengjiao Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum

While return-conditioning is at the heart of popular algorithms such as decision transformer (DT), these methods tend to perform poorly in highly stochastic environments, where an occasional high return can arise from randomness in the environment rather than the actions themselves.

Reinforcement Learning (RL)

Learning to Generalize from Sparse and Underspecified Rewards

1 code implementation 19 Feb 2019 Rishabh Agarwal, Chen Liang, Dale Schuurmans, Mohammad Norouzi

The parameters of the auxiliary reward function are optimized with respect to the validation performance of a trained policy.

Bayesian Optimization Semantic Parsing

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

14 code implementations 28 Jan 2022 Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou

We explore how generating a chain of thought (a series of intermediate reasoning steps) significantly improves the ability of large language models to perform complex reasoning.

Common Sense Reasoning GSM8K

Self-Consistency Improves Chain of Thought Reasoning in Language Models

1 code implementation 21 Mar 2022 Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou

Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks.

Ranked #78 on Arithmetic Reasoning on GSM8K (using extra training data)

Arithmetic Reasoning GSM8K

SMORE: Knowledge Graph Completion and Multi-hop Reasoning in Massive Knowledge Graphs

1 code implementation 28 Oct 2021 Hongyu Ren, Hanjun Dai, Bo Dai, Xinyun Chen, Denny Zhou, Jure Leskovec, Dale Schuurmans

There are two important reasoning tasks on KGs: (1) single-hop knowledge graph completion, which involves predicting individual links in the KG; and (2) multi-hop reasoning, where the goal is to predict which KG entities satisfy a given logical query.

Scheduling

GenDICE: Generalized Offline Estimation of Stationary Values

1 code implementation ICLR 2020 Ruiyi Zhang, Bo Dai, Lihong Li, Dale Schuurmans

An important problem that arises in reinforcement learning and Monte Carlo methods is estimating quantities defined by the stationary distribution of a Markov chain.

Offline Policy Selection under Uncertainty

1 code implementation 12 Dec 2020 Mengjiao Yang, Bo Dai, Ofir Nachum, George Tucker, Dale Schuurmans

More importantly, we show how the belief distribution estimated by BayesDICE may be used to rank policies with respect to any arbitrary downstream policy selection metric, and we empirically demonstrate that this selection procedure significantly outperforms existing approaches, such as ranking policies according to mean or high-confidence lower bound value estimates.

Multimodal Masked Autoencoders Learn Transferable Representations

1 code implementation 27 May 2022 Xinyang Geng, Hao Liu, Lisa Lee, Dale Schuurmans, Sergey Levine, Pieter Abbeel

We provide an empirical study of M3AE trained on a large-scale image-text dataset, and find that M3AE is able to learn generalizable representations that transfer well to downstream tasks.

Contrastive Learning

TEMPERA: Test-Time Prompting via Reinforcement Learning

1 code implementation 21 Nov 2022 Tianjun Zhang, Xuezhi Wang, Denny Zhou, Dale Schuurmans, Joseph E. Gonzalez

To achieve this, we design a novel action space that allows flexible editing of the initial prompts covering a wide set of commonly-used components like instructions, few-shot exemplars, and verbalizers.

Few-Shot Learning Natural Language Inference

Exponential Family Estimation via Adversarial Dynamics Embedding

1 code implementation NeurIPS 2019 Bo Dai, Zhen Liu, Hanjun Dai, Niao He, Arthur Gretton, Le Song, Dale Schuurmans

We present an efficient algorithm for maximum likelihood estimation (MLE) of exponential family models, with a general parametrization of the energy function that includes neural networks.

Optimal Scaling for Locally Balanced Proposals in Discrete Spaces

1 code implementation 16 Sep 2022 Haoran Sun, Hanjun Dai, Dale Schuurmans

Optimal scaling has been well studied for Metropolis-Hastings (M-H) algorithms in continuous spaces, but a similar understanding has been lacking in discrete spaces.

Kernel Exponential Family Estimation via Doubly Dual Embedding

1 code implementation 6 Nov 2018 Bo Dai, Hanjun Dai, Arthur Gretton, Le Song, Dale Schuurmans, Niao He

We investigate penalized maximum log-likelihood estimation for exponential family distributions whose natural parameter resides in a reproducing kernel Hilbert space.

Learning with a Strong Adversary

1 code implementation 10 Nov 2015 Ruitong Huang, Bing Xu, Dale Schuurmans, Csaba Szepesvari

The robustness of neural networks to intended perturbations has recently attracted significant attention.

General Classification

Invertible Convolutional Flow

1 code implementation NeurIPS 2019 Mahdi Karami, Dale Schuurmans, Jascha Sohl-Dickstein, Laurent Dinh, Daniel Duckworth

We show that these transforms allow more effective normalizing flow models to be developed for generative image models.

A Simple Decentralized Cross-Entropy Method

1 code implementation 16 Dec 2022 Zichen Zhang, Jun Jin, Martin Jagersand, Jun Luo, Dale Schuurmans

To tackle this issue, we propose Decentralized CEM (DecentCEM), a simple but effective improvement over classical CEM, by using an ensemble of CEM instances running independently from one another, and each performing a local improvement of its own sampling distribution.

Continuous Control Model-based Reinforcement Learning
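The decentralized idea in the DecentCEM snippet (independent CEM instances, each refining its own sampling distribution) can be illustrated with a small sketch. It assumes a 1-D minimization problem with Gaussian sampling distributions, and the helper names (`cem_step`, `decent_cem`) and parameters (`pop`, `elite_frac`, `init_std`) are hypothetical; this is not the authors' implementation and it leaves out the model-based planning setting.

```python
import random
import statistics

# Minimal DecentCEM-style sketch (illustrative, not the authors' code).
# Each CEM instance keeps its own 1-D Gaussian (mean, std) and never shares samples.


def cem_step(mean, std, objective, pop, elite_frac, rng):
    """One classical CEM update: sample, keep the lowest-cost elites, refit the Gaussian."""
    samples = sorted((rng.gauss(mean, std) for _ in range(pop)), key=objective)
    n_elite = max(2, int(pop * elite_frac))
    elites = samples[:n_elite]
    new_std = max(statistics.pstdev(elites), 1e-3)  # floor keeps the std strictly positive
    return statistics.fmean(elites), new_std, elites[0]


def decent_cem(objective, init_means, init_std=2.0, iters=30, pop=50, elite_frac=0.2, seed=0):
    """Run independent CEM instances and return the best sample found by any of them."""
    rng = random.Random(seed)
    instances = [(m, init_std) for m in init_means]
    best = None
    for _ in range(iters):
        updated = []
        for mean, std in instances:
            mean, std, candidate = cem_step(mean, std, objective, pop, elite_frac, rng)
            updated.append((mean, std))
            if best is None or objective(candidate) < objective(best):
                best = candidate
        instances = updated
    return best, instances


if __name__ == "__main__":
    # Toy double-well cost: local minimum near x = +2, global minimum near x = -2.
    cost = lambda x: (x * x - 4) ** 2 + 0.5 * x
    best, instances = decent_cem(cost, init_means=[-3.0, 3.0])
    print(round(best, 2), [round(m, 2) for m, _ in instances])
```

Because the instances never exchange samples, the one started at +3 may settle in the local minimum near +2 while the one started at -3 reaches the global minimum near -2; reporting the best sample across instances is what the ensemble adds over a single Gaussian that could collapse onto the wrong basin.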

Batch Stationary Distribution Estimation

1 code implementation ICML 2020 Junfeng Wen, Bo Dai, Lihong Li, Dale Schuurmans

We consider the problem of approximating the stationary distribution of an ergodic Markov chain given a set of sampled transitions.

Off-policy evaluation
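To make the problem setting concrete, here is a naive plug-in estimate from sampled transitions, assuming a small finite state space and an ergodic chain: count the observed transitions, normalize them into an empirical transition matrix, and run power iteration. This is a hypothetical baseline with illustrative function names, not the estimator proposed in the paper.

```python
# Naive plug-in baseline (an assumption for illustration, not the paper's estimator):
# 1) count observed transitions (s, s_next), 2) row-normalize into an empirical
# transition matrix, 3) run power iteration to find its stationary distribution.


def empirical_transition_matrix(transitions, n_states):
    counts = [[0.0] * n_states for _ in range(n_states)]
    for s, s_next in transitions:
        counts[s][s_next] += 1.0
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # Never-visited state: fall back to a uniform row (a hypothetical choice).
            matrix.append([1.0 / n_states] * n_states)
        else:
            matrix.append([c / total for c in row])
    return matrix


def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Power iteration mu <- mu P, starting from the uniform distribution."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, mu)) < tol:
            return nxt
        mu = nxt
    return mu


if __name__ == "__main__":
    data = [(0, 0)] * 3 + [(0, 1)] + [(1, 0)] * 2 + [(1, 1)] * 2
    P = empirical_transition_matrix(data, n_states=2)
    print(stationary_distribution(P))  # close to [2/3, 1/3]
```

In the usage example the empirical matrix is [[0.75, 0.25], [0.5, 0.5]], whose stationary distribution solves 0.25·μ₀ = 0.5·μ₁, giving μ = (2/3, 1/3).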

Smoothed Action Value Functions for Learning Gaussian Policies

no code implementations ICML 2018 Ofir Nachum, Mohammad Norouzi, George Tucker, Dale Schuurmans

State-action value functions (i.e., Q-values) are ubiquitous in reinforcement learning (RL), giving rise to popular algorithms such as SARSA and Q-learning.

Continuous Control Q-Learning

Variational Rejection Sampling

no code implementations 5 Apr 2018 Aditya Grover, Ramki Gummadi, Miguel Lazaro-Gredilla, Dale Schuurmans, Stefano Ermon

Learning latent variable models with stochastic variational inference is challenging when the approximate posterior is far from the true posterior, due to high variance in the gradient estimates.

Variational Inference

Safe Exploration for Identifying Linear Systems via Robust Optimization

no code implementations 30 Nov 2017 Tyler Lu, Martin Zinkevich, Craig Boutilier, Binz Roy, Dale Schuurmans

Motivated by the cooling of Google's data centers, we study how one can safely identify the parameters of a system model with a desired accuracy and confidence level.

Reinforcement Learning (RL) Safe Exploration

Improving Policy Gradient by Exploring Under-appreciated Rewards

no code implementations 28 Nov 2016 Ofir Nachum, Mohammad Norouzi, Dale Schuurmans

We propose a more directed exploration strategy that promotes exploration of under-appreciated reward regions.

Reinforcement Learning (RL)

Stochastic Neural Networks with Monotonic Activation Functions

no code implementations 1 Jan 2016 Siamak Ravanbakhsh, Barnabas Poczos, Jeff Schneider, Dale Schuurmans, Russell Greiner

We propose a Laplace approximation that creates a stochastic unit from any smooth monotonic activation function, using only Gaussian noise.

Generalized Conditional Gradient for Sparse Estimation

no code implementations 17 Oct 2014 Yao-Liang Yu, Xinhua Zhang, Dale Schuurmans

Structured sparsity is an important modeling tool that expands the applicability of convex formulations for data analysis; however, it also creates significant challenges for efficient algorithm design.

Dictionary Learning Matrix Completion

Adaptive Monte Carlo via Bandit Allocation

no code implementations 13 May 2014 James Neufeld, András György, Dale Schuurmans, Csaba Szepesvári

We consider the problem of sequentially choosing between a set of unbiased Monte Carlo estimators to minimize the mean-squared-error (MSE) of a final combined estimate.

Convex Relaxations of Bregman Divergence Clustering

no code implementations 26 Sep 2013 Hao Cheng, Xinhua Zhang, Dale Schuurmans

Although many convex relaxations of clustering have been proposed in the past decade, current formulations remain restricted to spherical Gaussian or discriminative models and are susceptible to imbalanced clusters.

Clustering

Non-delusional Q-learning and value-iteration

no code implementations NeurIPS 2018 Tyler Lu, Dale Schuurmans, Craig Boutilier

We identify a fundamental source of error in Q-learning and other forms of dynamic programming with function approximation.

Q-Learning

Multi-view Matrix Factorization for Linear Dynamical System Estimation

no code implementations NeurIPS 2017 Mahdi Karami, Martha White, Dale Schuurmans, Csaba Szepesvari

In this paper, we instead reconsider likelihood maximization and develop an optimization based strategy for recovering the latent states and transition parameters.

Deep Learning Games

no code implementations NeurIPS 2016 Dale Schuurmans, Martin A. Zinkevich

We investigate a reduction of supervised learning to game playing that reveals new connections and learning methods.

Embedding Inference for Structured Multilabel Prediction

no code implementations NeurIPS 2015 Farzaneh Mirzazadeh, Siamak Ravanbakhsh, Nan Ding, Dale Schuurmans

A key bottleneck in structured output prediction is the need for inference during training and testing, usually requiring some form of dynamic programming.

Convex Deep Learning via Normalized Kernels

no code implementations NeurIPS 2014 Özlem Aslan, Xinhua Zhang, Dale Schuurmans

Deep learning has been a long-standing pursuit in machine learning, which until recently was hampered by unreliable training methods before the discovery of improved heuristics for embedded layer training.

Polar Operators for Structured Sparse Estimation

no code implementations NeurIPS 2013 Xinhua Zhang, Yao-Liang Yu, Dale Schuurmans

Structured sparse estimation has become an important technique in many areas of data analysis.

Convex Two-Layer Modeling

no code implementations NeurIPS 2013 Özlem Aslan, Hao Cheng, Xinhua Zhang, Dale Schuurmans

Latent variable prediction models, such as multi-layer networks, impose auxiliary latent variables between inputs and outputs to allow automatic inference of implicit features useful for prediction.

Vocal Bursts Valence Prediction

Convex Multi-view Subspace Learning

no code implementations NeurIPS 2012 Martha White, Xinhua Zhang, Dale Schuurmans, Yao-Liang Yu

Subspace learning seeks a low dimensional representation of data that enables accurate reconstruction.

A Polynomial-time Form of Robust Regression

no code implementations NeurIPS 2012 Yao-Liang Yu, Özlem Aslan, Dale Schuurmans

Despite the variety of robust regression methods that have been developed, current regression formulations are either NP-hard, or allow unbounded response to even a single leverage point.

regression

Relaxed Clipping: A Global Training Method for Robust Regression and Classification

no code implementations NeurIPS 2010 Min Yang, Linli Xu, Martha White, Dale Schuurmans, Yao-Liang Yu

We present a generic procedure that can be applied to standard loss functions and demonstrate improved robustness in regression and classification problems.

Classification General Classification

A General Projection Property for Distribution Families

no code implementations NeurIPS 2009 Yao-Liang Yu, Yuxi Li, Dale Schuurmans, Csaba Szepesvári

We prove that linear projections between distribution families with fixed first and second moments are surjective, regardless of dimension.

Discriminative Batch Mode Active Learning

no code implementations NeurIPS 2007 Yuhong Guo, Dale Schuurmans

Most previous studies in active learning have focused on selecting one unlabeled instance at a time while retraining in each iteration.

Active Learning

Semi-Supervised Zero-Shot Classification With Label Representation Learning

no code implementations ICCV 2015 Xin Li, Yuhong Guo, Dale Schuurmans

Most existing zero-shot learning methods require a user to first provide a set of semantic visual attributes for each class as side information before applying a two-step prediction procedure that introduces an intermediate attribute prediction problem.

Attribute Classification

The Value Function Polytope in Reinforcement Learning

no code implementations 31 Jan 2019 Robert Dadashi, Adrien Ali Taïga, Nicolas Le Roux, Dale Schuurmans, Marc G. Bellemare

We establish geometric and topological properties of the space of value functions in finite state-action Markov decision processes.

reinforcement-learning Reinforcement Learning (RL)

Advantage Amplification in Slowly Evolving Latent-State Environments

no code implementations 29 May 2019 Martin Mladenov, Ofer Meshi, Jayden Ooi, Dale Schuurmans, Craig Boutilier

Latent-state environments with long horizons, such as those faced by recommender systems, pose significant challenges for reinforcement learning (RL).

Recommendation Systems reinforcement-learning

Domain Aggregation Networks for Multi-Source Domain Adaptation

no code implementations ICML 2020 Junfeng Wen, Russell Greiner, Dale Schuurmans

In many real-world applications, we want to exploit multiple source datasets of similar tasks to learn a model for a different but related target dataset, e.g., recognizing characters of a new font using a set of different fonts.

Domain Adaptation Sentiment Analysis

Surrogate Objectives for Batch Policy Optimization in One-step Decision Making

no code implementations NeurIPS 2019 Minmin Chen, Ramki Gummadi, Chris Harris, Dale Schuurmans

We investigate batch policy optimization for cost-sensitive classification and contextual bandits, two related tasks that obviate exploration but require generalizing from observed rewards to action selections in unseen contexts.

Decision Making Multi-Armed Bandits

Maximum Entropy Monte-Carlo Planning

no code implementations NeurIPS 2019 Chenjun Xiao, Ruitong Huang, Jincheng Mei, Dale Schuurmans, Martin Müller

We then extend this approach to general sequential decision making by developing a general MCTS algorithm, Maximum Entropy for Tree Search (MENTS).

Atari Games Decision Making

AlgaeDICE: Policy Gradient from Arbitrary Experience

no code implementations 4 Dec 2019 Ofir Nachum, Bo Dai, Ilya Kostrikov, Yin-Lam Chow, Lihong Li, Dale Schuurmans

In many real-world applications of reinforcement learning (RL), interactions with the environment are limited due to cost or feasibility.

Reinforcement Learning (RL)

Learning to Combat Compounding-Error in Model-Based Reinforcement Learning

no code implementations 24 Dec 2019 Chenjun Xiao, Yifan Wu, Chen Ma, Dale Schuurmans, Martin Müller

Despite its potential to improve sample complexity versus model-free approaches, model-based reinforcement learning can fail catastrophically if the model is inaccurate.

Model-based Reinforcement Learning reinforcement-learning

Variational Inference for Deep Probabilistic Canonical Correlation Analysis

no code implementations 9 Mar 2020 Mahdi Karami, Dale Schuurmans

In this paper, we propose a deep probabilistic multi-view model that is composed of a linear multi-view layer based on a probabilistic canonical correlation analysis (CCA) description in the latent space, together with deep generative networks as observation models.

MULTI-VIEW LEARNING Variational Inference

On the Global Convergence Rates of Softmax Policy Gradient Methods

no code implementations ICML 2020 Jincheng Mei, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans

First, we show that with the true gradient, policy gradient with a softmax parametrization converges at a $O(1/t)$ rate, with constants depending on the problem and initialization.

Open-Ended Question Answering Policy Gradient Methods
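The true-gradient setting in the snippet can be reproduced on a toy multi-armed bandit, where the objective is $J(\theta) = \pi_\theta^\top r$ and its gradient is $\partial J / \partial \theta_a = \pi_\theta(a)\,(r_a - \pi_\theta^\top r)$. The sketch below assumes known mean rewards in [0, 1], a uniform initial policy, and a constant step size of 0.2 chosen for illustration; the function names are hypothetical.

```python
import math

# Exact (true-gradient) softmax policy gradient on a K-armed bandit with known mean
# rewards r. The objective is J(theta) = pi_theta^T r, with gradient
# dJ/dtheta_a = pi_a * (r_a - pi^T r).


def softmax(theta):
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def softmax_pg_gaps(rewards, eta=0.2, steps=3000):
    """Return the suboptimality gap max(r) - pi^T r recorded before each update."""
    theta = [0.0] * len(rewards)  # uniform initial policy
    r_star = max(rewards)
    gaps = []
    for _ in range(steps):
        pi = softmax(theta)
        value = sum(p * r for p, r in zip(pi, rewards))
        gaps.append(r_star - value)
        theta = [t + eta * p * (r - value) for t, p, r in zip(theta, pi, rewards)]
    return gaps


if __name__ == "__main__":
    gaps = softmax_pg_gaps([1.0, 0.7, 0.2])
    for t in (10, 100, 1000, 2999):
        # An O(1/t) rate means t * gap stays bounded as t grows.
        print(t, gaps[t], t * gaps[t])
```

Tracking the gap $\max_a r_a - \pi_\theta^\top r$ lets one check numerically that it decreases monotonically under this small step size and that $t$ times the gap stays bounded, which is what an $O(1/t)$ rate implies.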

A maximum-entropy approach to off-policy evaluation in average-reward MDPs

no code implementations NeurIPS 2020 Nevena Lazic, Dong Yin, Mehrdad Farajtabar, Nir Levine, Dilan Gorur, Chris Harris, Dale Schuurmans

This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs).

Off-policy evaluation

Go Wide, Then Narrow: Efficient Training of Deep Thin Networks

no code implementations ICML 2020 Denny Zhou, Mao Ye, Chen Chen, Tianjian Meng, Mingxing Tan, Xiaodan Song, Quoc Le, Qiang Liu, Dale Schuurmans

This is achieved by layerwise imitation, that is, forcing the thin network to mimic the intermediate outputs of the wide network from layer to layer.

Computational Efficiency Model Compression

Off-Policy Evaluation via the Regularized Lagrangian

no code implementations NeurIPS 2020 Mengjiao Yang, Ofir Nachum, Bo Dai, Lihong Li, Dale Schuurmans

The recently proposed distribution correction estimation (DICE) family of estimators has advanced the state of the art in off-policy evaluation from behavior-agnostic data.

Off-policy evaluation

EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL

no code implementations 21 Jul 2020 Seyed Kamyar Seyed Ghasemipour, Dale Schuurmans, Shixiang Shane Gu

In this work, we closely investigate an important simplification of BCQ (a prior approach for offline RL), which removes a heuristic design choice and naturally restricts extracted policies to remain exactly within the support of a given behavior policy.

D4RL Decision Making

Attention that does not Explain Away

no code implementations 29 Sep 2020 Nan Ding, Xinjie Fan, Zhenzhong Lan, Dale Schuurmans, Radu Soricut

Models based on the Transformer architecture have achieved better accuracy than the ones based on competing architectures for a large set of tasks.

CoinDICE: Off-Policy Confidence Interval Estimation

no code implementations NeurIPS 2020 Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvári, Dale Schuurmans

We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning, where the goal is to estimate a confidence interval on a target policy's value, given only access to a static experience dataset collected by unknown behavior policies.

Off-policy evaluation valid

Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration

no code implementations NeurIPS 2020 Hanjun Dai, Rishabh Singh, Bo Dai, Charles Sutton, Dale Schuurmans

In this paper we propose ALOE, a new algorithm for learning conditional and unconditional EBMs for discrete structured data, where parameter gradients are estimated using a learned sampler that mimics local search.

Language Modelling

Escaping the Gravitational Pull of Softmax

no code implementations NeurIPS 2020 Jincheng Mei, Chenjun Xiao, Bo Dai, Lihong Li, Csaba Szepesvari, Dale Schuurmans

Both findings are based on an analysis of convergence rates using the Non-uniform Łojasiewicz (NŁ) inequalities.

Optimization Issues in KL-Constrained Approximate Policy Iteration

no code implementations 11 Feb 2021 Nevena Lazić, Botao Hao, Yasin Abbasi-Yadkori, Dale Schuurmans, Csaba Szepesvári

We compare the use of KL divergence as a constraint vs. as a regularizer, and point out several optimization issues with the widely-used constrained approach.

On the Optimality of Batch Policy Optimization Algorithms

no code implementations 6 Apr 2021 Chenjun Xiao, Yifan Wu, Tor Lattimore, Bo Dai, Jincheng Mei, Lihong Li, Csaba Szepesvari, Dale Schuurmans

First, we introduce a class of confidence-adjusted index algorithms that unifies optimistic and pessimistic principles in a common framework, which enables a general analysis.

Value prediction

Joint Attention for Multi-Agent Coordination and Social Learning

no code implementations 15 Apr 2021 Dennis Lee, Natasha Jaques, Chase Kew, Jiaxing Wu, Douglas Eck, Dale Schuurmans, Aleksandra Faust

We then train agents to minimize the difference between the attention weights that they apply to the environment at each timestep, and the attention of other agents.

Inductive Bias Reinforcement Learning (RL)

Leveraging Non-uniformity in First-order Non-convex Optimization

no code implementations 13 May 2021 Jincheng Mei, Yue Gao, Bo Dai, Csaba Szepesvari, Dale Schuurmans

Classical global convergence results for first-order methods rely on uniform smoothness and the Łojasiewicz inequality.

BIG-bench Machine Learning

The Curse of Passive Data Collection in Batch Reinforcement Learning

no code implementations 18 Jun 2021 Chenjun Xiao, Ilbin Lee, Bo Dai, Dale Schuurmans, Csaba Szepesvari

In high-stakes applications, active experimentation may be considered too risky, and thus data are often collected passively.

reinforcement-learning Reinforcement Learning (RL)

Decentralized Cross-Entropy Method for Model-Based Reinforcement Learning

no code implementations 29 Sep 2021 Zichen Zhang, Jun Jin, Martin Jagersand, Jun Luo, Dale Schuurmans

Further, we extend the decentralized approach to sequential decision-making problems where we show in 13 continuous control benchmark environments that it matches or outperforms the state-of-the-art CEM algorithms in most cases, under the same budget of the total number of samples for planning.

Continuous Control Decision Making

Disentangling Generalization in Reinforcement Learning

no code implementations 29 Sep 2021 Alex Lewandowski, Dale Schuurmans, Jun Luo

The resulting environment, while simple, necessitates function approximation for state abstraction and provides ground-truth labels for optimal policies and value functions.

reinforcement-learning Reinforcement Learning (RL)

Understanding and Leveraging Overparameterization in Recursive Value Estimation

no code implementations ICLR 2022 Chenjun Xiao, Bo Dai, Jincheng Mei, Oscar A Ramirez, Ramki Gummadi, Chris Harris, Dale Schuurmans

To better understand the utility of deep models in RL we present an analysis of recursive value estimation using overparameterized linear representations that provides useful, transferable findings.

Reinforcement Learning (RL) Value prediction

Understanding the Effect of Stochasticity in Policy Optimization

no code implementations NeurIPS 2021 Jincheng Mei, Bo Dai, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans

We study the effect of stochasticity in on-policy policy optimization, and make the following four contributions.

Striving for Simplicity in Off-Policy Deep Reinforcement Learning

no code implementations 25 Sep 2019 Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi

This paper advocates the use of offline (batch) reinforcement learning (RL) to help (1) isolate the contributions of exploitation vs. exploration in off-policy deep RL, (2) improve reproducibility of deep RL research, and (3) facilitate the design of simpler deep RL algorithms.

Atari Games Offline RL

Neural Stochastic Dual Dynamic Programming

no code implementations ICLR 2022 Hanjun Dai, Yuan Xue, Zia Syed, Dale Schuurmans, Bo Dai

Stochastic dual dynamic programming (SDDP) is a state-of-the-art method for solving multi-stage stochastic optimization, widely used for modeling real-world process optimization tasks.

Stochastic Optimization

Reinforcement Teaching

no code implementations 25 Apr 2022 Alex Lewandowski, Calarina Muslimani, Dale Schuurmans, Matthew E. Taylor, Jun Luo

To effectively learn such a teaching policy, we introduce a parametric-behavior embedder that learns a representation of the student's learnable parameters from its input/output behavior.

Meta-Learning

A Parametric Class of Approximate Gradient Updates for Policy Optimization

no code implementations 17 Jun 2022 Ramki Gummadi, Saurabh Kumar, Junfeng Wen, Dale Schuurmans

Approaches to policy optimization have been motivated from diverse principles, based on how the parametric model is interpreted (e.g., value versus policy representation) or how the learning objective is formulated, yet they share a common goal of maximizing expected return.

Discrete Langevin Sampler via Wasserstein Gradient Flow

no code implementations 29 Jun 2022 Haoran Sun, Hanjun Dai, Bo Dai, Haomin Zhou, Dale Schuurmans

It is known that gradient-based MCMC samplers for continuous spaces, such as Langevin Monte Carlo (LMC), can be derived as particle versions of a gradient flow that minimizes KL divergence on a Wasserstein manifold.

Rationale-Augmented Ensembles in Language Models

no code implementations 2 Jul 2022 Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Denny Zhou

Recent research has shown that rationales, or step-by-step chains of thought, can be used to improve performance in multi-step reasoning tasks.

In-Context Learning Prompt Engineering

Making Linear MDPs Practical via Contrastive Representation Learning

no code implementations 14 Jul 2022 Tianjun Zhang, Tongzheng Ren, Mengjiao Yang, Joseph E. Gonzalez, Dale Schuurmans, Bo Dai

It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.

Representation Learning

What learning algorithm is in-context learning? Investigations with linear models

no code implementations 28 Nov 2022 Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, Denny Zhou

We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly, by encoding smaller models in their activations, and updating these implicit models as new examples appear in the context.

In-Context Learning regression

Score-based Continuous-time Discrete Diffusion Models

no code implementations 30 Nov 2022 Haoran Sun, Lijun Yu, Bo Dai, Dale Schuurmans, Hanjun Dai

Score-based modeling through stochastic differential equations (SDEs) has provided a new perspective on diffusion models, and demonstrated superior performance on continuous data.

Memory Augmented Large Language Models are Computationally Universal

no code implementations 10 Jan 2023 Dale Schuurmans

We show that transformer-based large language models are computationally universal when augmented with an external memory.

Language Modelling Large Language Model

The Role of Baselines in Policy Gradient Optimization

no code implementations 16 Jan 2023 Jincheng Mei, Wesley Chung, Valentin Thomas, Bo Dai, Csaba Szepesvari, Dale Schuurmans

Instead, the analysis reveals that the primary effect of the value baseline is to reduce the aggressiveness of the updates rather than their variance.

Foundation Models for Decision Making: Problems, Methods, and Opportunities

no code implementations 7 Mar 2023 Sherry Yang, Ofir Nachum, Yilun Du, Jason Wei, Pieter Abbeel, Dale Schuurmans

In response to these developments, new paradigms are emerging for training foundation models to interact with other agents and perform long-term reasoning.

Autonomous Driving Decision Making

Gradient-Free Structured Pruning with Unlabeled Data

no code implementations 7 Mar 2023 Azade Nova, Hanjun Dai, Dale Schuurmans

By only using the weights of the pre-trained model and unlabeled data, in a matter of a few minutes on a single GPU, up to 40% of the original FLOP count can be reduced with less than a 4% accuracy loss across all tasks considered.

Model Compression

Probabilistic Adaptation of Text-to-Video Models

no code implementations 2 Jun 2023 Mengjiao Yang, Yilun Du, Bo Dai, Dale Schuurmans, Joshua B. Tenenbaum, Pieter Abbeel

Large text-to-video models trained on internet-scale data have demonstrated exceptional capabilities in generating high-fidelity videos from arbitrary textual descriptions.

Language Modelling Large Language Model

Learning Interactive Real-World Simulators

no code implementations 9 Oct 2023 Mengjiao Yang, Yilun Du, Kamyar Ghasemipour, Jonathan Tompson, Leslie Kaelbling, Dale Schuurmans, Pieter Abbeel

Applications of a real-world simulator range from controllable content creation in games and movies, to training embodied agents purely in simulation that can be directly deployed in the real world.

Video Captioning

Large Language Models can Learn Rules

no code implementations 10 Oct 2023 Zhaocheng Zhu, Yuan Xue, Xinyun Chen, Denny Zhou, Jian Tang, Dale Schuurmans, Hanjun Dai

In the deduction stage, the LLM is then prompted to employ the learned rule library to perform reasoning to answer test questions.

Relational Reasoning

Scalable Diffusion for Materials Generation

no code implementations 18 Oct 2023 Mengjiao Yang, KwangHwan Cho, Amil Merchant, Pieter Abbeel, Dale Schuurmans, Igor Mordatch, Ekin Dogus Cubuk

Lastly, we show that conditional generation with UniMat can scale to previously established crystal datasets with up to millions of crystal structures, outperforming random structure search (the current leading method for structure discovery) in discovering new stable materials.

Formation Energy

Efficient Reinforcement Learning from Partial Observability

no code implementations 20 Nov 2023 Hongming Zhang, Tongzheng Ren, Chenjun Xiao, Dale Schuurmans, Bo Dai

In most real-world reinforcement learning applications, state information is only partially observable, which breaks the Markov decision process assumption and leads to inferior performance for algorithms that conflate observations with state.

Partially Observable Reinforcement Learning reinforcement-learning

Directions of Curvature as an Explanation for Loss of Plasticity

no code implementations 30 Nov 2023 Alex Lewandowski, Haruto Tanaka, Dale Schuurmans, Marlos C. Machado

Loss of plasticity is a phenomenon in which neural networks lose their ability to learn from new experience.

Continual Learning

Beyond Expectations: Learning with Stochastic Dominance Made Practical

no code implementations 5 Feb 2024 Shicong Cen, Jincheng Mei, Hanjun Dai, Dale Schuurmans, Yuejie Chi, Bo Dai

Stochastic dominance models risk-averse preferences for decision making with uncertain outcomes, which naturally captures the intrinsic structure of the underlying uncertainty, in contrast to simply resorting to the expectations.

Decision Making Portfolio Optimization

Video as the New Language for Real-World Decision Making

no code implementations 27 Feb 2024 Sherry Yang, Jacob Walker, Jack Parker-Holder, Yilun Du, Jake Bruce, Andre Barreto, Pieter Abbeel, Dale Schuurmans

Moreover, we demonstrate how, like language models, video generation models can serve as planners, agents, compute engines, and environment simulators through techniques such as in-context learning, planning and reinforcement learning.

Decision Making In-Context Learning

Stochastic Gradient Succeeds for Bandits

no code implementations 27 Feb 2024 Jincheng Mei, Zixin Zhong, Bo Dai, Alekh Agarwal, Csaba Szepesvari, Dale Schuurmans

We show that the stochastic gradient bandit algorithm converges to a globally optimal policy at an $O(1/t)$ rate, even with a constant step size.
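A minimal sketch of a stochastic gradient bandit with a constant step size may help make the setting concrete. It assumes Bernoulli rewards and the common baseline-free update $\theta \leftarrow \theta + \eta\,(e_a - \pi_\theta)\,R$ for a sampled arm $a \sim \pi_\theta$ with observed reward $R$; in expectation this equals the exact softmax policy gradient $\pi_\theta \odot (r - \pi_\theta^\top r)$. The function names, reward means, and step size are illustrative assumptions, and the paper's exact algorithm and conditions may differ.

```python
import math
import random

# Stochastic gradient bandit sketch with a constant step size (illustrative).
# Assumed update: theta <- theta + eta * (e_a - pi) * R, with a ~ pi_theta and
# R the observed reward; its expectation is the exact gradient pi * (r - pi^T r).


def softmax(theta):
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def stochastic_gradient_bandit(means, eta=0.1, steps=20_000, seed=0):
    """Bernoulli-reward bandit with the given means; returns the final policy."""
    rng = random.Random(seed)
    k = len(means)
    theta = [0.0] * k
    for _ in range(steps):
        pi = softmax(theta)
        a = rng.choices(range(k), weights=pi)[0]          # sample an arm from pi
        reward = 1.0 if rng.random() < means[a] else 0.0   # Bernoulli reward in {0, 1}
        for i in range(k):
            indicator = 1.0 if i == a else 0.0
            theta[i] += eta * (indicator - pi[i]) * reward
    return softmax(theta)


if __name__ == "__main__":
    print(stochastic_gradient_bandit([0.9, 0.5, 0.1]))
```

The step size stays fixed throughout; no decaying schedule or reward baseline is used, which is the regime the snippet refers to.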
