1 code implementation • ICML 2020 • Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi

The DQN replay dataset can serve as an offline RL benchmark and is open-sourced.

no code implementations • 18 Jun 2024 • Bernd Bohnet, Azade Nova, Aaron T Parisi, Kevin Swersky, Katayoon Goshvadi, Hanjun Dai, Dale Schuurmans, Noah Fiedel, Hanie Sedghi

We seek to elevate the planning capabilities of Large Language Models (LLMs), investigating four main directions.

no code implementations • 10 Jun 2024 • Alex Lewandowski, Saurabh Kumar, Dale Schuurmans, András György, Marlos C. Machado

From this perspective, we derive new regularization strategies for continual learning that ensure beneficial initialization properties are better maintained throughout training.

no code implementations • 31 May 2024 • Fengdi Che, Chenjun Xiao, Jincheng Mei, Bo Dai, Ramki Gummadi, Oscar A Ramirez, Christopher K Harris, A. Rupam Mahmood, Dale Schuurmans

We prove that the combination of a target network and over-parameterized linear function approximation establishes a weaker convergence condition for bootstrapped value estimation in certain cases, even with off-policy data.

no code implementations • 29 May 2024 • Shicong Cen, Jincheng Mei, Katayoon Goshvadi, Hanjun Dai, Tong Yang, Sherry Yang, Dale Schuurmans, Yuejie Chi, Bo Dai

A key bottleneck is understanding how to incorporate uncertainty estimation in the reward function learned from the preference data for RLHF, regardless of how the preference data is collected.

no code implementations • 30 Apr 2024 • Arsalan SharifNassab, Sina Ghiassian, Saber Salehkaleybar, Surya Kanoria, Dale Schuurmans

We propose Soft Preference Optimization (SPO), a method for aligning generative models, such as Large Language Models (LLMs), with human preferences, without the need for a reward model.

no code implementations • 27 Feb 2024 • Jincheng Mei, Zixin Zhong, Bo Dai, Alekh Agarwal, Csaba Szepesvari, Dale Schuurmans

We show that the stochastic gradient bandit algorithm converges to a globally optimal policy at an $O(1/t)$ rate, even with a constant step size.

no code implementations • 27 Feb 2024 • Sherry Yang, Jacob Walker, Jack Parker-Holder, Yilun Du, Jake Bruce, Andre Barreto, Pieter Abbeel, Dale Schuurmans

Moreover, we demonstrate how, like language models, video generation can serve as planners, agents, compute engines, and environment simulators through techniques such as in-context learning, planning and reinforcement learning.

no code implementations • 5 Feb 2024 • Shicong Cen, Jincheng Mei, Hanjun Dai, Dale Schuurmans, Yuejie Chi, Bo Dai

Stochastic dominance models risk-averse preferences for decision making with uncertain outcomes, which naturally captures the intrinsic structure of the underlying uncertainty, in contrast to simply resorting to the expectations.

no code implementations • 30 Nov 2023 • Alex Lewandowski, Haruto Tanaka, Dale Schuurmans, Marlos C. Machado

Loss of plasticity is a phenomenon in which neural networks lose their ability to learn from new experience.

no code implementations • 20 Nov 2023 • Hongming Zhang, Tongzheng Ren, Chenjun Xiao, Dale Schuurmans, Bo Dai

In most real-world reinforcement learning applications, state information is only partially observable, which breaks the Markov decision process assumption and leads to inferior performance for algorithms that conflate observations with state.

Partially Observable Reinforcement Learning

no code implementations • 18 Oct 2023 • Sherry Yang, KwangHwan Cho, Amil Merchant, Pieter Abbeel, Dale Schuurmans, Igor Mordatch, Ekin Dogus Cubuk

Lastly, we show that conditional generation with UniMat can scale to previously established crystal datasets with up to millions of crystal structures, outperforming random structure search (the current leading method for structure discovery) in discovering new stable materials.

no code implementations • 10 Oct 2023 • Zhaocheng Zhu, Yuan Xue, Xinyun Chen, Denny Zhou, Jian Tang, Dale Schuurmans, Hanjun Dai

In the deduction stage, the LLM is then prompted to employ the learned rule library to perform reasoning to answer test questions.

no code implementations • 9 Oct 2023 • Mengjiao Yang, Yilun Du, Kamyar Ghasemipour, Jonathan Tompson, Leslie Kaelbling, Dale Schuurmans, Pieter Abbeel

Applications of a real-world simulator range from controllable content creation in games and movies, to training embodied agents purely in simulation that can be directly deployed in the real world.

no code implementations • 2 Jun 2023 • Mengjiao Yang, Yilun Du, Bo Dai, Dale Schuurmans, Joshua B. Tenenbaum, Pieter Abbeel

Large text-to-video models trained on internet-scale data have demonstrated exceptional capabilities in generating high-fidelity videos from arbitrary textual descriptions.

no code implementations • 7 Mar 2023 • Sherry Yang, Ofir Nachum, Yilun Du, Jason Wei, Pieter Abbeel, Dale Schuurmans

In response to these developments, new paradigms are emerging for training foundation models to interact with other agents and perform long-term reasoning.

no code implementations • 7 Mar 2023 • Azade Nova, Hanjun Dai, Dale Schuurmans

By only using the weights of the pre-trained model and unlabeled data, in a matter of a few minutes on a single GPU, up to 40% of the original FLOP count can be reduced with less than a 4% accuracy loss across all tasks considered.

no code implementations • 16 Jan 2023 • Jincheng Mei, Wesley Chung, Valentin Thomas, Bo Dai, Csaba Szepesvari, Dale Schuurmans

Instead, the analysis reveals that the primary effect of the value baseline is to reduce the aggressiveness of the updates rather than their variance.

no code implementations • 10 Jan 2023 • Dale Schuurmans

We show that transformer-based large language models are computationally universal when augmented with an external memory.

no code implementations • 17 Dec 2022 • Tongzheng Ren, Chenjun Xiao, Tianjun Zhang, Na Li, Zhaoran Wang, Sujay Sanghavi, Dale Schuurmans, Bo Dai

Theoretically, we establish the sample complexity of the proposed approach in the online and offline settings.

Model-based Reinforcement Learning

no code implementations • NeurIPS 2023 • Zichen Zhang, Johannes Kirschner, Junxi Zhang, Francesco Zanini, Alex Ayoub, Masood Dehghan, Dale Schuurmans

A default assumption in reinforcement learning (RL) and optimal control is that observations arrive at discrete time points on a fixed clock cycle.

1 code implementation • 16 Dec 2022 • Zichen Zhang, Jun Jin, Martin Jagersand, Jun Luo, Dale Schuurmans

To tackle this issue, we propose Decentralized CEM (DecentCEM), a simple but effective improvement over classical CEM, by using an ensemble of CEM instances running independently from one another, and each performing a local improvement of its own sampling distribution.
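
The idea in the sentence above can be illustrated with a minimal sketch. It is not the authors' implementation: the 1-D Gaussian sampling distribution, the toy objective, the hyperparameters, and the final step of returning the best instance mean are all assumptions made here for illustration. Each CEM instance samples from its own distribution, keeps its own elites, and refits independently of the others.

```python
import math
import random


def cem_step(mean, std, objective, rng, n_samples=20, n_elite=5):
    """One CEM update on a 1-D Gaussian: sample, keep the elites, refit."""
    samples = [rng.gauss(mean, std) for _ in range(n_samples)]
    elites = sorted(samples, key=objective, reverse=True)[:n_elite]
    new_mean = sum(elites) / n_elite
    new_var = sum((x - new_mean) ** 2 for x in elites) / n_elite
    # Floor the standard deviation so the distribution never collapses to a point.
    return new_mean, max(math.sqrt(new_var), 1e-3)


def decentralized_cem(objective, init_means, n_iters=30, seed=0):
    """Run several CEM instances independently; return the best final mean.

    Hypothetical helper: taking the arg-max over instance means at the end
    is an assumption of this sketch.
    """
    rng = random.Random(seed)
    instances = [(m, 1.0) for m in init_means]
    for _ in range(n_iters):
        # Each instance improves only its own sampling distribution.
        instances = [cem_step(m, s, objective, rng) for m, s in instances]
    return max((m for m, _ in instances), key=objective)


def toy_objective(x):
    """Bimodal toy objective: local optimum near -2, global optimum near 3."""
    return math.exp(-(x + 2.0) ** 2) + 2.0 * math.exp(-(x - 3.0) ** 2)


best = decentralized_cem(toy_objective, init_means=[-2.0, 0.0, 3.0])
print(best)
```

Spreading the initial means lets at least one instance refine the global mode even when another settles on the local optimum near -2, which is the failure a single sampling distribution is prone to on this toy objective.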

no code implementations • 30 Nov 2022 • Haoran Sun, Lijun Yu, Bo Dai, Dale Schuurmans, Hanjun Dai

Score-based modeling through stochastic differential equations (SDEs) has provided a new perspective on diffusion models, and demonstrated superior performance on continuous data.

no code implementations • 28 Nov 2022 • Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, Denny Zhou

We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly, by encoding smaller models in their activations, and updating these implicit models as new examples appear in the context.

1 code implementation • 21 Nov 2022 • Tianjun Zhang, Xuezhi Wang, Denny Zhou, Dale Schuurmans, Joseph E. Gonzalez

To achieve this, we design a novel action space that allows flexible editing of the initial prompts covering a wide set of commonly-used components like instructions, few-shot exemplars, and verbalizers.

no code implementations • 14 Nov 2022 • Hanjun Dai, Yuan Xue, Niao He, Bethany Wang, Na Li, Dale Schuurmans, Bo Dai

In real-world decision-making, uncertainty is important yet difficult to handle.

1 code implementation • 24 Oct 2022 • Mengjiao Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum

While return-conditioning is at the heart of popular algorithms such as decision transformer (DT), these methods tend to perform poorly in highly stochastic environments, where an occasional high return can arise from randomness in the environment rather than the actions themselves.

1 code implementation • 16 Sep 2022 • Haoran Sun, Hanjun Dai, Dale Schuurmans

Optimal scaling has been well studied for Metropolis-Hastings (M-H) algorithms in continuous spaces, but a similar understanding has been lacking in discrete spaces.

no code implementations • 19 Aug 2022 • Tongzheng Ren, Tianjun Zhang, Lisa Lee, Joseph E. Gonzalez, Dale Schuurmans, Bo Dai

Representation learning often plays a critical role in reinforcement learning by managing the curse of dimensionality.

no code implementations • 14 Jul 2022 • Tianjun Zhang, Tongzheng Ren, Mengjiao Yang, Joseph E. Gonzalez, Dale Schuurmans, Bo Dai

It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.

no code implementations • 2 Jul 2022 • Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Denny Zhou

Recent research has shown that rationales, or step-by-step chains of thought, can be used to improve performance in multi-step reasoning tasks.

no code implementations • 29 Jun 2022 • Haoran Sun, Hanjun Dai, Bo Dai, Haomin Zhou, Dale Schuurmans

It is known that gradient-based MCMC samplers for continuous spaces, such as Langevin Monte Carlo (LMC), can be derived as particle versions of a gradient flow that minimizes KL divergence on a Wasserstein manifold.

no code implementations • 17 Jun 2022 • Ramki Gummadi, Saurabh Kumar, Junfeng Wen, Dale Schuurmans

Approaches to policy optimization have been motivated from diverse principles, based on how the parametric model is interpreted (e.g., value versus policy representation) or how the learning objective is formulated, yet they share a common goal of maximizing expected return.

1 code implementation • 27 May 2022 • Xinyang Geng, Hao Liu, Lisa Lee, Dale Schuurmans, Sergey Levine, Pieter Abbeel

We provide an empirical study of M3AE trained on a large-scale image-text dataset, and find that M3AE is able to learn generalizable representations that transfer well to downstream tasks.

1 code implementation • 22 May 2022 • Mengjiao Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum

Imitation learning aims to extract high-performance policies from logged demonstrations of expert behavior.

1 code implementation • 21 May 2022 • Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, Ed Chi

Chain-of-thought prompting has demonstrated remarkable performance on various natural language reasoning tasks.

Ranked #98 on Arithmetic Reasoning on GSM8K

no code implementations • 25 Apr 2022 • Alex Lewandowski, Calarina Muslimani, Dale Schuurmans, Matthew E. Taylor, Jun Luo

To effectively learn such a teaching policy, we introduce a parametric-behavior embedder that learns a representation of the student's learnable parameters from its input/output behavior.

1 code implementation • 21 Mar 2022 • Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou

Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks.

Ranked #80 on Arithmetic Reasoning on GSM8K (using extra training data)

15 code implementations • 28 Jan 2022 • Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou

We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning.

Ranked #37 on Common Sense Reasoning on CommonsenseQA

no code implementations • ICLR 2022 • Hanjun Dai, Yuan Xue, Zia Syed, Dale Schuurmans, Bo Dai

Stochastic dual dynamic programming (SDDP) is a state-of-the-art method for solving multi-stage stochastic optimization, widely used for modeling real-world process optimization tasks.

no code implementations • NeurIPS 2021 • Jincheng Mei, Bo Dai, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans

We study the effect of stochasticity in on-policy policy optimization, and make the following four contributions.

1 code implementation • 28 Oct 2021 • Hongyu Ren, Hanjun Dai, Bo Dai, Xinyun Chen, Denny Zhou, Jure Leskovec, Dale Schuurmans

There are two important reasoning tasks on KGs: (1) single-hop knowledge graph completion, which involves predicting individual links in the KG; and (2) multi-hop reasoning, where the goal is to predict which KG entities satisfy a given logical query.

no code implementations • 29 Sep 2021 • Alex Lewandowski, Dale Schuurmans, Jun Luo

The resulting environment, while simple, necessitates function approximation for state abstraction and provides ground-truth labels for optimal policies and value functions.

no code implementations • ICLR 2022 • Chenjun Xiao, Bo Dai, Jincheng Mei, Oscar A Ramirez, Ramki Gummadi, Chris Harris, Dale Schuurmans

To better understand the utility of deep models in RL we present an analysis of recursive value estimation using overparameterized linear representations that provides useful, transferable findings.

no code implementations • 29 Sep 2021 • Zichen Zhang, Jun Jin, Martin Jagersand, Jun Luo, Dale Schuurmans

Further, we extend the decentralized approach to sequential decision-making problems where we show in 13 continuous control benchmark environments that it matches or outperforms the state-of-the-art CEM algorithms in most cases, under the same budget of the total number of samples for planning.

2 code implementations • NeurIPS 2021 • Hongyu Ren, Hanjun Dai, Zihang Dai, Mengjiao Yang, Jure Leskovec, Dale Schuurmans, Bo Dai

However, the key limitation of transformers is their quadratic memory and time complexity $\mathcal{O}(L^2)$ with respect to the sequence length in attention layers, which restricts application in extremely long sequences.

Ranked #2 on Language Modelling on Wiki-40B

no code implementations • 18 Jun 2021 • Chenjun Xiao, Ilbin Lee, Bo Dai, Dale Schuurmans, Csaba Szepesvari

In high-stakes applications, active experimentation may be considered too risky and thus data are often collected passively.

no code implementations • 13 Jun 2021 • Junfeng Wen, Saurabh Kumar, Ramki Gummadi, Dale Schuurmans

Actor-critic (AC) methods are ubiquitous in reinforcement learning.

no code implementations • 13 May 2021 • Jincheng Mei, Yue Gao, Bo Dai, Csaba Szepesvari, Dale Schuurmans

Classical global convergence results for first-order methods rely on uniform smoothness and the Łojasiewicz inequality.

no code implementations • 15 Apr 2021 • Dennis Lee, Natasha Jaques, Chase Kew, Jiaxing Wu, Douglas Eck, Dale Schuurmans, Aleksandra Faust

We then train agents to minimize the difference between the attention weights that they apply to the environment at each timestep, and the attention of other agents.

no code implementations • 6 Apr 2021 • Chenjun Xiao, Yifan Wu, Tor Lattimore, Bo Dai, Jincheng Mei, Lihong Li, Csaba Szepesvari, Dale Schuurmans

First, we introduce a class of confidence-adjusted index algorithms that unifies optimistic and pessimistic principles in a common framework, which enables a general analysis.

no code implementations • 11 Feb 2021 • Nevena Lazić, Botao Hao, Yasin Abbasi-Yadkori, Dale Schuurmans, Csaba Szepesvári

We compare the use of KL divergence as a constraint vs. as a regularizer, and point out several optimization issues with the widely-used constrained approach.

1 code implementation • 12 Dec 2020 • Mengjiao Yang, Bo Dai, Ofir Nachum, George Tucker, Dale Schuurmans

More importantly, we show how the belief distribution estimated by BayesDICE may be used to rank policies with respect to any arbitrary downstream policy selection metric, and we empirically demonstrate that this selection procedure significantly outperforms existing approaches, such as ranking policies according to mean or high-confidence lower bound value estimates.

no code implementations • NeurIPS 2020 • Jincheng Mei, Chenjun Xiao, Bo Dai, Lihong Li, Csaba Szepesvari, Dale Schuurmans

Both findings are based on an analysis of convergence rates using the Non-uniform Łojasiewicz (NŁ) inequalities.

no code implementations • NeurIPS 2020 • Hanjun Dai, Rishabh Singh, Bo Dai, Charles Sutton, Dale Schuurmans

In this paper we propose ALOE, a new algorithm for learning conditional and unconditional EBMs for discrete structured data, where parameter gradients are estimated using a learned sampler that mimics local search.

no code implementations • NeurIPS 2020 • Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvári, Dale Schuurmans

We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning, where the goal is to estimate a confidence interval on a target policy's value, given only access to a static experience dataset collected by unknown behavior policies.

no code implementations • 29 Sep 2020 • Nan Ding, Xinjie Fan, Zhenzhong Lan, Dale Schuurmans, Radu Soricut

Models based on the Transformer architecture have achieved better accuracy than the ones based on competing architectures for a large set of tasks.

no code implementations • 21 Jul 2020 • Seyed Kamyar Seyed Ghasemipour, Dale Schuurmans, Shixiang Shane Gu

In this work, we closely investigate an important simplification of BCQ -- a prior approach for offline RL -- which removes a heuristic design choice and naturally restricts extracted policies to remain exactly within the support of a given behavior policy.

no code implementations • NeurIPS 2020 • Mengjiao Yang, Ofir Nachum, Bo Dai, Lihong Li, Dale Schuurmans

The recently proposed distribution correction estimation (DICE) family of estimators has advanced the state of the art in off-policy evaluation from behavior-agnostic data.

no code implementations • ICML 2020 • Denny Zhou, Mao Ye, Chen Chen, Tianjian Meng, Mingxing Tan, Xiaodan Song, Quoc Le, Qiang Liu, Dale Schuurmans

This is achieved by layerwise imitation, that is, forcing the thin network to mimic the intermediate outputs of the wide network from layer to layer.

1 code implementation • ICML 2020 • Hanjun Dai, Azade Nazi, Yujia Li, Bo Dai, Dale Schuurmans

Based on this, we develop a novel autoregressive model, named BiGG, that utilizes this sparsity to avoid generating the full adjacency matrix, and importantly reduces the graph generation time complexity to $O((n + m)\log n)$.

no code implementations • NeurIPS 2020 • Nevena Lazic, Dong Yin, Mehrdad Farajtabar, Nir Levine, Dilan Gorur, Chris Harris, Dale Schuurmans

This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs).

no code implementations • ICML 2020 • Jincheng Mei, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans

First, we show that with the true gradient, policy gradient with a softmax parametrization converges at an $O(1/t)$ rate, with constants depending on the problem and initialization.
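
As a rough illustration of the setting in the sentence above (not the paper's code), the sketch below runs exact-gradient softmax policy gradient on a small multi-armed bandit. The reward vector, step size, and iteration count are assumptions chosen for the example. For the expected reward $\pi_\theta^\top r$ with $\pi_\theta = \mathrm{softmax}(\theta)$, the gradient with respect to $\theta_a$ is $\pi_\theta(a)\,(r_a - \pi_\theta^\top r)$.

```python
import math


def softmax(theta):
    """Numerically stable softmax over a list of logits."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def softmax_pg(rewards, eta=0.4, n_steps=2000):
    """Exact-gradient softmax policy gradient on a bandit.

    Returns the final policy and the sub-optimality gap recorded at each step.
    The step size eta and the step count are assumptions of this sketch.
    """
    theta = [0.0] * len(rewards)  # uniform initial policy
    best = max(rewards)
    gaps = []
    for _ in range(n_steps):
        pi = softmax(theta)
        value = sum(p * r for p, r in zip(pi, rewards))
        gaps.append(best - value)
        # d(pi . r) / d theta_a = pi_a * (r_a - pi . r)
        theta = [t + eta * p * (r - value) for t, p, r in zip(theta, pi, rewards)]
    return softmax(theta), gaps


pi, gaps = softmax_pg([1.0, 0.8, 0.2])
# If the gap decays like 1/t, then t * gap should stay roughly bounded.
for t in (10, 100, 1000):
    print(t, gaps[t], t * gaps[t])
```

Printing `t * gaps[t]` at a few steps gives an informal check of the $O(1/t)$ behavior on this one instance; it is not evidence about the problem-dependent constants mentioned above.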

1 code implementation • ICML 2020 • Mengjiao Yang, Bo Dai, Hanjun Dai, Dale Schuurmans

Recently there has been growing interest in modeling sets with exchangeability such as point clouds.

no code implementations • 9 Mar 2020 • Mahdi Karami, Dale Schuurmans

In this paper, we propose a deep probabilistic multi-view model that is composed of a linear multi-view layer based on probabilistic canonical correlation analysis (CCA) description in the latent space together with deep generative networks as observation models.

1 code implementation • ICML 2020 • Junfeng Wen, Bo Dai, Lihong Li, Dale Schuurmans

We consider the problem of approximating the stationary distribution of an ergodic Markov chain given a set of sampled transitions.

1 code implementation • ICML 2020 • Andy Su, Jayden Ooi, Tyler Lu, Dale Schuurmans, Craig Boutilier

Delusional bias is a fundamental source of error in approximate Q-learning.

1 code implementation • ICLR 2020 • Ruiyi Zhang, Bo Dai, Lihong Li, Dale Schuurmans

An important problem that arises in reinforcement learning and Monte Carlo methods is estimating quantities defined by the stationary distribution of a Markov chain.

no code implementations • 24 Dec 2019 • Chenjun Xiao, Yifan Wu, Chen Ma, Dale Schuurmans, Martin Müller

Despite its potential to improve sample complexity versus model-free approaches, model-based reinforcement learning can fail catastrophically if the model is inaccurate.

Model-based Reinforcement Learning

no code implementations • 4 Dec 2019 • Ofir Nachum, Bo Dai, Ilya Kostrikov, Yin-Lam Chow, Lihong Li, Dale Schuurmans

In many real-world applications of reinforcement learning (RL), interactions with the environment are limited due to cost or feasibility.

no code implementations • NeurIPS 2019 • Minmin Chen, Ramki Gummadi, Chris Harris, Dale Schuurmans

We investigate batch policy optimization for cost-sensitive classification and contextual bandits, two related tasks that obviate exploration but require generalizing from observed rewards to action selections in unseen contexts.

no code implementations • NeurIPS 2019 • Chenjun Xiao, Ruitong Huang, Jincheng Mei, Dale Schuurmans, Martin Müller

We then extend this approach to general sequential decision making by developing a general MCTS algorithm, Maximum Entropy for Tree Search (MENTS).

1 code implementation • NeurIPS 2019 • Mahdi Karami, Dale Schuurmans, Jascha Sohl-Dickstein, Laurent Dinh, Daniel Duckworth

We show that these transforms allow more effective normalizing flow models to be developed for generative image models.

no code implementations • 25 Sep 2019 • Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi

This paper advocates the use of offline (batch) reinforcement learning (RL) to help (1) isolate the contributions of exploitation vs. exploration in off-policy deep RL, (2) improve reproducibility of deep RL research, and (3) facilitate the design of simpler deep RL algorithms.

no code implementations • ICML 2020 • Junfeng Wen, Russell Greiner, Dale Schuurmans

In many real-world applications, we want to exploit multiple source datasets of similar tasks to learn a model for a different but related target dataset -- e.g., recognizing characters of a new font using a set of different fonts.

1 code implementation • 10 Jul 2019 • Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi

The DQN replay dataset can serve as an offline RL benchmark and is open-sourced.

no code implementations • 29 May 2019 • Martin Mladenov, Ofer Meshi, Jayden Ooi, Dale Schuurmans, Craig Boutilier

Latent-state environments with long horizons, such as those faced by recommender systems, pose significant challenges for reinforcement learning (RL).

1 code implementation • NeurIPS 2019 • Bo Dai, Zhen Liu, Hanjun Dai, Niao He, Arthur Gretton, Le Song, Dale Schuurmans

We present an efficient algorithm for maximum likelihood estimation (MLE) of exponential family models, with a general parametrization of the energy function that includes neural networks.

1 code implementation • 19 Feb 2019 • Rishabh Agarwal, Chen Liang, Dale Schuurmans, Mohammad Norouzi

The parameters of the auxiliary reward function are optimized with respect to the validation performance of a trained policy.

no code implementations • NeurIPS 2019 • Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taiga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, Clare Lyle

We leverage this perspective to provide formal evidence regarding the usefulness of value functions as auxiliary tasks.

no code implementations • 31 Jan 2019 • Robert Dadashi, Adrien Ali Taïga, Nicolas Le Roux, Dale Schuurmans, Marc G. Bellemare

We establish geometric and topological properties of the space of value functions in finite state-action Markov decision processes.

no code implementations • NeurIPS 2018 • Tyler Lu, Dale Schuurmans, Craig Boutilier

We identify a fundamental source of error in Q-learning and other forms of dynamic programming with function approximation.

1 code implementation • 27 Nov 2018 • Zafarali Ahmed, Nicolas Le Roux, Mohammad Norouzi, Dale Schuurmans

Entropy regularization is commonly used to improve policy optimization in reinforcement learning.

1 code implementation • 6 Nov 2018 • Bo Dai, Hanjun Dai, Arthur Gretton, Le Song, Dale Schuurmans, Niao He

We investigate penalized maximum log-likelihood estimation for exponential family distributions whose natural parameter resides in a reproducing kernel Hilbert space.

no code implementations • 7 May 2018 • Craig Boutilier, Alon Cohen, Amit Daniely, Avinatan Hassidim, Yishay Mansour, Ofer Meshi, Martin Mladenov, Dale Schuurmans

From an RL perspective, we show that Q-learning with sampled action sets is sound.

no code implementations • 5 Apr 2018 • Aditya Grover, Ramki Gummadi, Miguel Lazaro-Gredilla, Dale Schuurmans, Stefano Ermon

Learning latent variable models with stochastic variational inference is challenging when the approximate posterior is far from the true posterior, due to high variance in the gradient estimates.

no code implementations • ICML 2018 • Ofir Nachum, Mohammad Norouzi, George Tucker, Dale Schuurmans

State-action value functions (i.e., Q-values) are ubiquitous in reinforcement learning (RL), giving rise to popular algorithms such as SARSA and Q-learning.

no code implementations • ICLR 2018 • Ofir Nachum, Mohammad Norouzi, George Tucker, Dale Schuurmans

We propose a new notion of action value defined by a Gaussian smoothed version of the expected Q-value used in SARSA.

no code implementations • NeurIPS 2017 • Mahdi Karami, Martha White, Dale Schuurmans, Csaba Szepesvari

In this paper, we instead reconsider likelihood maximization and develop an optimization based strategy for recovering the latent states and transition parameters.

no code implementations • 30 Nov 2017 • Tyler Lu, Martin Zinkevich, Craig Boutilier, Binz Roy, Dale Schuurmans

Motivated by the cooling of Google's data centers, we study how one can safely identify the parameters of a system model with a desired accuracy and confidence level.

1 code implementation • ICLR 2018 • Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans

When evaluated on a number of continuous control tasks, Trust-PCL improves the solution quality and sample efficiency of TRPO.

1 code implementation • NeurIPS 2017 • Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans

We establish a new connection between value and policy based reinforcement learning (RL) based on a relationship between softmax temporal value consistency and policy optimality under entropy regularization.

no code implementations • NeurIPS 2016 • Dale Schuurmans, Martin A. Zinkevich

We investigate a reduction of supervised learning to game playing that reveals new connections and learning methods.

no code implementations • 28 Nov 2016 • Ofir Nachum, Mohammad Norouzi, Dale Schuurmans

We propose a more directed exploration strategy that promotes exploration of under-appreciated reward regions.

no code implementations • NeurIPS 2016 • Mohammad Norouzi, Samy Bengio, Zhifeng Chen, Navdeep Jaitly, Mike Schuster, Yonghui Wu, Dale Schuurmans

A key problem in structured output prediction is direct optimization of the task reward function that matters for test evaluation.

no code implementations • 1 Jan 2016 • Siamak Ravanbakhsh, Barnabas Poczos, Jeff Schneider, Dale Schuurmans, Russell Greiner

We propose a Laplace approximation that creates a stochastic unit from any smooth monotonic activation function, using only Gaussian noise.

no code implementations • ICCV 2015 • Xin Li, Yuhong Guo, Dale Schuurmans

Most existing zero-shot learning methods require a user to first provide a set of semantic visual attributes for each class as side information before applying a two-step prediction procedure that introduces an intermediate attribute prediction problem.

no code implementations • NeurIPS 2015 • Farzaneh Mirzazadeh, Siamak Ravanbakhsh, Nan Ding, Dale Schuurmans

A key bottleneck in structured output prediction is the need for inference during training and testing, usually requiring some form of dynamic programming.

1 code implementation • 10 Nov 2015 • Ruitong Huang, Bing Xu, Dale Schuurmans, Csaba Szepesvari

The robustness of neural networks to intended perturbations has recently attracted significant attention.

no code implementations • NeurIPS 2014 • Özlem Aslan, Xinhua Zhang, Dale Schuurmans

Deep learning has been a long standing pursuit in machine learning, which until recently was hampered by unreliable training methods before the discovery of improved heuristics for embedded layer training.

no code implementations • 17 Oct 2014 • Yao-Liang Yu, Xinhua Zhang, Dale Schuurmans

Structured sparsity is an important modeling tool that expands the applicability of convex formulations for data analysis; however, it also creates significant challenges for efficient algorithm design.

no code implementations • 13 May 2014 • James Neufeld, András György, Dale Schuurmans, Csaba Szepesvári

We consider the problem of sequentially choosing between a set of unbiased Monte Carlo estimators to minimize the mean-squared-error (MSE) of a final combined estimate.

no code implementations • NeurIPS 2013 • Özlem Aslan, Hao Cheng, Xinhua Zhang, Dale Schuurmans

Latent variable prediction models, such as multi-layer networks, impose auxiliary latent variables between inputs and outputs to allow automatic inference of implicit features useful for prediction.

no code implementations • NeurIPS 2013 • Xinhua Zhang, Yao-Liang Yu, Dale Schuurmans

Structured sparse estimation has become an important technique in many areas of data analysis.

no code implementations • 26 Sep 2013 • Hao Cheng, Xinhua Zhang, Dale Schuurmans

Although many convex relaxations of clustering have been proposed in the past decade, current formulations remain restricted to spherical Gaussian or discriminative models and are susceptible to imbalanced clusters.

no code implementations • NeurIPS 2012 • Martha White, Xinhua Zhang, Dale Schuurmans, Yao-Liang Yu

Subspace learning seeks a low dimensional representation of data that enables accurate reconstruction.

no code implementations • NeurIPS 2012 • Yao-Liang Yu, Özlem Aslan, Dale Schuurmans

Despite the variety of robust regression methods that have been developed, current regression formulations are either NP-hard, or allow unbounded response to even a single leverage point.

no code implementations • NeurIPS 2012 • Xinhua Zhang, Dale Schuurmans, Yao-Liang Yu

Sparse learning models typically combine a smooth loss with a nonsmooth penalty, such as trace norm.

no code implementations • NeurIPS 2010 • Min Yang, Linli Xu, Martha White, Dale Schuurmans, Yao-Liang Yu

We present a generic procedure that can be applied to standard loss functions and demonstrate improved robustness in regression and classification problems.

no code implementations • NeurIPS 2009 • Yao-Liang Yu, Yuxi Li, Dale Schuurmans, Csaba Szepesvári

We prove that linear projections between distribution families with fixed first and second moments are surjective, regardless of dimension.

no code implementations • NeurIPS 2009 • Novi Quadrianto, John Lim, Dale Schuurmans, Tibério S. Caetano

The second is a min-min reformulation consisting of fast alternating steps of closed-form updates.

no code implementations • NeurIPS 2007 • Yuhong Guo, Dale Schuurmans

Most previous studies in active learning have focused on selecting one unlabeled instance at one time while retraining in each iteration.

Papers With Code is a free resource with all data licensed under CC-BY-SA.