1 code implementation • ICML 2020 • Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi
The DQN replay dataset can serve as an offline RL benchmark and is open-sourced.
no code implementations • 7 Mar 2023 • Sherry Yang, Ofir Nachum, Yilun Du, Jason Wei, Pieter Abbeel, Dale Schuurmans
In response to these developments, new paradigms are emerging for training foundation models to interact with other agents and perform long-term reasoning.
no code implementations • 7 Mar 2023 • Azade Nova, Hanjun Dai, Dale Schuurmans
By only using the weights of the pre-trained model and unlabeled data, in a matter of a few minutes on a single GPU, up to 40% of the original FLOP count can be reduced with less than a 4% accuracy loss across all tasks considered.
no code implementations • 31 Jan 2023 • Yilun Du, Mengjiao Yang, Bo Dai, Hanjun Dai, Ofir Nachum, Joshua B. Tenenbaum, Dale Schuurmans, Pieter Abbeel
The proposed policy-as-video formulation can further represent environments with different state and action spaces in a unified space of images, which, for example, enables learning and generalization across a variety of robot manipulation tasks.
no code implementations • 16 Jan 2023 • Jincheng Mei, Wesley Chung, Valentin Thomas, Bo Dai, Csaba Szepesvari, Dale Schuurmans
Instead, the analysis reveals that the primary effect of the value baseline is to \textbf{reduce the aggressiveness of the updates} rather than their variance.
no code implementations • 10 Jan 2023 • Dale Schuurmans
We show that transformer-based large language models are computationally universal when augmented with an external memory.
no code implementations • 17 Dec 2022 • Zichen Zhang, Johannes Kirschner, Junxi Zhang, Francesco Zanini, Alex Ayoub, Masood Dehghan, Dale Schuurmans
Importantly, these two errors behave differently with respect to time discretization, which implies that there is an optimal choice for the temporal resolution that depends on the data budget.
no code implementations • 17 Dec 2022 • Tongzheng Ren, Chenjun Xiao, Tianjun Zhang, Na Li, Zhaoran Wang, Sujay Sanghavi, Dale Schuurmans, Bo Dai
Theoretically, we establish the sample complexity of the proposed approach in the online and offline settings.
Model-based Reinforcement Learning
reinforcement-learning
+1
1 code implementation • 16 Dec 2022 • Zichen Zhang, Jun Jin, Martin Jagersand, Jun Luo, Dale Schuurmans
To tackle this issue, we propose Decentralized CEM (DecentCEM), a simple but effective improvement over classical CEM, by using an ensemble of CEM instances running independently from one another, and each performing a local improvement of its own sampling distribution.
no code implementations • 30 Nov 2022 • Haoran Sun, Lijun Yu, Bo Dai, Dale Schuurmans, Hanjun Dai
Score-based modeling through stochastic differential equations (SDEs) has provided a new perspective on diffusion models, and demonstrated superior performance on continuous data.
no code implementations • 28 Nov 2022 • Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, Denny Zhou
We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly, by encoding smaller models in their activations, and updating these implicit models as new examples appear in the context.
1 code implementation • 21 Nov 2022 • Tianjun Zhang, Xuezhi Wang, Denny Zhou, Dale Schuurmans, Joseph E. Gonzalez
To achieve this, we design a novel action space that allows flexible editing of the initial prompts covering a wide set of commonly-used components like instructions, few-shot exemplars, and verbalizers.
no code implementations • 14 Nov 2022 • Hanjun Dai, Yuan Xue, Niao He, Bethany Wang, Na Li, Dale Schuurmans, Bo Dai
In real-world decision-making, uncertainty is important yet difficult to handle.
1 code implementation • 24 Oct 2022 • Mengjiao Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum
While return-conditioning is at the heart of popular algorithms such as decision transformer (DT), these methods tend to perform poorly in highly stochastic environments, where an occasional high return can arise from randomness in the environment rather than the actions themselves.
1 code implementation • 16 Sep 2022 • Haoran Sun, Hanjun Dai, Dale Schuurmans
Optimal scaling has been well studied for Metropolis-Hastings (M-H) algorithms in continuous spaces, but a similar understanding has been lacking in discrete spaces.
no code implementations • 19 Aug 2022 • Tongzheng Ren, Tianjun Zhang, Lisa Lee, Joseph E. Gonzalez, Dale Schuurmans, Bo Dai
Representation learning often plays a critical role in reinforcement learning by managing the curse of dimensionality.
no code implementations • 14 Jul 2022 • Tianjun Zhang, Tongzheng Ren, Mengjiao Yang, Joseph E. Gonzalez, Dale Schuurmans, Bo Dai
It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.
no code implementations • 2 Jul 2022 • Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Denny Zhou
Recent research has shown that rationales, or step-by-step chains of thought, can be used to improve performance in multi-step reasoning tasks.
no code implementations • 29 Jun 2022 • Haoran Sun, Hanjun Dai, Bo Dai, Haomin Zhou, Dale Schuurmans
It is known that gradient-based MCMC samplers for continuous spaces, such as Langevin Monte Carlo (LMC), can be derived as particle versions of a gradient flow that minimizes KL divergence on a Wasserstein manifold.
no code implementations • 17 Jun 2022 • Ramki Gummadi, Saurabh Kumar, Junfeng Wen, Dale Schuurmans
Approaches to policy optimization have been motivated from diverse principles, based on how the parametric model is interpreted (e. g. value versus policy representation) or how the learning objective is formulated, yet they share a common goal of maximizing expected return.
1 code implementation • 27 May 2022 • Xinyang Geng, Hao liu, Lisa Lee, Dale Schuurmans, Sergey Levine, Pieter Abbeel
We provide an empirical study of M3AE trained on a large-scale image-text dataset, and find that M3AE is able to learn generalizable representations that transfer well to downstream tasks.
1 code implementation • 22 May 2022 • Mengjiao Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum
Imitation learning aims to extract high-performance policies from logged demonstrations of expert behavior.
no code implementations • 21 May 2022 • Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, Ed Chi
Although chain-of-thought prompting has shown impressive results on many natural language reasoning tasks, it often performs poorly on tasks which need to solve problems harder than the demonstration examples.
Ranked #14 on
Arithmetic Reasoning
on GSM8K
no code implementations • 25 Apr 2022 • Alex Lewandowski, Calarina Muslimani, Dale Schuurmans, Matthew E. Taylor, Jun Luo
To effectively learn such a teaching policy, we introduce a parametric-behavior embedder that learns a representation of the student's learnable parameters from its input/output behavior.
no code implementations • 21 Mar 2022 • Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou
Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks.
Ranked #9 on
Arithmetic Reasoning
on GSM8K
(using extra training data)
no code implementations • 28 Jan 2022 • Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou
We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning.
no code implementations • ICLR 2022 • Hanjun Dai, Yuan Xue, Zia Syed, Dale Schuurmans, Bo Dai
Stochastic dual dynamic programming (SDDP) is a state-of-the-art method for solving multi-stage stochastic optimization, widely used for modeling real-world process optimization tasks.
no code implementations • NeurIPS 2021 • Jincheng Mei, Bo Dai, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans
We study the effect of stochasticity in on-policy policy optimization, and make the following four contributions.
1 code implementation • 28 Oct 2021 • Hongyu Ren, Hanjun Dai, Bo Dai, Xinyun Chen, Denny Zhou, Jure Leskovec, Dale Schuurmans
There are two important reasoning tasks on KGs: (1) single-hop knowledge graph completion, which involves predicting individual links in the KG; and (2), multi-hop reasoning, where the goal is to predict which KG entities satisfy a given logical query.
no code implementations • 29 Sep 2021 • Zichen Zhang, Jun Jin, Martin Jagersand, Jun Luo, Dale Schuurmans
Further, we extend the decentralized approach to sequential decision-making problems where we show in 13 continuous control benchmark environments that it matches or outperforms the state-of-the-art CEM algorithms in most cases, under the same budget of the total number of samples for planning.
no code implementations • 29 Sep 2021 • Alex Lewandowski, Dale Schuurmans, Jun Luo
The resulting environment, while simple, necessitates function approximation for state abstraction and provides ground-truth labels for optimal policies and value functions.
no code implementations • ICLR 2022 • Chenjun Xiao, Bo Dai, Jincheng Mei, Oscar A Ramirez, Ramki Gummadi, Chris Harris, Dale Schuurmans
To better understand the utility of deep models in RL we present an analysis of recursive value estimation using overparameterized linear representations that provides useful, transferable findings.
1 code implementation • NeurIPS 2021 • Hongyu Ren, Hanjun Dai, Zihang Dai, Mengjiao Yang, Jure Leskovec, Dale Schuurmans, Bo Dai
However, the key limitation of transformers is their quadratic memory and time complexity $\mathcal{O}(L^2)$ with respect to the sequence length in attention layers, which restricts application in extremely long sequences.
Ranked #2 on
Language Modelling
on Wiki-40B
no code implementations • 18 Jun 2021 • Chenjun Xiao, Ilbin Lee, Bo Dai, Dale Schuurmans, Csaba Szepesvari
In high stake applications, active experimentation may be considered too risky and thus data are often collected passively.
no code implementations • 13 Jun 2021 • Junfeng Wen, Saurabh Kumar, Ramki Gummadi, Dale Schuurmans
Actor-critic (AC) methods are ubiquitous in reinforcement learning.
no code implementations • 13 May 2021 • Jincheng Mei, Yue Gao, Bo Dai, Csaba Szepesvari, Dale Schuurmans
Classical global convergence results for first-order methods rely on uniform smoothness and the \L{}ojasiewicz inequality.
no code implementations • 15 Apr 2021 • Dennis Lee, Natasha Jaques, Chase Kew, Jiaxing Wu, Douglas Eck, Dale Schuurmans, Aleksandra Faust
We then train agents to minimize the difference between the attention weights that they apply to the environment at each timestep, and the attention of other agents.
no code implementations • 6 Apr 2021 • Chenjun Xiao, Yifan Wu, Tor Lattimore, Bo Dai, Jincheng Mei, Lihong Li, Csaba Szepesvari, Dale Schuurmans
First, we introduce a class of confidence-adjusted index algorithms that unifies optimistic and pessimistic principles in a common framework, which enables a general analysis.
no code implementations • 11 Feb 2021 • Nevena Lazić, Botao Hao, Yasin Abbasi-Yadkori, Dale Schuurmans, Csaba Szepesvári
We compare the use of KL divergence as a constraint vs. as a regularizer, and point out several optimization issues with the widely-used constrained approach.
1 code implementation • 12 Dec 2020 • Mengjiao Yang, Bo Dai, Ofir Nachum, George Tucker, Dale Schuurmans
More importantly, we show how the belief distribution estimated by BayesDICE may be used to rank policies with respect to any arbitrary downstream policy selection metric, and we empirically demonstrate that this selection procedure significantly outperforms existing approaches, such as ranking policies according to mean or high-confidence lower bound value estimates.
no code implementations • NeurIPS 2020 • Jincheng Mei, Chenjun Xiao, Bo Dai, Lihong Li, Csaba Szepesvari, Dale Schuurmans
Both findings are based on an analysis of convergence rates using the Non-uniform \L{}ojasiewicz (N\L{}) inequalities.
no code implementations • NeurIPS 2020 • Hanjun Dai, Rishabh Singh, Bo Dai, Charles Sutton, Dale Schuurmans
In this paper we propose ALOE, a new algorithm for learning conditional and unconditional EBMs for discrete structured data, where parameter gradients are estimated using a learned sampler that mimics local search.
no code implementations • NeurIPS 2020 • Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvári, Dale Schuurmans
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning, where the goal is to estimate a confidence interval on a target policy's value, given only access to a static experience dataset collected by unknown behavior policies.
no code implementations • 29 Sep 2020 • Nan Ding, Xinjie Fan, Zhenzhong Lan, Dale Schuurmans, Radu Soricut
Models based on the Transformer architecture have achieved better accuracy than the ones based on competing architectures for a large set of tasks.
no code implementations • 21 Jul 2020 • Seyed Kamyar Seyed Ghasemipour, Dale Schuurmans, Shixiang Shane Gu
In this work, we closely investigate an important simplification of BCQ -- a prior approach for offline RL -- which removes a heuristic design choice and naturally restricts extracted policies to remain exactly within the support of a given behavior policy.
no code implementations • NeurIPS 2020 • Mengjiao Yang, Ofir Nachum, Bo Dai, Lihong Li, Dale Schuurmans
The recently proposed distribution correction estimation (DICE) family of estimators has advanced the state of the art in off-policy evaluation from behavior-agnostic data.
no code implementations • ICML 2020 • Denny Zhou, Mao Ye, Chen Chen, Tianjian Meng, Mingxing Tan, Xiaodan Song, Quoc Le, Qiang Liu, Dale Schuurmans
This is achieved by layerwise imitation, that is, forcing the thin network to mimic the intermediate outputs of the wide network from layer to layer.
1 code implementation • ICML 2020 • Hanjun Dai, Azade Nazi, Yujia Li, Bo Dai, Dale Schuurmans
Based on this, we develop a novel autoregressive model, named BiGG, that utilizes this sparsity to avoid generating the full adjacency matrix, and importantly reduces the graph generation time complexity to $O((n + m)\log n)$.
no code implementations • NeurIPS 2020 • Nevena Lazic, Dong Yin, Mehrdad Farajtabar, Nir Levine, Dilan Gorur, Chris Harris, Dale Schuurmans
This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs).
no code implementations • ICML 2020 • Jincheng Mei, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans
First, we show that with the true gradient, policy gradient with a softmax parametrization converges at a $O(1/t)$ rate, with constants depending on the problem and initialization.
1 code implementation • ICML 2020 • Mengjiao Yang, Bo Dai, Hanjun Dai, Dale Schuurmans
Recently there has been growing interest in modeling sets with exchangeability such as point clouds.
no code implementations • 9 Mar 2020 • Mahdi Karami, Dale Schuurmans
In this paper, we propose a deep probabilistic multi-view model that is composed of a linear multi-view layer based on probabilistic canonical correlation analysis (CCA) description in the latent space together with deep generative networks as observation models.
1 code implementation • ICML 2020 • Junfeng Wen, Bo Dai, Lihong Li, Dale Schuurmans
We consider the problem of approximating the stationary distribution of an ergodic Markov chain given a set of sampled transitions.
no code implementations • ICML 2020 • Andy Su, Jayden Ooi, Tyler Lu, Dale Schuurmans, Craig Boutilier
Delusional bias is a fundamental source of error in approximate Q-learning.
1 code implementation • ICLR 2020 • Ruiyi Zhang, Bo Dai, Lihong Li, Dale Schuurmans
An important problem that arises in reinforcement learning and Monte Carlo methods is estimating quantities defined by the stationary distribution of a Markov chain.
no code implementations • 24 Dec 2019 • Chenjun Xiao, Yifan Wu, Chen Ma, Dale Schuurmans, Martin Müller
Despite its potential to improve sample complexity versus model-free approaches, model-based reinforcement learning can fail catastrophically if the model is inaccurate.
Model-based Reinforcement Learning
reinforcement-learning
+1
no code implementations • 4 Dec 2019 • Ofir Nachum, Bo Dai, Ilya Kostrikov, Yin-Lam Chow, Lihong Li, Dale Schuurmans
In many real-world applications of reinforcement learning (RL), interactions with the environment are limited due to cost or feasibility.
no code implementations • NeurIPS 2019 • Minmin Chen, Ramki Gummadi, Chris Harris, Dale Schuurmans
We investigate batch policy optimization for cost-sensitive classification and contextual bandits---two related tasks that obviate exploration but require generalizing from observed rewards to action selections in unseen contexts.
no code implementations • NeurIPS 2019 • Chenjun Xiao, Ruitong Huang, Jincheng Mei, Dale Schuurmans, Martin Müller
We then extend this approach to general sequential decision making by developing a general MCTS algorithm, Maximum Entropy for Tree Search (MENTS).
1 code implementation • NeurIPS 2019 • Mahdi Karami, Dale Schuurmans, Jascha Sohl-Dickstein, Laurent Dinh, Daniel Duckworth
We show that these transforms allow more effective normalizing flow models to be developed for generative image models.
no code implementations • 25 Sep 2019 • Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi
This paper advocates the use of offline (batch) reinforcement learning (RL) to help (1) isolate the contributions of exploitation vs. exploration in off-policy deep RL, (2) improve reproducibility of deep RL research, and (3) facilitate the design of simpler deep RL algorithms.
no code implementations • ICML 2020 • Junfeng Wen, Russell Greiner, Dale Schuurmans
In many real-world applications, we want to exploit multiple source datasets of similar tasks to learn a model for a different but related target dataset -- e. g., recognizing characters of a new font using a set of different fonts.
2 code implementations • 10 Jul 2019 • Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi
The DQN replay dataset can serve as an offline RL benchmark and is open-sourced.
no code implementations • 29 May 2019 • Martin Mladenov, Ofer Meshi, Jayden Ooi, Dale Schuurmans, Craig Boutilier
Latent-state environments with long horizons, such as those faced by recommender systems, pose significant challenges for reinforcement learning (RL).
1 code implementation • NeurIPS 2019 • Bo Dai, Zhen Liu, Hanjun Dai, Niao He, Arthur Gretton, Le Song, Dale Schuurmans
We present an efficient algorithm for maximum likelihood estimation (MLE) of exponential family models, with a general parametrization of the energy function that includes neural networks.
1 code implementation • 19 Feb 2019 • Rishabh Agarwal, Chen Liang, Dale Schuurmans, Mohammad Norouzi
The parameters of the auxiliary reward function are optimized with respect to the validation performance of a trained policy.
no code implementations • 31 Jan 2019 • Robert Dadashi, Adrien Ali Taïga, Nicolas Le Roux, Dale Schuurmans, Marc G. Bellemare
We establish geometric and topological properties of the space of value functions in finite state-action Markov decision processes.
no code implementations • NeurIPS 2019 • Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taiga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, Clare Lyle
We leverage this perspective to provide formal evidence regarding the usefulness of value functions as auxiliary tasks.
no code implementations • NeurIPS 2018 • Tyler Lu, Dale Schuurmans, Craig Boutilier
We identify a fundamental source of error in Q-learning and other forms of dynamic programming with function approximation.
1 code implementation • 27 Nov 2018 • Zafarali Ahmed, Nicolas Le Roux, Mohammad Norouzi, Dale Schuurmans
Entropy regularization is commonly used to improve policy optimization in reinforcement learning.
1 code implementation • 6 Nov 2018 • Bo Dai, Hanjun Dai, Arthur Gretton, Le Song, Dale Schuurmans, Niao He
We investigate penalized maximum log-likelihood estimation for exponential family distributions whose natural parameter resides in a reproducing kernel Hilbert space.
no code implementations • 7 May 2018 • Craig Boutilier, Alon Cohen, Amit Daniely, Avinatan Hassidim, Yishay Mansour, Ofer Meshi, Martin Mladenov, Dale Schuurmans
From an RL perspective, we show that Q-learning with sampled action sets is sound.
no code implementations • 5 Apr 2018 • Aditya Grover, Ramki Gummadi, Miguel Lazaro-Gredilla, Dale Schuurmans, Stefano Ermon
Learning latent variable models with stochastic variational inference is challenging when the approximate posterior is far from the true posterior, due to high variance in the gradient estimates.
no code implementations • ICML 2018 • Ofir Nachum, Mohammad Norouzi, George Tucker, Dale Schuurmans
State-action value functions (i. e., Q-values) are ubiquitous in reinforcement learning (RL), giving rise to popular algorithms such as SARSA and Q-learning.
no code implementations • ICLR 2018 • Ofir Nachum, Mohammad Norouzi, George Tucker, Dale Schuurmans
We propose a new notion of action value defined by a Gaussian smoothed version of the expected Q-value used in SARSA.
no code implementations • NeurIPS 2017 • Mahdi Karami, Martha White, Dale Schuurmans, Csaba Szepesvari
In this paper, we instead reconsider likelihood maximization and develop an optimization based strategy for recovering the latent states and transition parameters.
no code implementations • 30 Nov 2017 • Tyler Lu, Martin Zinkevich, Craig Boutilier, Binz Roy, Dale Schuurmans
Motivated by the cooling of Google's data centers, we study how one can safely identify the parameters of a system model with a desired accuracy and confidence level.
1 code implementation • ICLR 2018 • Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans
When evaluated on a number of continuous control tasks, Trust-PCL improves the solution quality and sample efficiency of TRPO.
1 code implementation • NeurIPS 2017 • Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans
We establish a new connection between value and policy based reinforcement learning (RL) based on a relationship between softmax temporal value consistency and policy optimality under entropy regularization.
no code implementations • NeurIPS 2016 • Dale Schuurmans, Martin A. Zinkevich
We investigate a reduction of supervised learning to game playing that reveals new connections and learning methods.
no code implementations • 28 Nov 2016 • Ofir Nachum, Mohammad Norouzi, Dale Schuurmans
We propose a more directed exploration strategy that promotes exploration of under-appreciated reward regions.
no code implementations • NeurIPS 2016 • Mohammad Norouzi, Samy Bengio, Zhifeng Chen, Navdeep Jaitly, Mike Schuster, Yonghui Wu, Dale Schuurmans
A key problem in structured output prediction is direct optimization of the task reward function that matters for test evaluation.
no code implementations • 1 Jan 2016 • Siamak Ravanbakhsh, Barnabas Poczos, Jeff Schneider, Dale Schuurmans, Russell Greiner
We propose a Laplace approximation that creates a stochastic unit from any smooth monotonic activation function, using only Gaussian noise.
no code implementations • NeurIPS 2015 • Farzaneh Mirzazadeh, Siamak Ravanbakhsh, Nan Ding, Dale Schuurmans
A key bottleneck in structured output prediction is the need for inference during training and testing, usually requiring some form of dynamic programming.
no code implementations • ICCV 2015 • Xin Li, Yuhong Guo, Dale Schuurmans
Most existing zero-shot learning methods require a user to first provide a set of semantic visual attributes for each class as side information before applying a two-step prediction procedure that introduces an intermediate attribute prediction problem.
1 code implementation • 10 Nov 2015 • Ruitong Huang, Bing Xu, Dale Schuurmans, Csaba Szepesvari
The robustness of neural networks to intended perturbations has recently attracted significant attention.
no code implementations • NeurIPS 2014 • Özlem Aslan, Xinhua Zhang, Dale Schuurmans
Deep learning has been a long standing pursuit in machine learning, which until recently was hampered by unreliable training methods before the discovery of improved heuristics for embedded layer training.
no code implementations • 17 Oct 2014 • Yao-Liang Yu, Xinhua Zhang, Dale Schuurmans
Structured sparsity is an important modeling tool that expands the applicability of convex formulations for data analysis, however it also creates significant challenges for efficient algorithm design.
no code implementations • 13 May 2014 • James Neufeld, András György, Dale Schuurmans, Csaba Szepesvári
We consider the problem of sequentially choosing between a set of unbiased Monte Carlo estimators to minimize the mean-squared-error (MSE) of a final combined estimate.
no code implementations • NeurIPS 2013 • Xinhua Zhang, Yao-Liang Yu, Dale Schuurmans
Structured sparse estimation has become an important technique in many areas of data analysis.
no code implementations • NeurIPS 2013 • Özlem Aslan, Hao Cheng, Xinhua Zhang, Dale Schuurmans
Latent variable prediction models, such as multi-layer networks, impose auxiliary latent variables between inputs and outputs to allow automatic inference of implicit features useful for prediction.
no code implementations • 26 Sep 2013 • Hao Cheng, Xinhua Zhang, Dale Schuurmans
Although many convex relaxations of clustering have been proposed in the past decade, current formulations remain restricted to spherical Gaussian or discriminative models and are susceptible to imbalanced clusters.
no code implementations • NeurIPS 2012 • Xinhua Zhang, Dale Schuurmans, Yao-Liang Yu
Sparse learning models typically combine a smooth loss with a nonsmooth penalty, such as trace norm.
no code implementations • NeurIPS 2012 • Martha White, Xinhua Zhang, Dale Schuurmans, Yao-Liang Yu
Subspace learning seeks a low dimensional representation of data that enables accurate reconstruction.
no code implementations • NeurIPS 2012 • Yao-Liang Yu, Özlem Aslan, Dale Schuurmans
Despite the variety of robust regression methods that have been developed, current regression formulations are either NP-hard, or allow unbounded response to even a single leverage point.
no code implementations • NeurIPS 2010 • Min Yang, Linli Xu, Martha White, Dale Schuurmans, Yao-Liang Yu
We present a generic procedure that can be applied to standard loss functions and demonstrate improved robustness in regression and classification problems.
no code implementations • NeurIPS 2009 • Novi Quadrianto, John Lim, Dale Schuurmans, Tibério S. Caetano
The second is a min-min reformulation consisting of fast alternating steps of closed-form updates.
no code implementations • NeurIPS 2009 • Yao-Liang Yu, Yuxi Li, Dale Schuurmans, Csaba Szepesvári
We prove that linear projections between distribution families with fixed first and second moments are surjective, regardless of dimension.
no code implementations • NeurIPS 2007 • Yuhong Guo, Dale Schuurmans
Most previous studies in active learning have focused on selecting one unlabeled instance at one time while retraining in each iteration.