We consider Bayesian optimization of the output of a network of functions, where each function takes as input the output of its parent nodes, and where the network takes significant time to evaluate.
Recent advances in computationally efficient non-myopic Bayesian optimization (BO) improve query efficiency over traditional myopic methods like expected improvement while only modestly increasing computational cost.
To overcome the shortcomings of existing approaches, we propose the budgeted multi-step expected improvement, a non-myopic acquisition function that generalizes classical expected improvement to the setting of heterogeneous and unknown evaluation costs.
Thus, there is substantial value in understanding the performance of index policies and other policies that can be computed efficiently for large $N$.
We consider optimization of composite objective functions, i. e., of the form $f(x)=g(h(x))$, where $h$ is a black-box derivative-free expensive-to-evaluate function with vector-valued outputs, and $g$ is a cheap-to-evaluate real-valued function.
Nonetheless, for hyperparameter tuning in deep neural networks, the time required to evaluate the validation error for even a few hyperparameter settings remains a bottleneck.
It builds a surrogate for the objective and quantifies the uncertainty in that surrogate using a Bayesian machine learning technique, Gaussian process regression, and then uses an acquisition function defined from this surrogate to decide where to sample.
While Bayesian optimization (BO) has achieved great success in optimizing expensive-to-evaluate black-box functions, especially tuning hyperparameters of neural networks, methods such as random search (Li et al., 2016) and multi-fidelity BO (e. g. Klein et al. (2017)) that exploit cheap approximations, e. g. training on a smaller training data or with fewer iterations, can outperform standard BO approaches that use only full-fidelity observations.
This paper studies Bayesian ranking and selection (R&S) problems with correlated prior beliefs and continuous domains, i. e. Bayesian optimization (BO).
Under certain noise assumptions, we show that the Bayes-optimal policy for maximally reducing entropy of the posterior distribution of this linear classifier is a greedy policy, and that this policy achieves a linear lower bound when alternatives can be constructed from the continuum.
The probabilistic bisection algorithm (PBA) solves a class of stochastic root-finding problems in one dimension by successively updating a prior belief on the location of the root based on noisy responses to queries at chosen points.
We develop a framework for warm-starting Bayesian optimization, that reduces the solution time required to solve an optimization problem that is one in a sequence of related problems.
In many applications of black-box optimization, one can evaluate multiple points simultaneously, e. g. when evaluating the performances of several different neural network architectures in a parallel computing environment.
We present a Bayesian sequential decision-making formulation of the information filtering problem, in which an algorithm presents items (news articles, scientific papers, tweets) arriving in a stream, and learns relevance from user feedback on presented items.
We also show that the resulting one-step Bayes optimal algorithm for parallel global optimization finds high-quality solutions with fewer evaluations than a heuristic based on approximately maximizing the q-EI.
We consider derivative-free black-box global optimization of expensive noisy functions, when most of the randomness in the objective is produced by a few influential scalar random inputs.
We consider effort allocation in crowdsourcing, where we wish to assign labeling tasks to imperfect homogeneous crowd workers to maximize overall accuracy in a continuous-time Bayesian setting, subject to budget and time constraints.
Much of the data being created on the web contains interactions between users and items.
We focus on the cold-start setting for this problem, in which we have limited historical data on the user's preferences, and must rely on feedback from forwarded articles to learn which the fraction of items relevant to the user in each of several item categories.
We consider the problem of group testing with sum observations and noiseless answers, in which we aim to locate multiple objects by querying the number of objects in each of a sequence of chosen sets.
Approximate dynamic programming (ADP) has proven itself in a wide range of applications spanning large-scale transportation problems, health care, revenue management, and energy systems.