1 code implementation • 11 Mar 2025 • Seongho Son, William Bankes, Sangwoong Yoon, Shyam Sundhar Ramesh, Xiaohang Tang, Ilija Bogunovic
Test-time alignment of Large Language Models (LLMs) to human preferences offers a flexible way to generate responses aligned to diverse objectives without extensive retraining of LLMs.
1 code implementation • 7 Mar 2025 • Lorenz Wolf, Sangwoong Yoon, Ilija Bogunovic
Mixture of large language model (LLM) Agents (MoA) architectures achieve state-of-the-art performance on prominent benchmarks like AlpacaEval 2.0 by leveraging the collaboration of multiple LLMs at inference time.
1 code implementation • 17 Feb 2025 • Petar Steinberg, Juliusz Ziomek, Matej Jusup, Ilija Bogunovic
We address the problem of optimising the average payoff for a large number of cooperating agents, where the payoff function is unknown and treated as a black box.
no code implementations • 9 Jan 2025 • Chong Liu, Dan Qiao, Ming Yin, Ilija Bogunovic, Yu-Xiang Wang
It achieves a near-optimal $O(\sqrt{T})$ regret for problems where the best-known regret is almost linear in the time horizon $T$.
2 code implementations • 22 Oct 2024 • Theodore Brown, Alexandru Cioba, Ilija Bogunovic
We derive a bound on the maximum information gain of these invariant kernels, and provide novel upper and lower bounds on the number of observations required for invariance-aware BO algorithms to achieve $\epsilon$-optimality.
no code implementations • 26 Jul 2024 • Seongho Son, William Bankes, Sayak Ray Chowdhury, Brooks Paige, Ilija Bogunovic
We theoretically analyse the convergence of NS-DPO in the offline setting, providing upper bounds on the estimation error caused by non-stationary preferences.
1 code implementation • 25 Jul 2024 • Xiaohang Tang, Afonso Marques, Parameswaran Kamalaruban, Ilija Bogunovic
Decision Transformer (DT), as one of the representative Reinforcement Learning via Supervised Learning (RvS) methods, has achieved strong performance in offline learning tasks by leveraging the powerful Transformer architecture for sequential decision-making.
2 code implementations • 30 May 2024 • Shyam Sundhar Ramesh, Yifan Hu, Iason Chaimalas, Viraj Mehta, Pier Giuseppe Sessa, Haitham Bou Ammar, Ilija Bogunovic
Our approach builds upon reward-free direct preference optimization methods, but unlike previous approaches, it seeks a robust policy which maximizes the worst-case group performance.
no code implementations • 4 Feb 2024 • Chen Feng, Ziquan Liu, Zhuo Zhi, Ilija Bogunovic, Carsten Gerner-Beuerle, Miguel Rodrigues
Our paper offers a new approach to certify the performance of machine learning models in the presence of adversarial attacks with population level risk guarantees.
no code implementations • 1 Dec 2023 • Viraj Mehta, Syrine Belakaria, Vikramjeet Das, Ojash Neopane, Yijia Dai, Ilija Bogunovic, Barbara Engelhardt, Stefano Ermon, Jeff Schneider, Willie Neiswanger
For many applications of preference alignment, the cost of acquiring human feedback can be substantial.
1 code implementation • 1 Dec 2023 • William Bankes, George Hughes, Ilija Bogunovic, Zi Wang
REDUCR reduces the training data while preserving worst-class generalization performance.
no code implementations • 8 Nov 2023 • Wei Wang, Sattar Vakili, Ilija Bogunovic
To this end, we present an instance-dependent lower bound for the robust best-arm identification problem with linear rewards.
no code implementations • 5 Sep 2023 • Shyam Sundhar Ramesh, Pier Giuseppe Sessa, Yifan Hu, Andreas Krause, Ilija Bogunovic
Three major challenges in reinforcement learning are complex dynamical systems with large state spaces, costly data acquisition processes, and the deviation of real-world dynamics at deployment from the training environment.
1 code implementation • 29 Jun 2023 • Matej Jusup, Barna Pásztor, Tadeusz Janik, Kenan Zhang, Francesco Corman, Andreas Krause, Ilija Bogunovic
Many applications, e.g., in shared mobility, require coordinating a large number of agents.
no code implementations • 8 Feb 2023 • Volodymyr Tkachuk, Seyed Alireza Bakhtiari, Johannes Kirschner, Matej Jusup, Ilija Bogunovic, Csaba Szepesvári
A practical challenge in reinforcement learning is that combinatorial action spaces make planning computationally demanding.
no code implementations • 19 Dec 2022 • Xiang Li, Viraj Mehta, Johannes Kirschner, Ian Char, Willie Neiswanger, Jeff Schneider, Andreas Krause, Ilija Bogunovic
Many real-world reinforcement learning tasks require control of complex dynamical systems that involve both costly data acquisition processes and large state spaces.
no code implementations • 14 Oct 2022 • Shyam Sundhar Ramesh, Pier Giuseppe Sessa, Andreas Krause, Ilija Bogunovic
Contextual Bayesian optimization (CBO) is a powerful framework for sequential decision-making given side information, with important applications, e.g., in wind energy systems.
no code implementations • 13 Jul 2022 • Parnian Kassraie, Andreas Krause, Ilija Bogunovic
By establishing a novel connection between such kernels and the graph neural tangent kernel (GNTK), we introduce the first GNN confidence bound and use it to design a phased-elimination algorithm with sublinear regret.
no code implementations • 3 Feb 2022 • Ilija Bogunovic, Zihan Li, Andreas Krause, Jonathan Scarlett
We consider the sequential optimization of an unknown, continuous, and expensive-to-evaluate reward function from noisy and adversarially corrupted observed rewards.
no code implementations • NeurIPS 2021 • Ilija Bogunovic, Andreas Krause
Instead, we introduce a misspecified kernelized bandit setting where the unknown function can be $\epsilon$-uniformly approximated by a function with a bounded norm in some Reproducing Kernel Hilbert Space (RKHS).
1 code implementation • NeurIPS 2021 • Anastasiia Makarova, Ilnura Usmanova, Ilija Bogunovic, Andreas Krause
We generalize BO to trade off the mean and input-dependent variance of the objective, both of which we assume to be unknown a priori.
no code implementations • NeurIPS 2020 • Pier Giuseppe Sessa, Ilija Bogunovic, Andreas Krause, Maryam Kamgarpour
We formulate the novel class of contextual games, a type of repeated game driven by contextual information at each round.
no code implementations • 8 Jul 2021 • Barna Pásztor, Ilija Bogunovic, Andreas Krause
Learning in multi-agent systems is highly challenging due to several factors including the non-stationarity introduced by agents' interactions and the combinatorial nature of their state and action spaces.
no code implementations • 18 Mar 2021 • Sebastian Curi, Ilija Bogunovic, Andreas Krause
In real-world tasks, reinforcement learning (RL) agents frequently encounter situations that are not present during training time.
no code implementations • 17 Jan 2021 • Jingkang Wang, Mengye Ren, Ilija Bogunovic, Yuwen Xiong, Raquel Urtasun
Recent work on hyperparameter optimization (HPO) has shown the possibility of training certain hyperparameters together with regular parameters.
1 code implementation • NeurIPS 2020 • Pier Giuseppe Sessa, Ilija Bogunovic, Maryam Kamgarpour, Andreas Krause
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
no code implementations • 7 Jul 2020 • Ilija Bogunovic, Arpan Losalka, Andreas Krause, Jonathan Scarlett
We consider a stochastic linear bandit problem in which the rewards are not only subject to random noise, but also adversarial attacks subject to a suitable budget $C$ (i.e., an upper bound on the sum of corruption magnitudes across the time horizon).
no code implementations • 4 Mar 2020 • Ilija Bogunovic, Andreas Krause, Jonathan Scarlett
We consider the problem of optimizing an unknown (typically non-convex) function with a bounded norm in some Reproducing Kernel Hilbert Space (RKHS), based on noisy bandit feedback.
no code implementations • 28 Feb 2020 • Pier Giuseppe Sessa, Ilija Bogunovic, Maryam Kamgarpour, Andreas Krause
We consider robust optimization problems, where the goal is to optimize an unknown objective function against the worst-case realization of an uncertain parameter.
no code implementations • 20 Feb 2020 • Johannes Kirschner, Ilija Bogunovic, Stefanie Jegelka, Andreas Krause
Attaining such robustness is the goal of distributionally robust optimization, which seeks a solution to an optimization problem that is worst-case robust under a specified distributional shift of an uncontrolled covariate.
1 code implementation • NeurIPS 2019 • Pier Giuseppe Sessa, Ilija Bogunovic, Maryam Kamgarpour, Andreas Krause
We consider the problem of learning to play a repeated multi-agent game with an unknown reward function.
no code implementations • NeurIPS 2018 • Ilija Bogunovic, Jonathan Scarlett, Stefanie Jegelka, Volkan Cevher
In this paper, we consider the problem of Gaussian process (GP) optimization with an added robustness requirement: The returned point may be perturbed by an adversary, and we require the function value to remain as high as possible even after this perturbation.
no code implementations • 20 Feb 2018 • Ilija Bogunovic, Junyao Zhao, Volkan Cevher
In this work, we present a new algorithm, Oblivious-Greedy, and prove the first constant-factor approximation guarantees for a wider class of non-submodular objectives.
1 code implementation • 20 Feb 2018 • Paul Rolland, Jonathan Scarlett, Ilija Bogunovic, Volkan Cevher
In this paper, we consider the approach of Kandasamy et al. (2015), in which the high-dimensional function decomposes as a sum of lower-dimensional functions on subsets of the underlying variables.
no code implementations • NeurIPS 2017 • Slobodan Mitrović, Ilija Bogunovic, Ashkan Norouzi-Fard, Jakub Tarnawski, Volkan Cevher
We study the classical problem of maximizing a monotone submodular function subject to a cardinality constraint $k$, with two additional twists: (i) elements arrive in a streaming fashion, and (ii) $m$ items from the algorithm's memory are removed after the stream is finished.
no code implementations • ICML 2017 • Ilija Bogunovic, Slobodan Mitrović, Jonathan Scarlett, Volkan Cevher
We study the problem of maximizing a monotone submodular function subject to a cardinality constraint $k$, with the added twist that a number of items $\tau$ from the returned set may be removed.
no code implementations • NeurIPS 2016 • Ilija Bogunovic, Jonathan Scarlett, Andreas Krause, Volkan Cevher
We present a new algorithm, truncated variance reduction (TruVaR), that treats Bayesian optimization (BO) and level-set estimation (LSE) with Gaussian processes in a unified fashion.
no code implementations • 25 Jan 2016 • Ilija Bogunovic, Jonathan Scarlett, Volkan Cevher
We illustrate the performance of the algorithms on both synthetic and real data, and we find the gradual forgetting of TV-GP-UCB to perform favorably compared to the sharp resetting of R-GP-UCB.
no code implementations • 21 Oct 2015 • Luca Baldassarre, Yen-Huan Li, Jonathan Scarlett, Baran Gözcü, Ilija Bogunovic, Volkan Cevher
In this paper, we instead take a principled learning-based approach in which a fixed index set is chosen based on a set of training signals $\mathbf{x}_1,\dotsc,\mathbf{x}_m$.
no code implementations • 10 Feb 2014 • Adish Singla, Ilija Bogunovic, Gábor Bartók, Amin Karbasi, Andreas Krause
How should we present training examples to learners to teach them classification rules?