1 code implementation • 4 Feb 2025 • Hao Sun, Yunyi Shen, Jean-Francois Ton, Mihaela van der Schaar
We advocate for using embedding-based input in reward model research as an accelerated solution to those challenges.
no code implementations • 18 Nov 2024 • Jean-Francois Ton, Muhammad Faaiz Taufiq, Yang Liu
Large Language Models (LLMs) have shown impressive performance in complex reasoning tasks through Chain-of-Thought (CoT) reasoning, allowing models to break down problems into manageable sub-tasks.
1 code implementation • 7 Nov 2024 • Hao Sun, Yunyi Shen, Jean-Francois Ton
The Bradley-Terry (BT) model is a common and successful practice in reward modeling for Large Language Model (LLM) alignment.
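The BT model mentioned above has a standard closed form: the probability that one response is preferred over another is a logistic function of the difference of their scalar rewards. A minimal sketch (not taken from the paper; function names are illustrative):

```python
import math

def bt_preference_prob(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry probability that the 'chosen' response is preferred,
    given scalar rewards assigned to the two responses."""
    return 1.0 / (1.0 + math.exp(reward_rejected - reward_chosen))

def bt_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of the observed preference under the BT model,
    the usual training objective for pairwise reward models."""
    return -math.log(bt_preference_prob(reward_chosen, reward_rejected))
```

Equal rewards give a preference probability of 0.5; the loss shrinks as the reward gap in favor of the chosen response grows.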
no code implementations • 30 Oct 2024 • Andrew Estornell, Jean-Francois Ton, Yuanshun Yao, Yang Liu
Large language models (LLMs) have demonstrated a remarkable ability to serve as general-purpose tools for various language-based tasks.
no code implementations • 8 Mar 2024 • Xiaoying Zhang, Jean-Francois Ton, Wei Shen, Hongning Wang, Yang Liu
We introduce Adversarial Policy Optimization (AdvPO), a novel solution to the pervasive issue of reward over-optimization in Reinforcement Learning from Human Feedback (RLHF) for Large Language Models (LLMs).
1 code implementation • 27 Feb 2024 • Muhammad Faaiz Taufiq, Jean-Francois Ton, Yang Liu
In machine learning fairness, training models that minimize disparity across different sensitive groups often leads to diminished accuracy, a phenomenon known as the fairness-accuracy trade-off.
no code implementations • 16 Feb 2024 • Jiaheng Wei, Yuanshun Yao, Jean-Francois Ton, Hongyi Guo, Andrew Estornell, Yang Liu
FEWL leverages the answers from off-the-shelf LLMs that serve as a proxy of gold-standard answers.
1 code implementation • NeurIPS 2023 • Muhammad Faaiz Taufiq, Arnaud Doucet, Rob Cornish, Jean-Francois Ton
Off-Policy Evaluation (OPE) in contextual bandits is crucial for assessing new policies using existing data without costly experimentation.
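A canonical OPE estimator for logged contextual-bandit data is inverse propensity scoring (IPS), which reweights logged rewards by the ratio of target-policy to logging-policy action probabilities. A hedged sketch on synthetic data (the setup and variable names are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic logged data: the logging policy chose each of two actions
# uniformly at random; action 1 always yields reward 1, action 0 yields 0.
n = 10_000
actions = rng.integers(0, 2, size=n)
rewards = (actions == 1).astype(float)
logging_probs = np.full(n, 0.5)              # propensity of each logged action
target_probs = (actions == 1).astype(float)  # target policy always plays action 1

def ips_estimate(rewards, logging_probs, target_probs):
    """Inverse propensity scoring (IPS) estimate of the target policy's value."""
    weights = target_probs / logging_probs   # importance weights
    return float(np.mean(weights * rewards))

value = ips_estimate(rewards, logging_probs, target_probs)  # close to 1.0
```

IPS is unbiased when the logging propensities are known, but its variance can be large when the importance weights are — a key motivation for OPE methods with stronger finite-sample guarantees.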
no code implementations • 9 Oct 2023 • Yegor Klochkov, Jean-Francois Ton, Ruocheng Guo, Yang Liu, Hang Li
We address the problem of concept removal in deep neural networks, aiming to learn representations that do not encode certain specified concepts (e.g., gender).
1 code implementation • NeurIPS 2023 • Mengyue Yang, Zhen Fang, Yonggang Zhang, Yali Du, Furui Liu, Jean-Francois Ton, Jianhong Wang, Jun Wang
To capture the information of sufficient and necessary causes, we employ the classical notion of the probability of necessity and sufficiency (PNS), which quantifies the probability that one variable is a necessary and sufficient cause of another.
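For reference, Pearl's standard definition of PNS for a binary treatment $X$ and outcome $Y$ (a textbook statement, not quoted from the paper itself) can be written in counterfactual notation as:

```latex
\mathrm{PNS} \;=\; P\!\left(Y_{X=1} = 1,\; Y_{X=0} = 0\right),
```

i.e., the probability that the outcome would occur under the treatment and would not occur without it. Under monotonicity and exogeneity this reduces to the identifiable quantity $P(Y{=}1 \mid X{=}1) - P(Y{=}1 \mid X{=}0)$.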
1 code implementation • 10 Aug 2023 • Yang Liu, Yuanshun Yao, Jean-Francois Ton, Xiaoying Zhang, Ruocheng Guo, Hao Cheng, Yegor Klochkov, Muhammad Faaiz Taufiq, Hang Li
However, a major challenge faced by practitioners is the lack of clear guidance on evaluating whether LLM outputs align with social norms, values, and regulations.
no code implementations • 9 Jun 2022 • Muhammad Faaiz Taufiq, Jean-Francois Ton, Rob Cornish, Yee Whye Teh, Arnaud Doucet
Most off-policy evaluation methods for contextual bandits have focused on the expected outcome of a policy, which is estimated via methods that at best provide only asymptotic guarantees.
no code implementations • NAACL (ACL) 2022 • Jean-Francois Ton, Walter Talbott, Shuangfei Zhai, Josh Susskind
In particular, we find that the added L2 regularization seems to improve performance for high-frequency words without degrading performance for low-frequency ones.
no code implementations • 6 Jul 2020 • Jean-Francois Ton, Dino Sejdinovic, Kenji Fukumizu
Based on recent developments in meta learning as well as in causal inference, we introduce a novel generative model that allows distinguishing cause and effect in the small data setting.
no code implementations • ICLR 2021 • Soufiane Hayou, Jean-Francois Ton, Arnaud Doucet, Yee Whye Teh
Overparameterized Neural Networks (NNs) display state-of-the-art performance.
1 code implementation • ICML 2020 • Jin Xu, Jean-Francois Ton, Hyunjik Kim, Adam R. Kosiorek, Yee Whye Teh
We develop a functional encoder-decoder approach to supervised meta-learning, where labeled data is encoded into an infinite-dimensional functional representation rather than a finite-dimensional one.
no code implementations • 5 Jun 2019 • Jean-Francois Ton, Lucian Chan, Yee Whye Teh, Dino Sejdinovic
Current meta-learning approaches focus on learning functional representations of relationships between variables, i.e., on estimating conditional expectations in regression.
no code implementations • 26 Feb 2019 • Henry Chai, Jean-Francois Ton, Roman Garnett, Michael A. Osborne
We present a novel technique for tailoring Bayesian quadrature (BQ) to model selection.
no code implementations • 24 Jun 2018 • Zhu Li, Jean-Francois Ton, Dino Oglic, Dino Sejdinovic
We study the standard random Fourier features method, for which we improve the existing bounds on the number of features required to guarantee the minimax risk convergence rate of kernel ridge regression, as well as a data-dependent modification that samples features proportionally to ridge leverage scores and further reduces the required number of features.
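The standard random Fourier features construction referenced above approximates a shift-invariant kernel by an inner product of randomized cosine features. A minimal NumPy sketch for the Gaussian (RBF) kernel (illustrative; not the paper's leverage-score variant):

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_features(X, num_features, lengthscale=1.0):
    """Random Fourier features Z such that Z @ Z.T approximates the RBF kernel
    k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2))."""
    d = X.shape[1]
    # Frequencies sampled from the kernel's spectral density (a Gaussian),
    # phases uniform on [0, 2*pi) -- the classic Rahimi-Recht construction.
    W = rng.normal(0.0, 1.0 / lengthscale, size=(d, num_features))
    b = rng.uniform(0.0, 2 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

X = rng.normal(size=(50, 3))
Z = rff_features(X, num_features=4096)
K_approx = Z @ Z.T
K_exact = np.exp(-0.5 * np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))
max_err = np.abs(K_approx - K_exact).max()
```

The approximation error shrinks at roughly the Monte Carlo rate in the number of features; the bounds discussed in the paper concern how many features suffice for kernel ridge regression to retain its minimax risk rate.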
no code implementations • 15 Nov 2017 • Jean-Francois Ton, Seth Flaxman, Dino Sejdinovic, Samir Bhatt
The use of covariance kernels is ubiquitous in the field of spatial statistics.