no code implementations • 2 Apr 2024 • Joy Qiping Yang, Salman Salamatian, Ziteng Sun, Ananda Theertha Suresh, Ahmad Beirami
The goal of language model alignment is to alter the base model distribution $p$ to a new distribution $\phi$ that achieves a higher expected reward while keeping $\phi$ close to $p$. A popular alignment method is KL-constrained reinforcement learning (RL), which chooses the distribution $\phi_\Delta$ that maximizes $\mathbb{E}_{\phi_\Delta}[r(y)]$ subject to a relative entropy constraint $\mathrm{KL}(\phi_\Delta \,\|\, p) \leq \Delta$. Another simple alignment method is best-of-$N$, where $N$ samples are drawn from $p$ and the one with the highest reward is selected.
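The best-of-$N$ procedure is simple enough to capture in a few lines. Below is a minimal sketch, assuming a black-box sampler for $p$ and a reward function $r$; the sampler and reward here are toy placeholders standing in for a base language model and a reward model, not the paper's code.

```python
import random

def best_of_n(sample_from_p, reward, n):
    """Draw n candidates from the base distribution p and return the
    one with the highest reward (ties broken arbitrarily by max)."""
    candidates = [sample_from_p() for _ in range(n)]
    return max(candidates, key=reward)

# Toy usage: p is uniform over a small vocabulary and the "reward
# model" is just string length (both stand-ins for real models).
if __name__ == "__main__":
    vocab = ["a", "bb", "ccc", "dddd"]
    print(best_of_n(lambda: random.choice(vocab), reward=len, n=8))
```

A commonly cited property of best-of-$N$ is that its KL divergence from $p$ is at most $\log N - (N-1)/N$, so, like the KL-constrained RL solution, it trades higher reward against bounded drift from the base model.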
no code implementations • 10 Oct 2023 • Davin Choo, Joy Qiping Yang, Arnab Bhattacharyya, Clément L. Canonne
We establish finite-sample guarantees for efficient proper learning of bounded-degree polytrees, a rich class of high-dimensional probability distributions that forms a subclass of Bayesian networks, a widely studied type of graphical model.
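For intuition about what recovering a polytree's structure involves (this is a classical Chow-Liu-style baseline, not the paper's algorithm), the sketch below estimates pairwise mutual information from binary samples and returns a maximum-weight spanning tree, which under suitable assumptions recovers the undirected skeleton of a tree-structured model; the function names are illustrative.

```python
import itertools
import math
from collections import Counter

def empirical_mi(samples, i, j):
    """Empirical mutual information between binary coordinates i and j."""
    n = len(samples)
    pij = Counter((s[i], s[j]) for s in samples)
    pi = Counter(s[i] for s in samples)
    pj = Counter(s[j] for s in samples)
    mi = 0.0
    for (a, b), c in pij.items():
        p_ab = c / n
        # p_ab * log(p_ab / (p_a * p_b)), with counts rescaled by n
        mi += p_ab * math.log(p_ab * n * n / (pi[a] * pj[b]))
    return mi

def chow_liu_skeleton(samples, dim):
    """Maximum-weight spanning tree under empirical MI (Kruskal)."""
    edges = sorted(
        ((empirical_mi(samples, i, j), i, j)
         for i, j in itertools.combinations(range(dim), 2)),
        reverse=True)
    parent = list(range(dim))
    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree
```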
no code implementations • 13 Apr 2023 • Vipul Arora, Arnab Bhattacharyya, Clément L. Canonne, Joy Qiping Yang
This paper considers the problem of testing the maximum in-degree of the Bayes net underlying an unknown probability distribution $P$ over $\{0, 1\}^n$, given sample access to $P$.
no code implementations • 19 Apr 2022 • Arnab Bhattacharyya, Clément L. Canonne, Joy Qiping Yang
We study the following independence testing problem: given sample access to a distribution $P$ over $\{0, 1\}^n$, decide whether $P$ is a product distribution or is $\varepsilon$-far in total variation distance from every product distribution.
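As a concrete (if naive) baseline for this task, the plug-in tester below estimates $P$ and its marginals empirically and checks the total variation distance between the empirical distribution and the product of its own marginals; it takes time exponential in $n$ and is far from sample-optimal, so it only illustrates what the testing problem asks for.

```python
import itertools
from collections import Counter

def plugin_product_test(samples, dim, eps):
    """Plug-in tester (exponential in dim; illustrative only):
    accept iff the empirical distribution is within eps/2 of the
    product of its own marginals in total variation distance."""
    n = len(samples)
    emp = Counter(map(tuple, samples))
    marg1 = [sum(s[i] for s in samples) / n for i in range(dim)]  # Pr[x_i = 1]
    l1 = 0.0
    for x in itertools.product([0, 1], repeat=dim):
        p_hat = emp[x] / n
        q_hat = 1.0
        for i, b in enumerate(x):
            q_hat *= marg1[i] if b == 1 else 1 - marg1[i]
        l1 += abs(p_hat - q_hat)
    return l1 / 2 <= eps / 2  # TV distance is half the L1 distance

if __name__ == "__main__":
    import random
    # Product case: three independent fair bits; expect True.
    samples = [tuple(random.randint(0, 1) for _ in range(3))
               for _ in range(5000)]
    print(plugin_product_test(samples, dim=3, eps=0.1))
```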