no code implementations • 24 Jan 2024 • Ke Ye, Heinrich Jiang, Afshin Rostamizadeh, Ayan Chakrabarti, Giulia Desalvo, Jean-François Kagy, Lazaros Karydas, Gui Citovsky, Sanjiv Kumar
In this paper, we present SpacTor, a new training procedure consisting of (1) a hybrid objective combining span corruption (SC) and replaced token detection (RTD), and (2) a two-stage curriculum that optimizes the hybrid objective over the initial $\tau$ iterations, then transitions to standard SC loss.
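The two-stage curriculum described above can be sketched with a toy helper. This is illustrative only: the loss values are placeholders, and the `rtd_weight` and `tau` names are hypothetical, not from the paper.

```python
# Toy sketch of a two-stage curriculum over a hybrid objective.
# `rtd_weight` and `tau` are hypothetical names for illustration.

def hybrid_loss(sc_loss, rtd_loss, step, tau, rtd_weight=1.0):
    """Return SC + weighted RTD loss before step `tau`, SC loss alone after."""
    if step < tau:
        return sc_loss + rtd_weight * rtd_loss
    return sc_loss

# During the first stage both terms contribute; after `tau`, only SC does.
losses = [hybrid_loss(2.0, 1.0, step, tau=3) for step in range(5)]
```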
no code implementations • 7 Oct 2022 • Dara Bahri, Heinrich Jiang, Tal Schuster, Afshin Rostamizadeh
Given a labeled training set and a collection of unlabeled data, the goal of active learning (AL) is to identify the best unlabeled points to label.
no code implementations • 15 Feb 2022 • Taman Narayan, Heinrich Jiang, Sen Zhao, Sanjiv Kumar
Much effort has been devoted to making large and more accurate models, but relatively little has been put into understanding which examples are benefiting from the added complexity.
no code implementations • 29 Sep 2021 • Yaodong Yu, Heinrich Jiang, Dara Bahri, Hossein Mobahi, Seungyeon Kim, Ankit Singh Rawat, Andreas Veit, Yi Ma
Concretely, we show that larger models and larger datasets need to be simultaneously leveraged to improve OOD performance.
no code implementations • ICLR 2022 • Dara Bahri, Heinrich Jiang, Yi Tay, Donald Metzler
Self-supervised contrastive representation learning has proved incredibly successful in the vision and natural language domains, enabling state-of-the-art performance with orders of magnitude less labeled data.
no code implementations • 4 Jun 2021 • Heinrich Jiang, Afshin Rostamizadeh
We show under standard non-parametric assumptions that a classical support estimator can be repurposed as an offline algorithm attaining an excess query cost of $\widetilde{\Theta}(n^{D/(D+1)})$ compared to the optimal learner, where $n$ is the number of datapoints and $D$ is the dimension.
no code implementations • ICLR 2022 • Heinrich Jiang, Harikrishna Narasimhan, Dara Bahri, Andrew Cotter, Afshin Rostamizadeh
In real-world systems, models are frequently updated as more data becomes available, and in addition to achieving high accuracy, the goal is to also maintain a low difference in predictions compared to the base model (i.e., predictive "churn").
no code implementations • CVPR 2021 • Jennifer Jang, Heinrich Jiang
The runtime is linear in the number of points and exponential in dimension, which makes MeanShift++ ideal on low-dimensional applications such as image segmentation and object tracking.
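A grid-based mean-shift step of the kind described above can be sketched as follows. This is a minimal illustration of the idea (bin points into cells, shift each point toward the mean of its neighboring cells), not the authors' implementation; the function and parameter names are my own.

```python
from collections import defaultdict
from itertools import product

def grid_mean_shift_step(points, h):
    """One grid-based mean-shift step (sketch of the MeanShift++ idea):
    bin points into cells of side h, then move each point to the mean of
    all points in its own cell and the neighboring cells. The cost is
    linear in the number of points, but the 3^d neighbor scan makes it
    exponential in the dimension d."""
    d = len(points[0])
    cells = defaultdict(lambda: [0.0] * (d + 1))  # per-cell coordinate sums + count
    for p in points:
        key = tuple(int(c // h) for c in p)
        for i, c in enumerate(p):
            cells[key][i] += c
        cells[key][d] += 1
    offsets = list(product((-1, 0, 1), repeat=d))
    shifted = []
    for p in points:
        key = tuple(int(c // h) for c in p)
        sums, count = [0.0] * d, 0
        for off in offsets:
            nb = tuple(k + o for k, o in zip(key, off))
            if nb in cells:
                for i in range(d):
                    sums[i] += cells[nb][i]
                count += cells[nb][d]
        shifted.append(tuple(s / count for s in sums))
    return shifted
```

Iterating this step until points stop moving yields cluster centers at local density peaks.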
no code implementations • 9 Feb 2021 • Dara Bahri, Heinrich Jiang, Yi Tay, Donald Metzler
Detecting out-of-distribution (OOD) examples is critical in many applications.
no code implementations • 9 Feb 2021 • Dara Bahri, Heinrich Jiang
Training modern neural networks is an inherently noisy process that can lead to high \emph{prediction churn} -- disagreements between re-trainings of the same model due to factors such as randomization in the parameter initialization and mini-batches -- even when the trained models all attain similar accuracies.
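The churn between two re-trainings can be quantified as the fraction of examples on which their predictions disagree. The metric itself is standard; this helper is just an illustrative sketch.

```python
def prediction_churn(preds_a, preds_b):
    """Fraction of examples on which two trainings of the same model
    disagree -- one common way to quantify prediction churn."""
    assert len(preds_a) == len(preds_b)
    disagreements = sum(a != b for a, b in zip(preds_a, preds_b))
    return disagreements / len(preds_a)
```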
no code implementations • 17 Jun 2020 • Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett, Heinrich Jiang
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB), and prove an $\widetilde{\mathcal{O}}(\frac{d\sqrt{T}}{\tau-c_0})$ bound on its $T$-round regret, where the denominator is the difference between the constraint threshold and the cost of a known feasible action.
no code implementations • NeurIPS 2020 • Heinrich Jiang, Jennifer Jang, Jakub Łącki
DBSCAN is a popular density-based clustering algorithm.
no code implementations • 8 Jun 2020 • Heinrich Jiang, Qijia Jiang, Aldo Pacchiano
Learning under one-sided feedback (i.e., where we only observe the labels for examples we predicted positively on) is a fundamental problem in machine learning -- applications include lending and recommendation systems.
no code implementations • ICML 2020 • Dara Bahri, Heinrich Jiang, Maya Gupta
Modern machine learning models are often trained on examples with noisy labels that hurt performance and are hard to identify.
no code implementations • 5 Mar 2020 • Aldo Pacchiano, Heinrich Jiang, Michael I. Jordan
Mode estimation is a classical problem in statistics with a wide range of applications in machine learning.
no code implementations • 4 Oct 2019 • Ofir Nachum, Heinrich Jiang
A number of machine learning (ML) methods have been proposed recently to maximize model predictive accuracy while enforcing notions of group parity or fairness across sub-populations.
1 code implementation • 28 Jul 2019 • Ray Jiang, Aldo Pacchiano, Tom Stepleton, Heinrich Jiang, Silvia Chiappa
We propose an approach to fair classification that enforces independence between the classifier outputs and sensitive information by minimizing Wasserstein-1 distances.
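For one-dimensional classifier scores with equal-size samples, the empirical Wasserstein-1 distance reduces to the mean absolute difference of the sorted samples. The sketch below illustrates that quantity, which could serve as a training penalty pushing the score distributions of two sensitive groups together; it is a minimal illustration, not the paper's procedure.

```python
def wasserstein_1(xs, ys):
    """Empirical Wasserstein-1 distance between two equal-size samples of
    1-D scores: the mean absolute difference of the sorted samples."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)
```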
no code implementations • 31 May 2019 • Heinrich Jiang, Maya Gupta
We present a new active sampling method we call min-margin which trains multiple learners on bootstrap samples and then chooses the examples to label based on the candidates' minimum margin amongst the bootstrapped models.
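The min-margin selection rule described above can be sketched directly, assuming each bootstrapped model's per-example margin (top score minus runner-up) has already been computed; the function name and input layout here are hypothetical.

```python
def min_margin_select(margins_per_model, k):
    """Sketch of min-margin active sampling: `margins_per_model[m][i]` is
    model m's margin on example i. Score each example by its minimum
    margin across the bootstrapped models and return the indices of the
    k examples with the smallest such minimum -- the most contested points."""
    n = len(margins_per_model[0])
    min_margins = [min(m[i] for m in margins_per_model) for i in range(n)]
    return sorted(range(n), key=lambda i: min_margins[i])[:k]
```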
no code implementations • ICLR 2019 • Heinrich Jiang, Yifan Wu, Ofir Nachum
In non-convex settings, the resulting problem may be difficult to solve as the Lagrangian is not guaranteed to have a deterministic saddle-point equilibrium.
no code implementations • 15 Jan 2019 • Heinrich Jiang, Ofir Nachum
We do so by assuming the existence of underlying, unknown, and unbiased labels which are overwritten by an agent who intends to provide accurate labels but may have biases against certain groups.
no code implementations • 31 Oct 2018 • Jennifer Jang, Heinrich Jiang
Surprisingly, up to a certain point, we can enjoy the same estimation rates while lowering computational cost, showing that DBSCAN++ is a sub-quadratic algorithm that attains minimax optimal rates for level-set estimation, a quality that may be of independent interest.
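The core of the sub-quadratic idea is to query density only at a subsample of the points. The sketch below shows subsampled core-point detection (full DBSCAN++ then clusters those core points and assigns the rest); all names are illustrative, not the authors' code.

```python
def subsampled_core_points(points, sample_idx, eps, min_pts):
    """DBSCAN++-style core-point detection (sketch): density is queried
    only at the subsample `sample_idx`, so the pairwise work is
    O(len(sample_idx) * n) rather than DBSCAN's worst-case O(n^2).
    Returns the sampled indices whose eps-ball contains >= min_pts points."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    cores = []
    for i in sample_idx:
        count = sum(dist2(points[i], q) <= eps * eps for q in points)
        if count >= min_pts:
            cores.append(i)
    return cores
```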
1 code implementation • 11 Sep 2018 • Andrew Cotter, Heinrich Jiang, Serena Wang, Taman Narayan, Maya Gupta, Seungil You, Karthik Sridharan
This new formulation leads to an algorithm that produces a stochastic classifier by playing a two-player non-zero-sum game solving for what we call a semi-coarse correlated equilibrium, which in turn corresponds to an approximately optimal and feasible solution to the constrained optimization problem.
1 code implementation • 29 Jun 2018 • Andrew Cotter, Maya Gupta, Heinrich Jiang, Nathan Srebro, Karthik Sridharan, Serena Wang, Blake Woodworth, Seungil You
Classifiers can be trained with data-dependent constraints to satisfy fairness goals, reduce churn, achieve a targeted false positive rate, or other policy goals.
no code implementations • 31 May 2018 • Andrew Cotter, Maya Gupta, Heinrich Jiang, James Muller, Taman Narayan, Serena Wang, Tao Zhu
We propose learning flexible but interpretable functions that aggregate a variable-length set of permutation-invariant feature vectors to predict a label.
1 code implementation • NeurIPS 2018 • Heinrich Jiang, Been Kim, Melody Y. Guan, Maya Gupta
Knowing when a classifier's prediction can be trusted is useful in many applications and critical for safely using AI.
1 code implementation • ICML 2018 • Heinrich Jiang, Jennifer Jang, Samory Kpotufe
We provide initial seedings to the Quick Shift clustering algorithm, which approximate the locally high-density regions of the data.
1 code implementation • 17 Apr 2018 • Andrew Cotter, Heinrich Jiang, Karthik Sridharan
For both the proxy-Lagrangian and Lagrangian formulations, however, we prove that this classifier, instead of having unbounded size, can be taken to be a distribution over no more than m+1 models (where m is the number of constraints).
no code implementations • 5 Jan 2018 • Melody Y. Guan, Heinrich Jiang
We analyze the $K$-armed bandit problem where the reward for each arm is a noisy realization based on an observed context under mild nonparametric assumptions.
no code implementations • NeurIPS 2017 • Heinrich Jiang
Quick Shift is a popular mode-seeking and clustering algorithm.
no code implementations • ICML 2017 • Heinrich Jiang
Kernel density estimation (KDE) is a popular nonparametric density estimation method.
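A minimal 1-D Gaussian KDE, for reference (the standard estimator, not anything specific to this paper):

```python
import math

def kde(x, data, bandwidth):
    """Gaussian kernel density estimate at point x (1-D): the average of
    Gaussian bumps of width `bandwidth` centered at each data point."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)
```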
no code implementations • 19 Jul 2017 • Heinrich Jiang
We derive high-probability finite-sample uniform rates of consistency for $k$-NN regression that are optimal up to logarithmic factors under mild assumptions.
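The estimator analyzed here is the classical k-NN regressor: average the labels of the k training points nearest the query. A minimal 1-D version:

```python
def knn_regress(x, xs, ys, k):
    """k-nearest-neighbor regression estimate at x (1-D): average the
    labels ys of the k training points xs closest to x."""
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return sum(ys[i] for i in order[:k]) / k
```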
no code implementations • ICML 2017 • Heinrich Jiang
When the data lies on an embedded unknown $d$-dimensional manifold in $\mathbb{R}^D$, then we obtain a rate of $\widetilde{O}(n^{-1/(2\beta + d\cdot \max\{1, \beta \})})$.
1 code implementation • 13 Jun 2016 • Heinrich Jiang, Samory Kpotufe
We present a first procedure that can estimate -- with statistical consistency guarantees -- any local-maxima of a density, under benign distributional conditions.