Search Results for author: Heinrich Jiang

Found 34 papers, 7 papers with code

SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection

no code implementations24 Jan 2024 Ke Ye, Heinrich Jiang, Afshin Rostamizadeh, Ayan Chakrabarti, Giulia Desalvo, Jean-François Kagy, Lazaros Karydas, Gui Citovsky, Sanjiv Kumar

In this paper, we present SpacTor, a new training procedure consisting of (1) a hybrid objective combining span corruption (SC) and replaced token detection (RTD), and (2) a two-stage curriculum that optimizes the hybrid objective over the initial $\tau$ iterations, then transitions to standard SC loss.

Is margin all you need? An extensive empirical study of active learning on tabular data

no code implementations7 Oct 2022 Dara Bahri, Heinrich Jiang, Tal Schuster, Afshin Rostamizadeh

Given a labeled training set and a collection of unlabeled data, the goal of active learning (AL) is to identify the best unlabeled points to label.

Active Learning · Benchmarking +1
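
As a concrete reference point, margin sampling scores each unlabeled example by the gap between its two largest predicted class probabilities and queries the smallest gaps first. A minimal NumPy sketch (the helper names are illustrative, not from the paper's code):

```python
import numpy as np

def margin_scores(probs):
    """Margin = gap between the two largest predicted class probabilities.
    probs: (n_unlabeled, n_classes); smaller margin = more uncertain."""
    top_two = np.sort(probs, axis=1)[:, -2:]
    return top_two[:, 1] - top_two[:, 0]

def select_batch(probs, batch_size):
    """Query the unlabeled indices with the smallest margins."""
    return np.argsort(margin_scores(probs))[:batch_size]

# Toy usage: 5 unlabeled points, 3 classes.
rng = np.random.default_rng(0)
print(select_batch(rng.dirichlet(np.ones(3), size=5), batch_size=2))
```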

Predicting on the Edge: Identifying Where a Larger Model Does Better

no code implementations15 Feb 2022 Taman Narayan, Heinrich Jiang, Sen Zhao, Sanjiv Kumar

Much effort has been devoted to making larger and more accurate models, but relatively little has been put into understanding which examples benefit from the added complexity.

SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption

no code implementations ICLR 2022 Dara Bahri, Heinrich Jiang, Yi Tay, Donald Metzler

Self-supervised contrastive representation learning has proved incredibly successful in the vision and natural language domains, enabling state-of-the-art performance with orders of magnitude less labeled data.

Contrastive Learning · Representation Learning +1
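
The corruption scheme the title refers to can be sketched directly: a random subset of each example's features is resampled from the corresponding feature's empirical marginal, and the (original, corrupted) pair feeds a standard contrastive loss. A minimal sketch, assuming a NumPy batch of tabular features (the corruption rate and function name are illustrative; the InfoNCE loss is omitted):

```python
import numpy as np

def scarf_corrupt(X, corruption_rate=0.6, rng=None):
    """Return a corrupted view of each row: a random subset of features is
    resampled from that feature's empirical marginal (values seen elsewhere
    in the batch)."""
    rng = rng or np.random.default_rng()
    n, d = X.shape
    mask = rng.random((n, d)) < corruption_rate        # entries to corrupt
    # Column-wise resampling: the replacement for feature j is drawn from column j.
    replacements = np.stack(
        [X[rng.integers(0, n, size=n), j] for j in range(d)], axis=1
    )
    return np.where(mask, replacements, X)

# Each (row, corrupted row) pair would then feed a contrastive (InfoNCE) loss.
```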

Active Covering

no code implementations4 Jun 2021 Heinrich Jiang, Afshin Rostamizadeh

We show under standard non-parametric assumptions that a classical support estimator can be repurposed as an offline algorithm attaining an excess query cost of $\widetilde{\Theta}(n^{D/(D+1)})$ compared to the optimal learner, where $n$ is the number of datapoints and $D$ is the dimension.

Active Learning

Churn Reduction via Distillation

no code implementations ICLR 2022 Heinrich Jiang, Harikrishna Narasimhan, Dara Bahri, Andrew Cotter, Afshin Rostamizadeh

In real-world systems, models are frequently updated as more data becomes available, and in addition to achieving high accuracy, the goal is also to maintain a low difference in predictions compared to the base model (i.e., predictive "churn").
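
One way to read the title: the base model's predictions act as a distillation target, pulling the updated model toward agreeing with its predecessor. A minimal sketch of such a mixed loss, assuming softmax probabilities as NumPy arrays (`lam` is a hypothetical trade-off knob, not a parameter from the paper):

```python
import numpy as np

def churn_distill_loss(new_probs, labels, base_probs, lam=0.5, eps=1e-12):
    """(1 - lam) * CE(true labels) + lam * CE(base model's soft predictions).
    Larger lam anchors the new model more strongly to the base model,
    trading accuracy for lower predictive churn."""
    n = len(labels)
    ce_true = -np.log(new_probs[np.arange(n), labels] + eps).mean()
    ce_base = -(base_probs * np.log(new_probs + eps)).sum(axis=1).mean()
    return (1 - lam) * ce_true + lam * ce_base
```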

MeanShift++: Extremely Fast Mode-Seeking With Applications to Segmentation and Object Tracking

no code implementations CVPR 2021 Jennifer Jang, Heinrich Jiang

The runtime is linear in the number of points and exponential in the dimension, which makes MeanShift++ ideal for low-dimensional applications such as image segmentation and object tracking.

Clustering · Density Estimation +3
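
The stated runtime suggests a grid-based update: bin points into cells of side h and shift each point to the mean of the points in its 3^d adjacent cells, which is linear in n and exponential in d. A rough sketch of one such iteration, reconstructed from the abstract rather than the paper's implementation:

```python
import itertools
import numpy as np
from collections import defaultdict

def grid_mean_shift_step(X, h):
    """One grid-based mean-shift update: bin points into cells of side h,
    then move each point to the mean of all points in its own and adjacent
    cells. Cost is O(n * 3^d): linear in n, exponential in dimension d."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    keys = np.floor(X / h).astype(int)
    cells = defaultdict(lambda: [np.zeros(d), 0])      # cell -> [sum, count]
    for key, x in zip(map(tuple, keys), X):
        cells[key][0] += x
        cells[key][1] += 1
    offsets = np.array(list(itertools.product([-1, 0, 1], repeat=d)))
    X_new = np.empty_like(X)
    for i, key in enumerate(keys):
        total, count = np.zeros(d), 0
        for off in offsets:
            cell = cells.get(tuple(key + off))
            if cell is not None:
                total += cell[0]
                count += cell[1]
        X_new[i] = total / count          # own cell guarantees count >= 1
    return X_new
```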

Locally Adaptive Label Smoothing for Predictive Churn

no code implementations9 Feb 2021 Dara Bahri, Heinrich Jiang

Training modern neural networks is an inherently noisy process that can lead to high "prediction churn" (disagreements between re-trainings of the same model due to factors such as randomization in the parameter initialization and in mini-batches), even when the trained models all attain similar accuracies.

Deep $k$-NN Label Smoothing Improves Reproducibility of Neural Network Predictions

no code implementations1 Jan 2021 Dara Bahri, Heinrich Jiang

Training modern neural networks is an inherently noisy process that can lead to high "prediction churn" (disagreements between re-trainings of the same model due to factors such as randomization in the parameter initialization and in mini-batches), even when the trained models all attain high accuracies.
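
Both label-smoothing papers above interpolate between the observed one-hot label and local label information. A brute-force sketch of k-NN-based target smoothing (the embedding, `k`, and `alpha` are illustrative choices; the papers apply the idea to deep-network representations):

```python
import numpy as np

def knn_smoothed_targets(emb, y, n_classes, k=10, alpha=0.3):
    """Mix each one-hot target with the label distribution of the point's
    k nearest neighbors in an embedding space (brute-force O(n^2) version)."""
    one_hot = np.eye(n_classes)[y]
    dists = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)          # a point is not its own neighbor
    nn = np.argsort(dists, axis=1)[:, :k]    # (n, k) neighbor indices
    nbr_dist = np.eye(n_classes)[y[nn]].mean(axis=1)
    return (1 - alpha) * one_hot + alpha * nbr_dist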

Stochastic Bandits with Linear Constraints

no code implementations17 Jun 2020 Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett, Heinrich Jiang

We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB), and prove an $\widetilde{\mathcal{O}}(\frac{d\sqrt{T}}{\tau-c_0})$ bound on its $T$-round regret, where the denominator is the difference between the constraint threshold and the cost of a known feasible action.

Multi-Armed Bandits

Learning the Truth From Only One Side of the Story

no code implementations8 Jun 2020 Heinrich Jiang, Qijia Jiang, Aldo Pacchiano

Learning under one-sided feedback (i.e., where we only observe the labels for examples we predicted positively on) is a fundamental problem in machine learning; applications include lending and recommendation systems.

Recommendation Systems

Deep k-NN for Noisy Labels

no code implementations ICML 2020 Dara Bahri, Heinrich Jiang, Maya Gupta

Modern machine learning models are often trained on examples with noisy labels that hurt performance and are hard to identify.

BIG-bench Machine Learning
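
The deep k-NN recipe can be sketched as a filtering rule: keep a training example only when its label agrees with the majority vote of its nearest neighbors in an embedding space. A brute-force sketch (the paper applies this to intermediate representations of a deep network; `k` is illustrative):

```python
import numpy as np

def knn_label_filter(emb, y, k=10):
    """Keep an example only if its label matches the majority label of its
    k nearest neighbors in the embedding space; flagged points are treated
    as likely mislabeled. Brute-force O(n^2) sketch."""
    dists = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    nn = np.argsort(dists, axis=1)[:, :k]
    return np.array([np.bincount(y[nbrs]).argmax() == y[i]
                     for i, nbrs in enumerate(nn)])
```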

Robustness Guarantees for Mode Estimation with an Application to Bandits

no code implementations 5 Mar 2020 Aldo Pacchiano, Heinrich Jiang, Michael I. Jordan

Mode estimation is a classical problem in statistics with a wide range of applications in machine learning.

Multi-Armed Bandits

Group-based Fair Learning Leads to Counter-intuitive Predictions

no code implementations4 Oct 2019 Ofir Nachum, Heinrich Jiang

A number of machine learning (ML) methods have been proposed recently to maximize model predictive accuracy while enforcing notions of group parity or fairness across sub-populations.

Fairness

Wasserstein Fair Classification

1 code implementation28 Jul 2019 Ray Jiang, Aldo Pacchiano, Tom Stepleton, Heinrich Jiang, Silvia Chiappa

We propose an approach to fair classification that enforces independence between the classifier outputs and sensitive information by minimizing Wasserstein-1 distances.

Classification · Fairness +1
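
For one-dimensional classifier outputs, the Wasserstein-1 distance has a closed form as the integrated difference between the two groups' quantile functions, which is what makes it convenient as an independence penalty. A minimal quantile-grid approximation in NumPy:

```python
import numpy as np

def wasserstein1_1d(scores_a, scores_b, n_quantiles=100):
    """Approximate W1 between two 1-D score distributions via matched
    quantiles: W1 = integral over q of |F_a^{-1}(q) - F_b^{-1}(q)|."""
    qs = np.linspace(0.0, 1.0, n_quantiles)
    return np.abs(np.quantile(scores_a, qs) - np.quantile(scores_b, qs)).mean()
```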

Minimum-Margin Active Learning

no code implementations31 May 2019 Heinrich Jiang, Maya Gupta

We present a new active sampling method we call min-margin which trains multiple learners on bootstrap samples and then chooses the examples to label based on the candidates' minimum margin amongst the bootstrapped models.

Active Learning
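
A sketch of the scoring step described above, assuming each bootstrapped learner has already produced class probabilities for the unlabeled pool (array shapes are illustrative):

```python
import numpy as np

def min_margin_scores(prob_stack):
    """prob_stack: (n_models, n_points, n_classes) probabilities from learners
    trained on different bootstrap samples. Each point is scored by its
    smallest top-two margin across the models; query the lowest scores first."""
    sorted_p = np.sort(prob_stack, axis=-1)
    margins = sorted_p[..., -1] - sorted_p[..., -2]   # (n_models, n_points)
    return margins.min(axis=0)
```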

Stochastic Learning of Additive Second-Order Penalties with Applications to Fairness

no code implementations ICLR 2019 Heinrich Jiang, Yifan Wu, Ofir Nachum

In non-convex settings, the resulting problem may be difficult to solve as the Lagrangian is not guaranteed to have a deterministic saddle-point equilibrium.

Fairness

Identifying and Correcting Label Bias in Machine Learning

no code implementations15 Jan 2019 Heinrich Jiang, Ofir Nachum

We do so by assuming the existence of underlying, unknown, and unbiased labels which are overwritten by an agent who intends to provide accurate labels but may have biases against certain groups.

BIG-bench Machine Learning · Fairness

DBSCAN++: Towards fast and scalable density clustering

no code implementations31 Oct 2018 Jennifer Jang, Heinrich Jiang

Surprisingly, up to a certain point, we can enjoy the same estimation rates while lowering computational cost, showing that DBSCAN++ is a sub-quadratic algorithm that attains minimax optimal rates for level-set estimation, a quality that may be of independent interest.

Clustering
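
The speedup comes from running the expensive density (core-point) queries on only m uniformly chosen points rather than all n, so the range searches cost O(mn) instead of O(n^2). A minimal sketch of that step (the remaining steps, connecting nearby core points and assigning border points, are omitted):

```python
import numpy as np

def sampled_core_points(X, m, eps, min_pts, rng=None):
    """Run the core-point test on only m uniformly sampled points: a sampled
    point is a core point if at least min_pts points of the full dataset lie
    within distance eps of it."""
    rng = rng or np.random.default_rng()
    sample = rng.choice(len(X), size=m, replace=False)
    dists = np.linalg.norm(X[sample][:, None, :] - X[None, :, :], axis=-1)
    is_core = (dists <= eps).sum(axis=1) >= min_pts
    return sample[is_core]                 # indices into X
```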

Optimization with Non-Differentiable Constraints with Applications to Fairness, Recall, Churn, and Other Goals

1 code implementation11 Sep 2018 Andrew Cotter, Heinrich Jiang, Serena Wang, Taman Narayan, Maya Gupta, Seungil You, Karthik Sridharan

This new formulation leads to an algorithm that produces a stochastic classifier by playing a two-player non-zero-sum game solving for what we call a semi-coarse correlated equilibrium, which in turn corresponds to an approximately optimal and feasible solution to the constrained optimization problem.

Fairness

Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints

1 code implementation29 Jun 2018 Andrew Cotter, Maya Gupta, Heinrich Jiang, Nathan Srebro, Karthik Sridharan, Serena Wang, Blake Woodworth, Seungil You

Classifiers can be trained with data-dependent constraints to satisfy fairness goals, reduce churn, achieve a targeted false positive rate, or other policy goals.

Fairness

Interpretable Set Functions

no code implementations31 May 2018 Andrew Cotter, Maya Gupta, Heinrich Jiang, James Muller, Taman Narayan, Serena Wang, Tao Zhu

We propose learning flexible but interpretable functions that aggregate a variable-length set of permutation-invariant feature vectors to predict a label.

To Trust Or Not To Trust A Classifier

1 code implementation NeurIPS 2018 Heinrich Jiang, Been Kim, Melody Y. Guan, Maya Gupta

Knowing when a classifier's prediction can be trusted is useful in many applications and critical for safely using AI.

Topological Data Analysis
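
The paper's trust score has a compact nearest-neighbor form: the ratio of the distance from a test point to the nearest training example of the closest *other* class, over the distance to the nearest training example of the predicted class. A brute-force sketch (the paper additionally restricts to high-density subsets of each class, which is omitted here):

```python
import numpy as np

def trust_score(x, train_X, train_y, pred_label, eps=1e-12):
    """Higher values = more reason to trust the classifier's prediction on x.
    Assumes both the predicted class and at least one other class appear
    in train_y."""
    dists = np.linalg.norm(train_X - x, axis=1)
    d_pred = dists[train_y == pred_label].min()
    d_other = dists[train_y != pred_label].min()
    return d_other / (d_pred + eps)
```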

Quickshift++: Provably Good Initializations for Sample-Based Mean Shift

1 code implementation ICML 2018 Heinrich Jiang, Jennifer Jang, Samory Kpotufe

We provide initial seedings to the Quick Shift clustering algorithm, which approximate the locally high-density regions of the data.

Clustering · Image Segmentation +1
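
Quick Shift itself admits a very short description: link each point to the nearest point of strictly higher estimated density within a radius, and let points with no such neighbor be cluster roots. A brute-force sketch of that step (Quickshift++'s contribution, replacing single roots with estimated high-density regions, is omitted):

```python
import numpy as np

def quick_shift_parents(X, density, tau):
    """Link each point to the closest point of strictly higher estimated
    density within radius tau; points with no such neighbor are cluster
    roots. Following parent pointers recovers the mode-based clusters."""
    n = len(X)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    parents = np.arange(n)
    for i in range(n):
        higher = np.where((density > density[i]) & (dists[i] <= tau))[0]
        if len(higher) > 0:
            parents[i] = higher[np.argmin(dists[i, higher])]
    return parents
```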

Two-Player Games for Efficient Non-Convex Constrained Optimization

1 code implementation17 Apr 2018 Andrew Cotter, Heinrich Jiang, Karthik Sridharan

For both the proxy-Lagrangian and Lagrangian formulations, however, we prove that this classifier, instead of having unbounded size, can be taken to be a distribution over no more than m+1 models (where m is the number of constraints).

BIG-bench Machine Learning · Vocal Bursts Valence Prediction

Nonparametric Stochastic Contextual Bandits

no code implementations5 Jan 2018 Melody Y. Guan, Heinrich Jiang

We analyze the $K$-armed bandit problem where the reward for each arm is a noisy realization based on an observed context under mild nonparametric assumptions.

General Classification · Image Classification +1

Uniform Convergence Rates for Kernel Density Estimation

no code implementations ICML 2017 Heinrich Jiang

Kernel density estimation (KDE) is a popular nonparametric density estimation method.

Density Estimation
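
The estimator in question is $\hat f(x) = \frac{1}{n h^d} \sum_i K((x - X_i)/h)$; the paper's rates govern how fast $\sup_x |\hat f(x) - f(x)|$ shrinks with the sample size n and bandwidth h. A minimal Gaussian-kernel version:

```python
import numpy as np

def gaussian_kde_at(x, X, h):
    """Evaluate the kernel density estimate at a single point x:
    f_hat(x) = (1 / (n h^d)) * sum_i K((x - X_i) / h), Gaussian kernel."""
    n, d = X.shape
    u = (x - X) / h
    k = np.exp(-0.5 * (u ** 2).sum(axis=1)) / (2 * np.pi) ** (d / 2)
    return k.sum() / (n * h ** d)
```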

Non-Asymptotic Uniform Rates of Consistency for k-NN Regression

no code implementations19 Jul 2017 Heinrich Jiang

We derive high-probability finite-sample uniform rates of consistency for $k$-NN regression that are optimal up to logarithmic factors under mild assumptions.

Regression
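
The estimator analyzed is the plain k-NN average, $\hat f(x) = \frac{1}{k} \sum_{i \in N_k(x)} y_i$; a minimal sketch:

```python
import numpy as np

def knn_regress(x, X, y, k):
    """Average the responses of the k training points nearest to x; the
    paper bounds the sup-norm error of this estimator uniformly over x."""
    nn = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
    return y[nn].mean()
```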

Density Level Set Estimation on Manifolds with DBSCAN

no code implementations ICML 2017 Heinrich Jiang

When the data lies on an embedded unknown $d$-dimensional manifold in $\mathbb{R}^D$, then we obtain a rate of $\widetilde{O}(n^{-1/(2\beta + d\cdot \max\{1, \beta \})})$.

Modal-set estimation with an application to clustering

1 code implementation13 Jun 2016 Heinrich Jiang, Samory Kpotufe

We present a first procedure that can estimate, with statistical consistency guarantees, any local maxima of a density under benign distributional conditions.

Clustering
