Search Results for author: Daniel Hsu

Found 73 papers, 12 papers with code

Quantifying the Effects of COVID-19 on Restaurant Reviews

no code implementations NAACL (SocialNLP) 2021 Ivy Cao, Zizhou Liu, Giannis Karamanolakis, Daniel Hsu, Luis Gravano

As of now, however, it is not clear how and to what extent the pandemic has affected restaurant reviews, an analysis of which could potentially inform policies for addressing this ongoing situation.

Time Series Analysis

Transformers, parallel computation, and logarithmic depth

1 code implementation 14 Feb 2024 Clayton Sanford, Daniel Hsu, Matus Telgarsky

We show that a constant number of self-attention layers can efficiently simulate, and be simulated by, a constant number of communication rounds of Massively Parallel Computation.

Multi-group Learning for Hierarchical Groups

no code implementations 1 Feb 2024 Samuel Deng, Daniel Hsu

The multi-group learning model formalizes the learning scenario in which a single predictor must generalize well on multiple, possibly overlapping subgroups of interest.

Distribution-Specific Auditing For Subgroup Fairness

no code implementations 27 Jan 2024 Daniel Hsu, Jizhou Huang, Brendan Juba

In this work, we give positive and negative results on auditing for Gaussian distributions: On the positive side, we present an alternative approach to leverage these advances in agnostic learning and thereby obtain the first polynomial-time approximation scheme (PTAS) for auditing nontrivial combinatorial subgroup fairness: we show how to audit statistical notions of fairness over homogeneous halfspace subgroups when the features are Gaussian.


On the sample complexity of parameter estimation in logistic regression with normal design

no code implementations 9 Jul 2023 Daniel Hsu, Arya Mazumdar

The logistic regression model is one of the most popular data generation models in noisy binary classification problems.

Binary Classification Generalization Bounds +1

Group conditional validity via multi-group learning

no code implementations 7 Mar 2023 Samuel Deng, Navid Ardeshir, Daniel Hsu

We consider the problem of distribution-free conformal prediction and the criterion of group conditional validity.

Conformal Prediction Fairness

Intrinsic dimensionality and generalization properties of the $\mathcal{R}$-norm inductive bias

1 code implementation 10 Jun 2022 Navid Ardeshir, Daniel Hsu, Clayton Sanford

We study the structural and statistical properties of $\mathcal{R}$-norm minimizing interpolants of datasets labeled by specific target functions.

Inductive Bias

Statistical-Computational Trade-offs in Tensor PCA and Related Problems via Communication Complexity

no code implementations 15 Apr 2022 Rishabh Dudeja, Daniel Hsu

Similar lower bounds are obtained for Non-Gaussian Component Analysis, a family of statistical estimation problems in which low-order moment tensors carry no information about the unknown parameter.

Masked prediction tasks: a parameter identifiability view

no code implementations 18 Feb 2022 Bingbin Liu, Daniel Hsu, Pradeep Ravikumar, Andrej Risteski

This lens is undoubtedly very interesting, but suffers from the problem that there isn't a "canonical" set of downstream tasks to focus on -- in practice, this problem is usually resolved by competing on the benchmark dataset du jour.

Self-Supervised Learning

Near-Optimal Statistical Query Lower Bounds for Agnostically Learning Intersections of Halfspaces with Gaussian Marginals

no code implementations 10 Feb 2022 Daniel Hsu, Clayton Sanford, Rocco Servedio, Emmanouil-Vasileios Vlatakis-Gkaragkounis

This lower bound is essentially best possible since an SQ algorithm of Klivans et al. (2008) agnostically learns this class to any constant excess error using $n^{O(\log k)}$ queries of tolerance $n^{-O(\log k)}$.

Learning Tensor Representations for Meta-Learning

no code implementations 18 Jan 2022 Samuel Deng, Yilin Guo, Daniel Hsu, Debmalya Mandal

Prior works on learning linear representations for meta-learning assume that there is a common shared representation across different tasks, and do not consider the additional task-specific observable side information.


Simple and near-optimal algorithms for hidden stratification and multi-group learning

no code implementations 22 Dec 2021 Christopher Tosh, Daniel Hsu

Multi-group agnostic learning is a formal learning criterion that is concerned with the conditional risks of predictors within subgroups of a population.


Bayesian decision-making under misspecified priors with applications to meta-learning

no code implementations NeurIPS 2021 Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miroslav Dudík, Robert E. Schapire

We prove that the expected reward accrued by Thompson sampling (TS) with a misspecified prior differs by at most $\tilde{\mathcal{O}}(H^2 \epsilon)$ from TS with a well specified prior, where $\epsilon$ is the total-variation distance between priors and $H$ is the learning horizon.

Decision Making Meta-Learning +2

Support vector machines and linear regression coincide with very high-dimensional features

1 code implementation NeurIPS 2021 Navid Ardeshir, Clayton Sanford, Daniel Hsu

The support vector machine (SVM) and minimum Euclidean norm least squares regression are two fundamentally different approaches to fitting linear models, but they have recently been connected in models for very high-dimensional data through a phenomenon of support vector proliferation, where every training example used to fit an SVM becomes a support vector.


Generalization bounds via distillation

no code implementations ICLR 2021 Daniel Hsu, Ziwei Ji, Matus Telgarsky, Lan Wang

This paper theoretically investigates the following empirical phenomenon: given a high-complexity network with poor generalization bounds, one can distill it into a network with nearly identical predictions but low complexity and vastly smaller generalization bounds.

Data Augmentation Generalization Bounds

On the Approximation Power of Two-Layer Networks of Random ReLUs

no code implementations 3 Feb 2021 Daniel Hsu, Clayton Sanford, Rocco A. Servedio, Emmanouil-Vasileios Vlatakis-Gkaragkounis

This paper considers the following question: how well can depth-two ReLU networks with randomly initialized bottom-level weights represent smooth functions?

Vocal Bursts Valence Prediction

Markov Chain Monte Carlo Policy Optimization

no code implementations 4 Jan 2021 Daniel Hsu

Discovering approximately optimal policies, a task termed policy optimization, is crucial to applying reinforcement learning (RL) in many real-world scenarios.

Continuous Control reinforcement-learning +2

Detecting Foodborne Illness Complaints in Multiple Languages Using English Annotations Only

no code implementations EMNLP (Louhi) 2020 Ziyi Liu, Giannis Karamanolakis, Daniel Hsu, Luis Gravano

To improve performance without extra annotations, we create artificial training documents in the target language through machine translation and train mBERT jointly for the source (English) and target language.

Machine Translation text-classification +1

Cross-Lingual Text Classification with Minimal Resources by Transferring a Sparse Teacher

1 code implementation Findings of the Association for Computational Linguistics 2020 Giannis Karamanolakis, Daniel Hsu, Luis Gravano

In this work, we propose a cross-lingual teacher-student method, CLTS, that generates "weak" supervision in the target language using minimal cross-lingual resources, in the form of a small number of word translations.

General Classification Representation Learning +2

On the proliferation of support vectors in high dimensions

no code implementations 22 Sep 2020 Daniel Hsu, Vidya Muthukumar, Ji Xu

The support vector machine (SVM) is a well-established classification method whose name refers to the particular training examples, called support vectors, that determine the maximum margin separating hyperplane.

General Classification Vocal Bursts Intensity Prediction

Contrastive learning, multi-view redundancy, and linear models

no code implementations 24 Aug 2020 Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu

Self-supervised learning is an empirically successful approach to unsupervised learning based on creating artificial supervised learning problems.

Contrastive Learning Representation Learning +1

Statistical Query Lower Bounds for Tensor PCA

no code implementations 10 Aug 2020 Rishabh Dudeja, Daniel Hsu

Our analysis reveals that the optimal sample complexity in the SQ model depends on whether $\mathbb{E} \mathbf{T}_1$ is symmetric or not.

Interpreting deep learning models for weak lensing

no code implementations 13 Jul 2020 José Manuel Zorrilla Matilla, Manasi Sharma, Daniel Hsu, Zoltán Haiman

Deep Neural Networks (DNNs) are powerful algorithms that have been proven capable of extracting non-Gaussian information from weak lensing (WL) data sets.

Cosmology and Nongalactic Astrophysics

Ensuring Fairness Beyond the Training Data

2 code implementations NeurIPS 2020 Debmalya Mandal, Samuel Deng, Suman Jana, Jeannette M. Wing, Daniel Hsu

In this work, we develop classifiers that are fair not only with respect to the training distribution, but also for a class of distributions that are weighted perturbations of the training samples.


Contrastive estimation reveals topic posterior information to linear models

no code implementations 4 Mar 2020 Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu

Contrastive learning is an approach to representation learning that utilizes naturally occurring similar and dissimilar pairs of data points to find useful embeddings of data.

Classification Contrastive Learning +3

A New Framework for Query Efficient Active Imitation Learning

no code implementations 30 Dec 2019 Daniel Hsu

The results show that the proposed method significantly outperforms uncertainty-based methods on learning reward models, achieving better query efficiency, where the adversarial discriminator can make the agent learn human behavior more efficiently and the SR can select states which have stronger impact on value function.

Imitation Learning Reinforcement Learning (RL)

Weakly Supervised Attention Networks for Fine-Grained Opinion Mining and Public Health

no code implementations WS 2019 Giannis Karamanolakis, Daniel Hsu, Luis Gravano

In many review classification applications, a fine-grained analysis of the reviews is desirable, because different segments (e.g., sentences) of a review may focus on different aspects of the entity in question.

Classification General Classification +4

Privacy Accounting and Quality Control in the Sage Differentially Private ML Platform

no code implementations 4 Sep 2019 Mathias Lecuyer, Riley Spahn, Kiran Vodrahalli, Roxana Geambasu, Daniel Hsu

Companies increasingly expose machine learning (ML) models trained over sensitive user data to untrusted domains, such as end-user devices and wide-access model stores.

Leveraging Just a Few Keywords for Fine-Grained Aspect Detection Through Weakly Supervised Co-Training

1 code implementation IJCNLP 2019 Giannis Karamanolakis, Daniel Hsu, Luis Gravano

In this work, we consider weakly supervised approaches for training aspect classifiers that only require the user to provide a small set of seed words (i.e., weakly positive indicators) for the aspects of interest.

Aspect Category Detection Opinion Mining +2

Unbiased estimators for random design regression

no code implementations 8 Jul 2019 Michał Dereziński, Manfred K. Warmuth, Daniel Hsu

We use them to show that for any input distribution and $\epsilon>0$ there is a random design consisting of $O(d\log d+ d/\epsilon)$ points from which an unbiased estimator can be constructed whose expected square loss over the entire distribution is bounded by $1+\epsilon$ times the loss of the optimum.


A gradual, semi-discrete approach to generative network training via explicit Wasserstein minimization

no code implementations 8 Jun 2019 Yu-cheng Chen, Matus Telgarsky, Chao Zhang, Bolton Bailey, Daniel Hsu, Jian Peng

This paper provides a simple procedure to fit generative networks to target distributions, with the goal of a small Wasserstein distance (or other optimal transport costs).

A cryptographic approach to black box adversarial machine learning

no code implementations 7 Jun 2019 Kevin Shi, Daniel Hsu, Allison Bishop

We propose a new randomized ensemble technique with a provable security guarantee against black-box transfer attacks.

BIG-bench Machine Learning

Diameter-based Interactive Structure Discovery

no code implementations 5 Jun 2019 Christopher Tosh, Daniel Hsu

We introduce interactive structure discovery, a generic framework that encompasses many interactive learning settings, including active learning, top-k item identification, interactive drug discovery, and others.

Active Learning Drug Discovery

On the number of variables to use in principal component regression

no code implementations NeurIPS 2019 Ji Xu, Daniel Hsu

We study least squares linear regression over $N$ uncorrelated Gaussian features that are selected in order of decreasing variance.


Two models of double descent for weak features

no code implementations 18 Mar 2019 Mikhail Belkin, Daniel Hsu, Ji Xu

The "double descent" risk curve was proposed to qualitatively describe the out-of-sample prediction accuracy of variably-parameterized machine learning models.

BIG-bench Machine Learning Vocal Bursts Valence Prediction

Training Neural Networks for Aspect Extraction Using Descriptive Keywords Only

no code implementations ICLR Workshop LLD 2019 Giannis Karamanolakis, Daniel Hsu, Luis Gravano

In this work, we propose a weakly supervised approach for training neural networks for aspect extraction in cases where only a small set of seed words, i.e., keywords that describe an aspect, are available.

Aspect Extraction Descriptive +3

Weak lensing cosmology with convolutional neural networks on noisy data

1 code implementation 10 Feb 2019 Dezső Ribli, Bálint Ármin Pataki, José Manuel Zorrilla Matilla, Daniel Hsu, Zoltán Haiman, István Csabai

Previous studies attempted to extract non-Gaussian information from weak lensing observations through several higher-order statistics such as the three-point correlation function, peak counts or Minkowski-functionals.

Cosmology and Nongalactic Astrophysics

Consistent Risk Estimation in Moderately High-Dimensional Linear Regression

no code implementations 5 Feb 2019 Ji Xu, Arian Maleki, Kamiar Rahnama Rad, Daniel Hsu

This paper studies the problem of risk estimation under the moderately high-dimensional asymptotic setting $n, p \rightarrow \infty$ and $n/p \rightarrow \delta>1$ ($\delta$ is a fixed number), and proves the consistency of three risk estimates that have been successful in numerical studies, i.e., leave-one-out cross validation (LOOCV), approximate leave-one-out (ALO), and approximate message passing (AMP)-based techniques.

regression Vocal Bursts Intensity Prediction

Reconciling modern machine learning practice and the bias-variance trade-off

3 code implementations 28 Dec 2018 Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal

This connection between the performance and the structure of machine learning models delineates the limits of classical analyses, and has implications for both the theory and practice of machine learning.

BIG-bench Machine Learning Test

Benefits of over-parameterization with EM

no code implementations NeurIPS 2018 Ji Xu, Daniel Hsu, Arian Maleki

Expectation Maximization (EM) is among the most popular algorithms for maximum likelihood estimation, but it is generally only guaranteed to find stationary points of the log-likelihood objective.

Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate

no code implementations NeurIPS 2018 Mikhail Belkin, Daniel Hsu, Partha Mitra

Finally, this paper suggests a way to explain the phenomenon of adversarial examples, which are seemingly ubiquitous in modern machine learning, and also discusses some connections to kernel machines and random forests in the interpolated regime.

BIG-bench Machine Learning General Classification +2

Leveraged volume sampling for linear regression

no code implementations NeurIPS 2018 Michał Dereziński, Manfred K. Warmuth, Daniel Hsu

We then develop a new rescaled variant of volume sampling that produces an unbiased estimate which avoids this bad behavior and has at least as good a tail bound as leverage score sampling: sample size $k=O(d\log d + d/\epsilon)$ suffices to guarantee total loss at most $1+\epsilon$ times the minimum with high probability.

Point Processes regression

Certified Robustness to Adversarial Examples with Differential Privacy

6 code implementations 9 Feb 2018 Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, Suman Jana

Adversarial examples that fool machine learning models, particularly deep neural networks, have been a topic of intense research interest, with attacks and defenses being developed in a tight back-and-forth.

Mixing time estimation in reversible Markov chains from a single sample path

no code implementations NeurIPS 2015 Daniel Hsu, Aryeh Kontorovich, David A. Levin, Yuval Peres, Csaba Szepesvári

The interval is constructed around the relaxation time $t_{\text{relax}} = 1/\gamma$, which is strongly related to the mixing time, and the width of the interval converges to zero roughly at a $1/\sqrt{n}$ rate, where $n$ is the length of the sample path.
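
As a small worked illustration of the relation $t_{\text{relax}} = 1/\gamma$, the sketch below computes the relaxation time of a two-state chain whose transition matrix is known. It assumes $\gamma$ denotes the absolute spectral gap (one minus the largest absolute value among the non-unit eigenvalues); the function name and parameters are hypothetical. This is not the paper's estimator, which works from a single sample path without access to the transition matrix.

```python
def two_state_relaxation_time(p: float, q: float) -> float:
    """Relaxation time 1/gamma of the chain P = [[1 - p, p], [q, 1 - q]].

    Assumption: gamma is the absolute spectral gap. A two-state chain with
    p, q > 0 is reversible, and its eigenvalues are 1 and 1 - p - q.
    """
    if not (0.0 < p <= 1.0 and 0.0 < q <= 1.0):
        raise ValueError("p and q must lie in (0, 1]")
    second_eigenvalue = 1.0 - p - q
    gamma = 1.0 - abs(second_eigenvalue)
    if gamma <= 0.0:
        raise ValueError("spectral gap is zero (periodic chain)")
    return 1.0 / gamma


# A slowly mixing chain: each state is left with probability 0.1 per step,
# so the second eigenvalue is 0.8 and the relaxation time is close to 5.
t_relax = two_state_relaxation_time(0.1, 0.1)
```

A smaller gap means a longer relaxation time, which is why an interval around $t_{\text{relax}}$ carries information about how slowly the chain mixes.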

Anomaly Detection on Graph Time Series

no code implementations 9 Aug 2017 Daniel Hsu

In this paper, we use a variational recurrent neural network to investigate the anomaly detection problem on graph time series.

Anomaly Detection Time Series +2

Time Series Compression Based on Adaptive Piecewise Recurrent Autoencoder

no code implementations 23 Jul 2017 Daniel Hsu

Time series account for a large proportion of the data stored in financial, medical and scientific databases.

Time Series Time Series Analysis

Multi-period Time Series Modeling with Sparsity via Bayesian Variational Inference

no code implementations 3 Jul 2017 Daniel Hsu

Previous methods based on stacked recurrent neural network (RNN) and deep belief network (DBN) models cannot model the tendencies in multiple periods, and no models for sequential data pay special attention to redundant input variables which have no or even negative impact on prediction and modeling.

Time Series Time Series Analysis +1

Parameter identification in Markov chain choice models

no code implementations 2 Jun 2017 Arushi Gupta, Daniel Hsu

The underlying parameters of the model were previously shown to be identifiable from the choice probabilities for the all-products assortment, together with choice probabilities for assortments of all-but-one products.

Linear regression without correspondence

no code implementations NeurIPS 2017 Daniel Hsu, Kevin Shi, Xiaorui Sun

Next, in an average-case and noise-free setting where the responses exactly correspond to a linear function of i.i.d.


Kernel Approximation Methods for Speech Recognition

no code implementations 13 Jan 2017 Avner May, Alireza Bagheri Garakani, Zhiyun Lu, Dong Guo, Kuan Liu, Aurélien Bellet, Linxi Fan, Michael Collins, Daniel Hsu, Brian Kingsbury, Michael Picheny, Fei Sha

First, in order to reduce the number of random features required by kernel models, we propose a simple but effective method for feature selection.

feature selection speech-recognition +1

Global analysis of Expectation Maximization for mixtures of two Gaussians

no code implementations NeurIPS 2016 Ji Xu, Daniel Hsu, Arian Maleki

Expectation Maximization (EM) is among the most popular algorithms for estimating parameters of statistical models.

Vocal Bursts Valence Prediction

Greedy bi-criteria approximations for $k$-medians and $k$-means

no code implementations 21 Jul 2016 Daniel Hsu, Matus Telgarsky

This paper investigates the following natural greedy procedure for clustering in the bi-criterion setting: iteratively grow a set of centers, in each round adding the center from a candidate set that maximally decreases clustering cost.
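
The greedy procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the `num_rounds` and `power` parameters, and the use of Euclidean distance are hypothetical choices (`power=1` gives a $k$-medians-style cost, `power=2` a $k$-means-style cost). In the bi-criteria setting, `num_rounds` may exceed $k$.

```python
import math


def clustering_cost(points, centers, power=2):
    """Sum over points of (distance to the nearest center) ** power."""
    return sum(min(math.dist(p, c) ** power for c in centers) for p in points)


def greedy_centers(points, candidates, num_rounds, power=2):
    """Grow a center set greedily, one candidate per round.

    Each round adds the unused candidate that yields the lowest clustering
    cost, which is the same as the largest decrease in cost.
    """
    centers = []
    for _ in range(num_rounds):
        remaining = [c for c in candidates if c not in centers]
        if not remaining:
            break
        best = min(
            remaining,
            key=lambda c: clustering_cost(points, centers + [c], power),
        )
        centers.append(best)
    return centers


# Two well-separated pairs of points; the points themselves are the candidates.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
centers = greedy_centers(points, candidates=points, num_rounds=2)
```

Each round re-evaluates the cost for every remaining candidate, so the sketch favors clarity over speed; ties are broken by candidate order.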


Search Improves Label for Active Learning

no code implementations NeurIPS 2016 Alina Beygelzimer, Daniel Hsu, John Langford, Chicheng Zhang

We investigate active learning with access to two distinct oracles: Label (which is standard) and Search (which is not).

Active Learning

Unsupervised Part-Of-Speech Tagging with Anchor Hidden Markov Models

1 code implementation TACL 2016 Karl Stratos, Michael Collins, Daniel Hsu

We tackle unsupervised part-of-speech (POS) tagging by learning hidden Markov models (HMMs) that are particularly well-suited for the problem.

Clustering POS +3

Mixing Time Estimation in Reversible Markov Chains from a Single Sample Path

no code implementations NeurIPS 2015 Daniel Hsu, Aryeh Kontorovich, Csaba Szepesvári

The interval is constructed around the relaxation time $t_{\text{relax}}$, which is strongly related to the mixing time, and the width of the interval converges to zero roughly at a $1/\sqrt{n}$ rate, where $n$ is the length of the sample path.

The Large Margin Mechanism for Differentially Private Maximization

no code implementations NeurIPS 2014 Kamalika Chaudhuri, Daniel Hsu, Shuang Song

A basic problem in the design of privacy-preserving algorithms is the private maximization problem: the goal is to pick an item from a universe that (approximately) maximizes a data-dependent function, all under the constraint of differential privacy.

BIG-bench Machine Learning Privacy Preserving

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

1 code implementation 4 Feb 2014 Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert E. Schapire

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen action.

General Classification Multi-Armed Bandits

When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity

no code implementations NeurIPS 2013 Animashree Anandkumar, Daniel Hsu, Majid Janzamin, Sham Kakade

This set of higher-order expansion conditions allows for overcomplete models, and requires the existence of a perfect matching from latent topics to higher order observed words.

Topic Models

Loss minimization and parameter estimation with heavy tails

no code implementations 7 Jul 2013 Daniel Hsu, Sivan Sabato

This work studies applications and generalizations of a simple estimation technique that provides exponential concentration under heavy-tailed distributions, assuming only bounded low-order moments.


A Tensor Approach to Learning Mixed Membership Community Models

no code implementations 12 Feb 2013 Anima Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade

We provide guaranteed recovery of community memberships and model parameters and present a careful finite sample analysis of our learning method.

Community Detection Stochastic Block Model

Learning Sparse Low-Threshold Linear Classifiers

no code implementations 13 Dec 2012 Sivan Sabato, Shai Shalev-Shwartz, Nathan Srebro, Daniel Hsu, Tong Zhang

We consider the problem of learning a non-negative linear classifier with a $1$-norm of at most $k$, and a fixed threshold, under the hinge-loss.

Tensor decompositions for learning latent variable models

no code implementations 29 Oct 2012 Anima Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade, Matus Telgarsky

This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models---including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation---which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order).

Learning Topic Models and Latent Bayesian Networks Under Expansion Constraints

no code implementations 24 Sep 2012 Animashree Anandkumar, Daniel Hsu, Adel Javanmard, Sham M. Kakade

The sufficient conditions for identifiability of these models are primarily based on weak expansion constraints on the topic-word matrix, for topic models, and on the directed acyclic graph, for Bayesian networks.

Topic Models

A Method of Moments for Mixture Models and Hidden Markov Models

1 code implementation 3 Mar 2012 Animashree Anandkumar, Daniel Hsu, Sham M. Kakade

Mixture models are a fundamental tool in applied statistics and machine learning for treating data taken from multiple subpopulations.

Random design analysis of ridge regression

no code implementations 13 Jun 2011 Daniel Hsu, Sham M. Kakade, Tong Zhang

The analysis also reveals the effect of errors in the estimated covariance structure, as well as the effect of modeling errors, neither of which effects are present in the fixed design setting.

LEMMA regression

A Spectral Algorithm for Learning Hidden Markov Models

no code implementations 26 Nov 2008 Daniel Hsu, Sham M. Kakade, Tong Zhang

Hidden Markov Models (HMMs) are one of the most fundamental and widely used statistical tools for modeling discrete time series.

Time Series Time Series Analysis
