no code implementations • ICML 2020 • Yue Sheng, Edgar Dobriban
To scale up data analysis, distributed and parallel computing approaches are increasingly needed.
1 code implementation • 16 Aug 2024 • Yan Sun, Pratik Chaudhari, Ian J. Barnett, Edgar Dobriban
We develop methods to construct asymptotically valid confidence intervals for the ECE, accounting for this behavior as well as non-negativity.
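For context, a minimal sketch of the quantity involved, assuming numpy arrays of predicted probabilities and binary labels; the percentile bootstrap below is a generic stand-in for illustration, not the paper's asymptotic construction.

```python
import numpy as np

def ece(probs, labels, n_bins=10):
    """Binned expected calibration error for binary predictions."""
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    err = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            # |mean confidence - empirical accuracy|, weighted by bin mass
            err += mask.mean() * abs(probs[mask].mean() - labels[mask].mean())
    return err

def ece_ci_bootstrap(probs, labels, alpha=0.05, n_boot=2000, seed=0):
    """Percentile-bootstrap interval for the ECE; a generic stand-in,
    not the paper's asymptotic construction."""
    rng = np.random.default_rng(seed)
    n = len(probs)
    stats = [ece(probs[idx], labels[idx])
             for idx in (rng.integers(0, n, n) for _ in range(n_boot))]
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return max(lo, 0.0), hi  # respect non-negativity of the ECE
```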
no code implementations • 16 Jun 2024 • Behrad Moniri, Hamed Hassani, Edgar Dobriban
Large Language Models (LLMs) are rapidly evolving and impacting various fields, necessitating the development of effective methods to evaluate and compare their performance.
no code implementations • 12 Jun 2024 • Patrick Chao, Edgar Dobriban, Hamed Hassani
Recent progress in large language models enables the creation of realistic machine-generated content.
1 code implementation • 29 May 2024 • Xinmeng Huang, Shuo Li, Edgar Dobriban, Osbert Bastani, Hamed Hassani, Dongsheng Ding
The growing safety concerns surrounding large language models raise an urgent need to align them with diverse human preferences to simultaneously enhance their helpfulness and safety.
1 code implementation • 4 Apr 2024 • Xinmeng Huang, Shuo Li, Mengxin Yu, Matteo Sesia, Hamed Hassani, Insup Lee, Osbert Bastani, Edgar Dobriban
Language Models (LMs) have shown promising performance in natural language generation.
1 code implementation • 1 Apr 2024 • Leda Wang, Zhixiang Zhang, Edgar Dobriban
To our knowledge, no comparable methods are available for SSE and for SRHT in PCA.
3 code implementations • 28 Mar 2024 • Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramer, Hamed Hassani, Eric Wong
To address these challenges, we introduce JailbreakBench, an open-sourced benchmark with the following components: (1) an evolving repository of state-of-the-art adversarial prompts, which we refer to as jailbreak artifacts; (2) a jailbreaking dataset comprising 100 behaviors -- both original and sourced from prior work (Zou et al., 2023; Mazeika et al., 2023, 2024) -- which align with OpenAI's usage policies; (3) a standardized evaluation framework at https://github.com/JailbreakBench/jailbreakbench that includes a clearly defined threat model, system prompts, chat templates, and scoring functions; and (4) a leaderboard at https://jailbreakbench.github.io/ that tracks the performance of attacks and defenses for various LLMs.

1 code implementation • 27 Mar 2024 • Xianli Zeng, Guang Cheng, Edgar Dobriban
Mitigating the disparate impact of statistical machine learning methods is crucial for ensuring fairness.
1 code implementation • 5 Feb 2024 • Xianli Zeng, Guang Cheng, Edgar Dobriban
To address this, we develop methods for Bayes-optimal fair classification, aiming to minimize classification error subject to given group fairness constraints.
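Bayes-optimal classifiers under group fairness constraints take a group-wise thresholding form; the quantile matching below is a simple illustration of that idea (here for demographic parity), not the paper's algorithm.

```python
import numpy as np

def demographic_parity_thresholds(scores, groups, target_rate=None):
    """Per-group thresholds on estimated P(Y=1|X) that equalize positive
    rates across groups; a simple stand-in, not the paper's algorithm."""
    if target_rate is None:
        target_rate = float(np.mean(scores >= 0.5))  # overall rate at cutoff 1/2
    return {g: np.quantile(scores[groups == g], 1 - target_rate)
            for g in np.unique(groups)}

def predict_fair(scores, groups, thresholds):
    """Classify by comparing each score to its group's threshold."""
    return np.array([s >= thresholds[g] for s, g in zip(scores, groups)])
```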
1 code implementation • 26 Dec 2023 • Edgar Dobriban, Mengxin Yu
Methods for predictive inference have been developed under a variety of assumptions, often -- for instance, in standard conformal prediction -- relying on the invariance of the distribution of the data under special groups of transformations such as permutation groups.
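Standard split conformal prediction is the canonical example of inference under permutation invariance; a minimal sketch for regression, where `model` is any fitted regressor with a scikit-learn-style `.predict` method:

```python
import numpy as np

def split_conformal_interval(model, X_cal, y_cal, x_new, alpha=0.1):
    """Split conformal interval for one test point; valid whenever the
    calibration and test data are exchangeable (permutation invariance)."""
    scores = np.abs(y_cal - model.predict(X_cal))   # nonconformity scores
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))         # conformal rank
    q = np.sort(scores)[min(k, n) - 1]
    pred = model.predict(x_new.reshape(1, -1))[0]
    return pred - q, pred + q
```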
1 code implementation • 19 Oct 2023 • Wenwen Si, Sangdon Park, Insup Lee, Edgar Dobriban, Osbert Bastani
We propose a novel algorithm for constructing prediction sets with PAC guarantees in the label shift setting.
1 code implementation • 12 Oct 2023 • Patrick Chao, Alexander Robey, Edgar Dobriban, Hamed Hassani, George J. Pappas, Eric Wong
PAIR -- which is inspired by social engineering attacks -- uses an attacker LLM to automatically generate jailbreaks for a separate targeted LLM without human intervention.
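A schematic of the attacker/target/judge loop described above; the three callables are hypothetical stand-ins for LLM queries, not the released PAIR implementation.

```python
def pair_style_attack(goal, query_attacker, query_target, judge_score,
                      n_iters=20, success_threshold=10):
    """Schematic iterative-refinement loop. query_attacker, query_target,
    and judge_score are hypothetical stand-ins, not the PAIR API."""
    history = []
    for _ in range(n_iters):
        prompt = query_attacker(goal, history)        # propose or refine a jailbreak
        response = query_target(prompt)               # query the separate target LLM
        score = judge_score(goal, prompt, response)   # e.g., a 1-10 jailbreak rating
        if score >= success_threshold:
            return prompt, response                   # jailbreak found
        history.append((prompt, response, score))     # feed back for refinement
    return None, None                                 # budget exhausted
```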
no code implementations • 11 Oct 2023 • Behrad Moniri, Donghwan Lee, Hamed Hassani, Edgar Dobriban
It is rigorously known that, in two-layer fully-connected neural networks under certain conditions, one step of gradient descent on the first layer can lead to feature learning, characterized by the appearance of a separated rank-one component (a "spike") in the spectrum of the feature matrix.
2 code implementations • 3 Aug 2023 • Patrick Chao, Edgar Dobriban
Under a squared loss for mean estimation and prediction error in linear regression, we find the exact minimax risk and a least favorable perturbation, and show that the sample mean and the least squares estimator, respectively, are optimal.
no code implementations • 28 Jun 2023 • Hongxiang Qiu, Eric Tchetgen Tchetgen, Edgar Dobriban
Despite extensive literature on dataset shift, limited works address how to efficiently use the auxiliary populations to improve the accuracy of risk evaluation for a given machine learning task in the target population.
no code implementations • 9 Jun 2023 • Xinmeng Huang, Kan Xu, Donghwan Lee, Hamed Hassani, Hamsa Bastani, Edgar Dobriban
Here, we study multitask linear regression and contextual bandits under sparse heterogeneity, where the source/task-associated parameters are equal to a global parameter plus a sparse task-specific term.
no code implementations • 18 Apr 2023 • Tengyao Wang, Edgar Dobriban, Milana Gataric, Richard J. Samworth
We propose a new method for high-dimensional semi-supervised learning problems based on the careful aggregation of the results of a low-dimensional procedure applied to many axis-aligned random projections of the data.
1 code implementation • 31 Jan 2023 • Donghwan Lee, Behrad Moniri, Xinmeng Huang, Edgar Dobriban, Hamed Hassani
Evaluating the performance of machine learning models under distribution shift is challenging, especially when we only have unlabeled data from the shifted (target) domain, along with labeled data from the original (source) domain.
1 code implementation • 9 Nov 2022 • Matteo Sesia, Stefano Favaro, Edgar Dobriban
This paper develops conformal inference methods to construct a confidence interval for the frequency of a queried object in a very large discrete data set, based on a sketch with a lower memory footprint.
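Such a sketch can be, for example, a count-min sketch, which over-counts due to hash collisions; below is a minimal version of the data structure plus a conformal-style lower bound calibrated on queries with known true counts. This is a simplified illustration of the idea, not the paper's exact method.

```python
import numpy as np

class CountMin:
    """Count-min sketch: d hashed rows of w counters; a query returns the
    minimum over rows, an estimate biased upward by collisions."""
    def __init__(self, w=1024, d=5, seed=0):
        rng = np.random.default_rng(seed)
        self.table = np.zeros((d, w), dtype=np.int64)
        self.salts = rng.integers(1, 2**31, size=d)
        self.w = w
    def _cols(self, x):
        return [hash((int(s), x)) % self.w for s in self.salts]
    def add(self, x):
        for r, c in enumerate(self._cols(x)):
            self.table[r, c] += 1
    def query(self, x):
        return min(self.table[r, c] for r, c in enumerate(self._cols(x)))

def conformal_lower_bounds(cm, cal_items, cal_counts, alpha=0.05):
    """Calibrate the over-count (estimate - truth) on items with known
    counts and subtract its conformal quantile from future estimates."""
    errs = np.array([cm.query(x) - c for x, c in zip(cal_items, cal_counts)])
    n = len(errs)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.sort(errs)[min(k, n) - 1]
    return lambda x: max(cm.query(x) - q, 0)   # lower confidence bound
```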
no code implementations • 6 Jul 2022 • Sangdon Park, Edgar Dobriban, Insup Lee, Osbert Bastani
Uncertainty quantification is a key component of machine learning models targeted at safety-critical systems such as in healthcare or autonomous vehicles.
1 code implementation • 18 Jun 2022 • Druv Pai, Michael Psenka, Chih-Yuan Chiu, Manxi Wu, Edgar Dobriban, Yi Ma
We consider the problem of learning discriminative representations for data in a high-dimensional space with distribution supported on or around multiple low-dimensional linear subspaces.
no code implementations • 16 Jun 2022 • Yinshuang Xu, Jiahui Lei, Edgar Dobriban, Kostas Daniilidis
We present a unified derivation of kernels via the Fourier domain by leveraging the sparsity of Fourier coefficients of the lifted feature fields.
no code implementations • 10 Jun 2022 • Souradeep Dutta, Yahan Yang, Elena Bernardis, Edgar Dobriban, Insup Lee
We propose a new method for classification which can improve robustness to distribution shifts, by combining expert knowledge about the "high-level" structure of the data with standard classifiers.
no code implementations • 1 Jun 2022 • Xinmeng Huang, Donghwan Lee, Edgar Dobriban, Hamed Hassani
In modern machine learning, users often have to collaborate to learn the distribution of the data.
no code implementations • 22 May 2022 • Shuo Li, Xiayan Ji, Edgar Dobriban, Oleg Sokolsky, Insup Lee
Anomaly detection is essential for preventing hazardous outcomes for safety-critical applications like autonomous driving.
1 code implementation • 15 May 2022 • Xianli Zeng, Edgar Dobriban, Guang Cheng
This paper considers predictive parity, which requires equalizing the probability of success given a positive prediction among different protected groups.
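Concretely, predictive parity compares the positive predictive value P(Y=1 | prediction=1) across protected groups; a minimal check:

```python
import numpy as np

def predictive_parity_gap(y_true, y_pred, groups):
    """Max difference in P(Y=1 | Yhat=1) across protected groups;
    predictive parity requires this gap to be (near) zero."""
    ppv = {}
    for g in np.unique(groups):
        mask = (groups == g) & (y_pred == 1)
        ppv[g] = y_true[mask].mean() if mask.any() else np.nan
    vals = [v for v in ppv.values() if not np.isnan(v)]
    return max(vals) - min(vals), ppv
```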
no code implementations • 5 Apr 2022 • Evangelos Chatzipantazis, Stefanos Pertigkiozoglou, Edgar Dobriban, Kostas Daniilidis
In contrast to previous shape reconstruction methods that align the input to a regular grid, we operate directly on the irregular point cloud.
1 code implementation • 11 Mar 2022 • Hongxiang Qiu, Edgar Dobriban, Eric Tchetgen Tchetgen
Predicting sets of outcomes -- instead of unique outcomes -- is a promising solution to uncertainty quantification in statistical learning.
1 code implementation • 3 Mar 2022 • Donghwan Lee, Xinmeng Huang, Hamed Hassani, Edgar Dobriban
We find that detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions.
1 code implementation • 25 Feb 2022 • Souradeep Dutta, Kaustubh Sridhar, Osbert Bastani, Edgar Dobriban, James Weimer, Insup Lee, Julia Parish-Morris
We formulate expert intervention as allowing the agent to execute option templates before learning an implementation.
1 code implementation • 20 Feb 2022 • Xianli Zeng, Edgar Dobriban, Guang Cheng
Machine learning algorithms are becoming integrated into more and more high-stakes decision-making processes, such as in social welfare issues.
no code implementations • 7 Jan 2022 • Ramneet Kaur, Susmit Jha, Anirban Roy, Sangdon Park, Edgar Dobriban, Oleg Sokolsky, Insup Lee
We propose the new method iDECODe, leveraging in-distribution equivariance for conformal OOD detection.
no code implementations • 16 Nov 2021 • Evangelos Chatzipantazis, Stefanos Pertigkiozoglou, Kostas Daniilidis, Edgar Dobriban
We propose a new \emph{Transformed Risk Minimization} (TRM) framework as an extension of classical risk minimization.
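A rough sketch of the objective being generalized: the classical risk with each input replaced by a random transformation drawn from a distribution over transforms (which TRM additionally optimizes over). Here `model`, `loss`, and `sample_transform` are hypothetical callables used only for illustration.

```python
import numpy as np

def transformed_risk(loss, model, X, y, sample_transform, K=8, rng=None):
    """Monte Carlo estimate of a transformed risk: average the loss over
    K random transformations of the inputs. sample_transform is a
    hypothetical callable returning a transform g: X -> X."""
    rng = rng or np.random.default_rng(0)
    vals = []
    for _ in range(K):
        g = sample_transform(rng)          # draw a random transformation
        vals.append(loss(model(g(X)), y))  # risk on transformed inputs
    return float(np.mean(vals))
```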
no code implementations • 4 Oct 2021 • Lingjiao Chen, Leshang Chen, Hongyi Wang, Susan Davidson, Edgar Dobriban
There has been a growing need to provide Byzantine-resilience in distributed model training.
1 code implementation • 26 Aug 2021 • Dominic Richards, Edgar Dobriban, Patrick Rebeschini
Methods for learning from data depend on various types of tuning parameters, such as penalization strength or step size.
1 code implementation • ICLR 2022 • Sangdon Park, Edgar Dobriban, Insup Lee, Osbert Bastani
Our approach focuses on the setting where there is a covariate shift from the source distribution (where we have labeled training examples) to the target distribution (for which we want to quantify uncertainty).
1 code implementation • 17 Mar 2021 • Yaodong Yu, Zitong Yang, Edgar Dobriban, Jacob Steinhardt, Yi Ma
To investigate this gap, we decompose the test risk into its bias and variance components and study their behavior as a function of adversarial training perturbation radii ($\varepsilon$).
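A generic way to estimate such a decomposition empirically is to retrain over independent draws of the training set; in the sketch below, `train` and `sample_trainset` are hypothetical callables, and the adversarial-training radius would enter only through `train`.

```python
import numpy as np

def bias_variance(train, sample_trainset, X_test, f_test, n_reps=50, seed=0):
    """Monte Carlo bias-variance decomposition of test risk under squared
    loss. `train` maps a training set to a predictor; `sample_trainset`
    draws a fresh training set (both hypothetical callables)."""
    rng = np.random.default_rng(seed)
    preds = np.stack([train(sample_trainset(rng))(X_test)
                      for _ in range(n_reps)])   # shape (n_reps, n_test)
    mean_pred = preds.mean(axis=0)
    bias2 = np.mean((mean_pred - f_test) ** 2)   # squared bias
    var = np.mean(preds.var(axis=0))             # variance over training sets
    return bias2, var   # test risk ~ bias2 + var + irreducible noise
```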
no code implementations • NeurIPS 2020 • Jonathan Lacotte, Sifan Liu, Edgar Dobriban, Mert Pilanci
These show that the convergence rates for Haar and randomized Hadamard matrices are identical, and asymptotically improve upon Gaussian random projections.
no code implementations • 21 Nov 2020 • Michał Dereziński, Zhenyu Liao, Edgar Dobriban, Michael W. Mahoney
For a tall $n\times d$ matrix $A$ and a random $m\times n$ sketching matrix $S$, the sketched estimate of the inverse covariance matrix $(A^\top A)^{-1}$ is typically biased: $E[(\tilde A^\top\tilde A)^{-1}]\ne(A^\top A)^{-1}$, where $\tilde A=SA$.
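In the Gaussian case this bias is a known scalar inflation: the rows of $SA$ are i.i.d. $N(0, A^\top A/m)$, so $(\tilde A^\top\tilde A)^{-1}$ is inverse-Wishart with mean $\frac{m}{m-d-1}(A^\top A)^{-1}$. A quick Monte Carlo check (specific to Gaussian sketches):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, reps = 200, 5, 40, 2000
A = rng.standard_normal((n, d))
target = np.linalg.inv(A.T @ A)

avg = np.zeros((d, d))
for _ in range(reps):
    S = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian sketching matrix
    At = S @ A
    avg += np.linalg.inv(At.T @ At) / reps

# inverse-Wishart mean: E[(A~' A~)^{-1}] = m/(m-d-1) * (A'A)^{-1}
print(np.trace(avg) / np.trace(target))   # ~ 1.18
print(m / (m - d - 1))                    # = 40/34 ~ 1.18
```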
1 code implementation • 11 Oct 2020 • Licong Lin, Edgar Dobriban
This leads to discovering that the variance is unimodal as a function of the level of parametrization, and to decomposing the variance into components arising from label noise, initialization, and randomness in the training data, clarifying the sources of the error.
1 code implementation • ICML 2020 • Yinjun Wu, Edgar Dobriban, Susan B. Davidson
Machine learning models are not static and may need to be retrained on slightly changed datasets, for instance, with the addition or deletion of a set of data points.
no code implementations • 9 Jun 2020 • Edgar Dobriban, Hamed Hassani, David Hong, Alexander Robey
It is well known that machine learning methods can be vulnerable to adversarially-chosen perturbations of their inputs.
no code implementations • ICML 2020 • Alnur Ali, Edgar Dobriban, Ryan J. Tibshirani
We study the implicit regularization of mini-batch stochastic gradient descent, when applied to the fundamental problem of least squares regression.
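One way to see such implicit regularization empirically: run mini-batch SGD on a least squares problem and measure how close each iterate is to the ridge path. A numerical illustration, not the paper's theory:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 20
X = rng.standard_normal((n, d))
beta = rng.standard_normal(d)
y = X @ beta + rng.standard_normal(n)

def ridge(lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w, lr, batch = np.zeros(d), 0.05, 25
for t in range(1, 201):
    idx = rng.choice(n, batch, replace=False)
    grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch   # mini-batch LS gradient
    w -= lr * grad
    if t % 50 == 0:
        # distance from the SGD iterate to the closest point on the ridge path
        lams = np.logspace(-3, 3, 200)
        dist = min(np.linalg.norm(w - ridge(l)) for l in lams)
        print(t, round(dist, 3), round(np.linalg.norm(w), 3))
```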
no code implementations • NeurIPS 2020 • Xiaoxia Wu, Edgar Dobriban, Tongzheng Ren, Shanshan Wu, Zhiyuan Li, Suriya Gunasekar, Rachel Ward, Qiang Liu
For certain stepsizes of $g$ and $w$, we show that the iterates can converge close to the minimum norm solution.
2 code implementations • ICLR 2020 • Sifan Liu, Edgar Dobriban
(2) how to correctly use cross-validation to choose the regularization parameter?
1 code implementation • NeurIPS 2020 • Shuxiao Chen, Edgar Dobriban, Jane H Lee
Data augmentation is a widely used trick when training deep neural networks: in addition to the original data, properly transformed data are also added to the training set.
1 code implementation • 22 Mar 2019 • Edgar Dobriban, Yue Sheng
Here we study a fundamental and highly important problem in this area: How to do ridge regression in a distributed computing environment?
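The simplest distributed scheme the question suggests is one-shot averaging: each machine solves a local ridge problem and the center averages the estimates. Plain averaging below is a baseline sketch; optimally weighted versions are part of what the paper studies.

```python
import numpy as np

def distributed_ridge(X, y, n_machines, lam):
    """One-shot averaged ridge: split rows across machines, solve ridge
    locally, average the local estimates at the center."""
    d = X.shape[1]
    parts = np.array_split(np.arange(len(y)), n_machines)
    local = [np.linalg.solve(X[i].T @ X[i] + lam * np.eye(d), X[i].T @ y[i])
             for i in parts]
    return np.mean(local, axis=0)
```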
1 code implementation • NeurIPS 2019 • Edgar Dobriban, Sifan Liu
We consider a least squares regression problem where the data has been generated from a linear model, and we are interested in learning the unknown regression parameters.
1 code implementation • 30 Sep 2018 • Edgar Dobriban, Yue Sheng
Here we study the performance loss in estimation, test error, and confidence interval length in high dimensions, where the number of parameters is comparable to the training data size.
1 code implementation • 1 Jul 2018 • Edgar Dobriban, Weijie J. Su
In this paper, we propose methods that are robust to large and unequal noise in different observational units (i.e., heteroskedasticity) for statistical inference in linear regression.
Statistics Theory • Methodology
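For context, the classical heteroskedasticity-robust (sandwich/White) standard errors that such inference builds on; a baseline sketch, not the paper's proposal:

```python
import numpy as np

def ols_hc0_se(X, y):
    """OLS with HC0 (White) heteroskedasticity-robust standard errors,
    the classical baseline for inference under unequal noise levels."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    meat = X.T @ (X * resid[:, None] ** 2)   # sum_i e_i^2 x_i x_i'
    cov = XtX_inv @ meat @ XtX_inv           # sandwich covariance
    return beta, np.sqrt(np.diag(cov))
```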
1 code implementation • 26 Jun 2018 • Edgar Dobriban
Modern high-throughput science often leads to multiple testing problems: researchers test many hypotheses, wishing to find the significant discoveries.
Methodology
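The workhorse for such problems is the Benjamini-Hochberg step-up procedure; a minimal implementation for context (a classical baseline, not necessarily the paper's method):

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.1):
    """Benjamini-Hochberg procedure at FDR level q.
    Returns the indices of the rejected hypotheses."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    if not below.any():
        return np.array([], dtype=int)
    k = np.max(np.nonzero(below)[0])   # largest i with p_(i) <= q*i/m
    return order[:k + 1]               # reject the k+1 smallest p-values
```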
1 code implementation • 11 Nov 2017 • Edgar Dobriban, Art B. Owen
This paper presents a deterministic version of PA (DPA), which is faster and more reproducible than PA. We show that DPA selects large factors and does not select small factors, just as [Dobriban, 2017] shows for PA.
Methodology
1 code implementation • 2 Oct 2017 • Edgar Dobriban
In this paper, we show that the parallel analysis permutation method consistently selects the large components in certain high-dimensional factor models.
Statistics Theory • Methodology
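A minimal sketch of permutation parallel analysis, the method analyzed here: compare the sample eigenvalues to those of data with each column independently permuted, and select factors sequentially.

```python
import numpy as np

def parallel_analysis(X, n_perms=20, quantile=0.95, seed=0):
    """Permutation PA: keep factor k if the k-th sample eigenvalue exceeds
    the chosen quantile of the k-th eigenvalues of column-permuted data."""
    rng = np.random.default_rng(seed)
    eigs = np.linalg.svd(X, compute_uv=False) ** 2
    null = np.empty((n_perms, len(eigs)))
    for b in range(n_perms):
        Xp = np.column_stack([rng.permutation(col) for col in X.T])
        null[b] = np.linalg.svd(Xp, compute_uv=False) ** 2
    thresh = np.quantile(null, quantile, axis=0)
    k = 0
    while k < len(eigs) and eigs[k] > thresh[k]:
        k += 1   # select factors sequentially, as PA does
    return k
```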
1 code implementation • 17 Nov 2016 • Lydia T. Liu, Edgar Dobriban, Amit Singer
We develop $e$PCA (exponential family PCA), a new methodology for PCA on exponential family distributions.
Methodology
1 code implementation • 10 Jul 2015 • Edgar Dobriban, Stefan Wager
We provide a unified analysis of the predictive risk of ridge regression and regularized discriminant analysis in a dense random effects model.
1 code implementation • 7 Jul 2015 • Edgar Dobriban
Asymptotically, as $n, p \to \infty$ with $p/n \to \gamma$, there is a deterministic mapping from the population spectral distribution (PSD) to the empirical spectral distribution (ESD) of the eigenvalues.
Numerical Analysis • Probability
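In the white case (PSD a point mass at 1), this mapping is the Marchenko-Pastur law, whose support is $[(1-\sqrt\gamma)^2, (1+\sqrt\gamma)^2]$; a quick Monte Carlo illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 400               # gamma = p/n = 0.2
gamma = p / n
X = rng.standard_normal((n, p))
evals = np.linalg.eigvalsh(X.T @ X / n)   # sample covariance spectrum

# edges of the Marchenko-Pastur support for identity population covariance
print(evals.min(), (1 - np.sqrt(gamma)) ** 2)
print(evals.max(), (1 + np.sqrt(gamma)) ** 2)
```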
4 code implementations • 12 Apr 2015 • Edgar Dobriban, Kristen Fortney, Stuart K. Kim, Art B. Owen
For a Gaussian prior on effect sizes, we show that finding the optimal weights is a non-convex problem.
Methodology