1 code implementation • 20 May 2025 • Lucas Rosenblatt, Bin Han, Robert Wolfe, Bill Howe
Large language models (LLMs) can leak sensitive training data through memorization and membership inference attacks.
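A membership inference attack can be illustrated with a minimal loss-threshold sketch (not the paper's method — just the standard baseline attack, on synthetic loss values): examples the model has memorized tend to have lower loss, so an attacker predicts "training member" whenever the loss falls below a threshold.

```python
import numpy as np

def loss_threshold_attack(losses: np.ndarray, threshold: float) -> np.ndarray:
    """Predict membership (1 = training member) when loss falls below threshold."""
    return (losses < threshold).astype(int)

# Illustrative synthetic data: members tend to have lower loss than non-members.
rng = np.random.default_rng(0)
member_losses = rng.normal(loc=0.5, scale=0.2, size=500)
nonmember_losses = rng.normal(loc=1.5, scale=0.4, size=500)

losses = np.concatenate([member_losses, nonmember_losses])
labels = np.concatenate([np.ones(500), np.zeros(500)])

preds = loss_threshold_attack(losses, threshold=1.0)
accuracy = (preds == labels).mean()
```

An attack accuracy well above 50% on such held-out-vs-training losses is the usual signal that the model leaks membership information.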
1 code implementation • 19 Apr 2025 • Shlomi Hod, Lucas Rosenblatt, Julia Stoyanovich
Differentially private (DP) machine learning often relies on the availability of public data for tasks like privacy-utility trade-off estimation, hyperparameter tuning, and pretraining.
no code implementations • 8 Nov 2024 • Lucas Rosenblatt, Yuliia Lut, Eitan Turok, Marco Avella-Medina, Rachel Cummings
We consider DP variants of pre-processing methods that privately augment the original dataset to reduce the class imbalance; these include oversampling, SMOTE, and private synthetic data generation.
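The non-private SMOTE baseline that these DP variants build on can be sketched in a few lines (a generic textbook version, not the paper's private variant): each synthetic minority sample is a random interpolation between a minority point and one of its k nearest minority-class neighbors.

```python
import numpy as np

def smote_oversample(X_min: np.ndarray, n_new: int, k: int = 5,
                     seed: int = 0) -> np.ndarray:
    """Generate n_new synthetic minority samples by interpolating each chosen
    sample toward one of its k nearest minority-class neighbors (basic SMOTE)."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    # Pairwise distances within the minority class; exclude self-matches.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbors = np.argsort(d, axis=1)[:, :k]  # k nearest neighbors per sample

    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        j = rng.integers(n)                              # pick a minority sample
        nb = neighbors[j, rng.integers(min(k, n - 1))]   # pick one of its neighbors
        gap = rng.random()                               # interpolation factor in [0, 1)
        synthetic[i] = X_min[j] + gap * (X_min[nb] - X_min[j])
    return synthetic

X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_new = smote_oversample(X_min, n_new=10, k=2)
```

A DP variant must additionally account for each training point's influence on the interpolations, which is where the privacy budget enters.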
no code implementations • 2 Oct 2024 • Lucas Rosenblatt, R. Teal Witter
Our benchmark posits that fair predictive uncertainty estimates should be consistent across learning pipelines and calibrated to observed randomness.
no code implementations • 22 Aug 2024 • Cameron Musco, Christopher Musco, Lucas Rosenblatt, Apoorv Vikram Singh
We illustrate a second application of our new moment-based recovery bound in numerical linear algebra: by improving an approach of Braverman, Krishnan, and Musco [STOC 2022], our result yields a faster algorithm for estimating the spectral density of a symmetric matrix up to small error in the Wasserstein distance.
no code implementations • 27 May 2024 • Robert Wolfe, Isaac Slaughter, Bin Han, Bingbing Wen, Yiwei Yang, Lucas Rosenblatt, Bernease Herman, Eva Brown, Zening Qu, Nic Weber, Bill Howe
The rapid proliferation of generative AI has raised questions about the competitiveness of lower-parameter, locally tunable, open-weight models relative to high-parameter, API-guarded, closed-weight models in terms of performance, domain adaptation, cost, and generalization.
no code implementations • 18 Dec 2023 • Lucas Rosenblatt, Julia Stoyanovich, Christopher Musco
Our theoretical results center on the private mean estimation problem, while our empirical results demonstrate, through extensive experiments on private data synthesis, the effectiveness of stratification across a variety of private mechanisms.
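The private mean estimation primitive underlying this line of work can be sketched with the standard Laplace mechanism (a generic baseline, not the paper's stratified estimator): clip each value to a known range, then add noise calibrated to the sensitivity of the mean.

```python
import numpy as np

def dp_mean(x: np.ndarray, epsilon: float, lo: float = 0.0, hi: float = 1.0,
            seed: int = 0) -> float:
    """Epsilon-DP mean via the Laplace mechanism: clip each value to [lo, hi],
    then add noise scaled to the sensitivity of the mean, (hi - lo) / n."""
    rng = np.random.default_rng(seed)
    x = np.clip(x, lo, hi)
    sensitivity = (hi - lo) / len(x)
    return float(x.mean() + rng.laplace(scale=sensitivity / epsilon))

rng = np.random.default_rng(42)
data = rng.uniform(0, 1, size=2000)
estimate = dp_mean(data, epsilon=1.0)
```

Stratification refines this by splitting the data into subgroups and estimating each stratum's mean separately, which can reduce error when strata are internally homogeneous.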
1 code implementation • 12 Dec 2023 • R. Teal Witter, Lucas Rosenblatt
To simulate the impact of opening streets, we first compare models for predicting vehicle collisions given network and temporal data.
1 code implementation • 1 Oct 2023 • Lucas Rosenblatt, Bin Han, Erin Posthumus, Theresa Crimmins, Bill Howe
An invasive species of grass known as "buffelgrass" contributes to severe wildfires and biodiversity loss in the Southwest United States.
no code implementations • 13 Feb 2023 • Andrew Bell, Lucius Bynum, Nazarii Drushchak, Tetiana Herasymova, Lucas Rosenblatt, Julia Stoyanovich
The "impossibility theorem", considered foundational in the algorithmic fairness literature, asserts that there must be trade-offs between common notions of fairness and performance when fitting statistical models, except in two special cases: when the prevalence of the outcome being predicted is equal across groups, or when a perfectly accurate predictor is used.
1 code implementation • 5 Jan 2023 • Lorena Piedras, Lucas Rosenblatt, Julia Wilkins
Detecting "toxic" language in internet content is a pressing social and technical challenge.
no code implementations • 7 Aug 2022 • Lucas Rosenblatt, R. Teal Witter
Making fair decisions is crucial to ethically implementing machine learning algorithms in social settings.
no code implementations • 27 Apr 2022 • Lucas Rosenblatt, Joshua Allen, Julia Stoyanovich
Our methods are based on the insights that feature importance can inform how privacy budget is allocated, and, further, that per-group feature importance and fairness-related performance objectives can be incorporated in the allocation.
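The first insight — allocating privacy budget in proportion to feature importance — can be sketched as a simple proportional split (an illustrative sketch under the usual sequential-composition assumption; the feature names and scores here are hypothetical, and the paper's actual allocation additionally incorporates per-group and fairness objectives):

```python
def allocate_budget(total_epsilon: float, importances: dict) -> dict:
    """Split a total privacy budget across features in proportion to their
    (non-negative) importance scores; per-feature epsilons sum to the total,
    consistent with sequential composition."""
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must contain a positive score")
    return {f: total_epsilon * w / total for f, w in importances.items()}

# Hypothetical feature-importance scores for illustration only.
budgets = allocate_budget(1.0, {"age": 3.0, "income": 1.0, "zip": 1.0})
```

More important features then receive larger epsilons, and hence less noise, when the mechanism measures them.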
1 code implementation • 11 Nov 2020 • Lucas Rosenblatt, Xiaoyan Liu, Samira Pouyanfar, Eduardo de Leon, Anuj Desai, Joshua Allen
Differentially private data synthesis protects personal details from exposure, and allows for the training of differentially private machine learning models on privately generated datasets.