no code implementations • 24 Sep 2023 • Michael Gastpar, Ido Nachum, Jonathan Shafer, Thomas Weinberger
We study the notion of a generalization bound being uniformly tight, meaning that the difference between the bound and the population loss is small for all learning algorithms and all population distributions.
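A rough formalization of the definition above (notation ours, not taken verbatim from the paper): a bound $B$ is $\epsilon$-uniformly tight if the gap between the bound and the population loss is at most $\epsilon$ simultaneously over all algorithms and distributions.

```latex
% Schematic definition (our notation, not necessarily the paper's):
% a generalization bound B is \epsilon-uniformly tight if, for every
% learning algorithm A and every population distribution D,
\[
\Pr_{S \sim \mathcal{D}^n}
\Bigl[\, \bigl|\, B(A, S) - L_{\mathcal{D}}\bigl(A(S)\bigr) \,\bigr|
\le \epsilon \,\Bigr] \;\ge\; 1 - \delta ,
\]
% where S is an i.i.d. training sample of size n, A(S) is the learned
% hypothesis, and L_D denotes the population loss.
```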
no code implementations • 27 Jun 2022 • Aditya Pradeep, Ido Nachum, Michael Gastpar
We prove that every online learnable class of functions of Littlestone dimension $d$ admits a learning algorithm with finite information complexity.
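For context, "information complexity" in this line of work is commonly measured as the mutual information between the input sample and the learner's output; a schematic statement in that notation (ours, the paper's exact setup may differ):

```latex
% Information complexity as sample/output mutual information
% (a standard choice in this literature, stated here as background):
\[
\mathrm{IC}(A) \;=\; \sup_{\mathcal{D}} \; I\bigl(S ; A(S)\bigr),
\qquad S \sim \mathcal{D}^n .
\]
% The claim: if a class has Littlestone dimension d < \infty,
% then some learning algorithm A for it satisfies IC(A) < \infty.
```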
no code implementations • ICLR 2022 • Ido Nachum, Jan Hązła, Michael Gastpar, Anatoly Khina
The celebrated Johnson–Lindenstrauss lemma answers the question of how the geometry of a dataset changes under a random linear map: for linear fully-connected neural networks (FNNs), the geometry is essentially preserved.
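The lemma is easy to observe numerically. A minimal sketch (ours, illustrating the classical random-projection construction rather than the paper's FNN analysis): project a point set through a scaled Gaussian matrix and compare pairwise distances before and after.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 1000, 300          # points, ambient dim, projected dim
X = rng.standard_normal((n, d))  # arbitrary point set

# Random linear map scaled so squared norms are preserved in
# expectation, as in the classical Johnson--Lindenstrauss construction.
G = rng.standard_normal((k, d)) / np.sqrt(k)
Y = X @ G.T

def pairwise_dists(Z):
    diff = Z[:, None, :] - Z[None, :, :]
    return np.linalg.norm(diff, axis=-1)

iu = np.triu_indices(n, 1)
ratio = pairwise_dists(Y)[iu] / pairwise_dists(X)[iu]
print(f"distance ratios in [{ratio.min():.3f}, {ratio.max():.3f}]")
# Ratios concentrate near 1: the geometry is essentially preserved.
```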
no code implementations • 3 Nov 2021 • Elisabetta Cornacchia, Jan Hązła, Ido Nachum, Amir Yehudayoff
We study the implicit bias of ReLU neural networks trained by a variant of SGD where at each step, the label is changed with probability $p$ to a random label (label smoothing being a close variant of this procedure).
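A minimal sketch of the label-randomization step (ours; the model below is a stand-in logistic regression on synthetic data, not the paper's ReLU network): at each SGD step, the drawn label is replaced by a uniformly random one with probability $p$.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.2                                   # probability of randomizing a label
n, d = 200, 5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = np.sign(X @ w_true)                   # clean binary labels in {-1, +1}

# SGD on the logistic loss with the label-noise procedure described above.
w = np.zeros(d)
lr = 0.1
for t in range(2000):
    i = rng.integers(n)
    label = rng.choice([-1, 1]) if rng.random() < p else y[i]
    margin = label * (X[i] @ w)
    grad = -label * X[i] / (1.0 + np.exp(margin))  # grad of log(1+e^{-margin})
    w -= lr * grad

print("train accuracy:", np.mean(np.sign(X @ w) == y))
```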
no code implementations • 1 Jul 2019 • Ido Nachum, Amir Yehudayoff
This work provides an additional step in the theoretical understanding of neural networks.
no code implementations • 25 Nov 2018 • Ido Nachum, Amir Yehudayoff
Can it be that all concepts in the class require leaking a large amount of information?
no code implementations • 14 Jun 2018 • Shay Moran, Ido Nachum, Itai Panasoff, Amir Yehudayoff
We study and provide an exposition of several phenomena related to the perceptron's compression.
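The compression in question is the classical observation that the perceptron's final hypothesis is a sum of the examples on which it erred, so the mistake set alone reconstructs it. A minimal sketch (ours, standard perceptron, not code from the paper):

```python
import numpy as np

def perceptron(X, y):
    """Classical perceptron; returns the weight vector and the indices
    of the examples on which it made a mistake (with multiplicity)."""
    w = np.zeros(X.shape[1])
    mistakes = []
    changed = True
    while changed:
        changed = False
        for i in range(len(y)):
            if y[i] * (X[i] @ w) <= 0:      # mistake: update
                w += y[i] * X[i]
                mistakes.append(i)
                changed = True
    return w, mistakes

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
w_star = np.array([1.0, -2.0, 0.5])
y = np.sign(X @ w_star)                     # linearly separable labels

w, mistakes = perceptron(X, y)
# Compression: w is a sum of mistake examples only, so the mistake
# sequence alone reconstructs the hypothesis.
w_rebuilt = sum(y[i] * X[i] for i in mistakes)
print(np.allclose(w, w_rebuilt),
      f"{len(set(mistakes))} of {len(y)} examples used")
```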
no code implementations • 16 Apr 2018 • Ido Nachum, Jonathan Shafer, Amir Yehudayoff
We introduce a class of functions of VC dimension $d$ over the domain $\mathcal{X}$ with information complexity at least $\Omega\left(d\log \log \frac{|\mathcal{X}|}{d}\right)$ bits for any consistent and proper algorithm (deterministic or random).
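Stated schematically in the usual sample/output mutual-information notation (ours):

```latex
% For the constructed class of VC dimension d over domain X, every
% consistent and proper learner A (deterministic or randomized)
% must satisfy
\[
I\bigl(S ; A(S)\bigr)
\;\ge\; \Omega\!\left( d \,\log\log \frac{|\mathcal{X}|}{d} \right)
\quad \text{bits},
\]
% where S is the input sample and A(S) the output hypothesis.
```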
no code implementations • 14 Oct 2017 • Raef Bassily, Shay Moran, Ido Nachum, Jonathan Shafer, Amir Yehudayoff
We discuss an approach that allows us to prove upper bounds on the amount of information that algorithms reveal about their inputs. We also provide a lower bound by exhibiting a simple concept class for which every (possibly randomized) empirical risk minimizer must reveal a lot of information.