1 code implementation • NeurIPS 2023 • Ravid Shwartz-Ziv, Micah Goldblum, Yucen Lily Li, C. Bayan Bruss, Andrew Gordon Wilson
Real-world datasets are often highly class-imbalanced, which can adversely impact the performance of deep learning models.
no code implementations • 13 Sep 2023 • Angelica Chen, Ravid Shwartz-Ziv, Kyunghyun Cho, Matthew L. Leavitt, Naomi Saphra
Most interpretability research in NLP focuses on understanding the behavior and features of a fully trained model.
no code implementations • 23 Jun 2023 • Jiachen Zhu, Katrina Evtimova, Yubei Chen, Ravid Shwartz-Ziv, Yann LeCun
In summary, VCReg offers a universally applicable regularization framework that significantly advances transfer learning and highlights the connection between gradient starvation, neural collapse, and feature transferability.
1 code implementation • NeurIPS 2023 • Ido Ben-Shaul, Ravid Shwartz-Ziv, Tomer Galanti, Shai Dekel, Yann LeCun
Self-supervised learning (SSL) is a powerful tool in machine learning, but understanding the learned representations and their underlying mechanisms remains a challenge.
no code implementations • 19 Apr 2023 • Ravid Shwartz-Ziv, Yann LeCun
Information theory, and notably the information bottleneck principle, has been pivotal in shaping deep neural networks.
no code implementations • 1 Mar 2023 • Ravid Shwartz-Ziv, Randall Balestriero, Kenji Kawaguchi, Tim G. J. Rudner, Yann LeCun
In this paper, we provide an information-theoretic perspective on Variance-Invariance-Covariance Regularization (VICReg) for self-supervised learning.
1 code implementation • 12 Oct 2022 • Jonas Geiping, Micah Goldblum, Gowthami Somepalli, Ravid Shwartz-Ziv, Tom Goldstein, Andrew Gordon Wilson
Despite the clear performance benefits of data augmentations, little is known about why they are so effective.
no code implementations • 20 Jul 2022 • Ravid Shwartz-Ziv, Randall Balestriero, Yann LeCun
In this paper, we examine self-supervised learning methods, particularly VICReg, to provide an information-theoretical understanding of their construction.
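As a rough illustrative sketch (not the paper's implementation), the three VICReg terms — invariance between two views' embeddings, a variance hinge that keeps each embedding dimension's standard deviation above a threshold, and a covariance penalty on off-diagonal correlations — can be written in NumPy as follows; the term weights are assumptions chosen for illustration:

```python
import numpy as np

def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """Sketch of a VICReg-style objective for two embedding batches of shape (N, D)."""
    n, d = z_a.shape
    # Invariance: mean squared distance between the two views' embeddings.
    inv = np.mean((z_a - z_b) ** 2)
    # Variance: hinge pushing each dimension's std above 1 (prevents collapse).
    std_a = np.sqrt(z_a.var(axis=0) + eps)
    std_b = np.sqrt(z_b.var(axis=0) + eps)
    var = np.mean(np.maximum(0.0, 1.0 - std_a)) + np.mean(np.maximum(0.0, 1.0 - std_b))
    # Covariance: penalize off-diagonal entries of each view's covariance matrix,
    # decorrelating the embedding dimensions.
    def cov_term(z):
        zc = z - z.mean(axis=0)
        cov = (zc.T @ zc) / (n - 1)
        off = cov - np.diag(np.diag(cov))
        return (off ** 2).sum() / d
    cov = cov_term(z_a) + cov_term(z_b)
    return sim_w * inv + var_w * var + cov_w * cov
```

Collapse of the representation (all embeddings identical) would zero the invariance term but is penalized by the variance hinge, which is the core design idea behind the method.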
1 code implementation • 20 May 2022 • Ravid Shwartz-Ziv, Micah Goldblum, Hossein Souri, Sanyam Kapoor, Chen Zhu, Yann LeCun, Andrew Gordon Wilson
Deep learning is increasingly moving towards a transfer learning paradigm whereby large foundation models are fine-tuned on downstream tasks, starting from an initialization learned on the source task.
no code implementations • 10 Feb 2022 • Ravid Shwartz-Ziv
Then, we propose using the Information Bottleneck (IB) theory to explain deep learning systems.
1 code implementation • ICML Workshop AutoML 2021 • Ravid Shwartz-Ziv, Amitai Armon
A key element in solving real-life data science problems is selecting the types of models to use.
no code implementations • 27 Dec 2020 • Ravid Shwartz-Ziv, Itamar Ben Ari, Amitai Armon
In this work, we present a spatio-temporal convolutional neural network for predicting the future severity of COVID-19-related symptoms in a population, per region, given its past reported symptoms.
1 code implementation • 8 Jun 2020 • Zoe Piran, Ravid Shwartz-Ziv, Naftali Tishby
The Information Bottleneck (IB) framework is a general characterization of optimal representations obtained using a principled approach for balancing accuracy and complexity.
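The accuracy-complexity balance described here is usually stated as a Lagrangian over stochastic encodings of the input; a standard formulation (sketched here, with β the trade-off parameter) is:

```latex
% Information Bottleneck objective: find a representation T of input X
% that is maximally compressive (small I(X;T)) while staying
% informative about the target Y (large I(T;Y)).
\min_{p(t \mid x)} \; \mathcal{L}_{\mathrm{IB}} \;=\; I(X;T) \;-\; \beta\, I(T;Y)
```

Sweeping β traces out the optimal accuracy-complexity frontier that the IB framework characterizes.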
1 code implementation • Approximate Inference (AABI) Symposium 2019 • Ravid Shwartz-Ziv, Alexander A. Alemi
In this preliminary work, we study the generalization properties of infinite ensembles of infinitely wide neural networks.
no code implementations • ICLR 2019 • Ravid Shwartz-Ziv, Amichai Painsky, Naftali Tishby
Specifically, we show that the training of the network is characterized by a rapid increase in the mutual information (MI) between the layers and the target label, followed by a longer decrease in the MI between the layers and the input variable.
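Tracking these information-plane quantities requires estimating mutual information between continuous layer activations and the input or label. As a minimal sketch in the spirit of the discretization-based estimates used in such analyses (the bin count here is an illustrative assumption, not the paper's setting), a histogram plug-in estimator for 1-D variables might look like:

```python
import numpy as np

def mutual_information(x, y, bins=30):
    """Histogram-based plug-in estimate of I(X;Y) in bits for 1-D samples."""
    # Joint distribution from a 2-D histogram of the paired samples.
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    # Marginals by summing out each axis.
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    # MI = KL(p_xy || p_x p_y); restrict to nonzero joint cells.
    nz = p_xy > 0
    return float((p_xy[nz] * np.log2(p_xy[nz] / (p_x @ p_y)[nz])).sum())
```

Plug-in binning estimators are biased upward for small samples and sensitive to the bin count, which is one reason MI estimation choices matter in information-plane studies.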
no code implementations • 26 Nov 2018 • Itamar Ben-Ari, Ravid Shwartz-Ziv
Our model is shown to be effective in detecting anomalies in videos.
13 code implementations • 2 Mar 2017 • Ravid Shwartz-Ziv, Naftali Tishby
Previous work proposed to analyze DNNs in the Information Plane, i.e., the plane of the mutual information values that each layer preserves on the input and output variables.