1 code implementation • 6 Feb 2024 • Ashok Vardhan Makkuva, Marco Bondaschi, Adway Girish, Alliot Nagle, Martin Jaggi, Hyeji Kim, Michael Gastpar
Inspired by the Markovianity of natural languages, we model the data as a Markovian source and utilize this framework to systematically study the interplay between the data-distributional properties, the transformer architecture, the learnt distribution, and the final model performance.
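As a concrete illustration of the data model described in this abstract (a sketch, not code from the paper), the snippet below samples sequences from a first-order binary Markov source; the transition probabilities `p` and `q` and the batch shape are hypothetical values chosen for illustration.

```python
# Minimal sketch: sequences drawn from a first-order binary Markov source.
# p = P(next=1 | current=0), q = P(next=0 | current=1) -- illustrative values.
import numpy as np

def sample_markov_sequence(length, p=0.2, q=0.3, rng=None):
    """Sample a {0,1}-valued sequence from a two-state Markov chain."""
    rng = rng or np.random.default_rng()
    seq = np.empty(length, dtype=np.int64)
    seq[0] = rng.integers(0, 2)  # uniform initial state
    for t in range(1, length):
        flip_prob = p if seq[t - 1] == 0 else q
        seq[t] = seq[t - 1] ^ (rng.random() < flip_prob)  # flip state with prob. flip_prob
    return seq

batch = np.stack([sample_markov_sequence(128) for _ in range(32)])  # toy training batch
```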
1 code implementation • 24 Feb 2022 • Kartik Sreenivasan, Jy-yong Sohn, Liu Yang, Matthew Grinde, Alliot Nagle, Hongyi Wang, Eric Xing, Kangwook Lee, Dimitris Papailiopoulos
Frankle & Carbin conjecture that we can avoid this by training "lottery tickets", i. e., special sparse subnetworks found at initialization, that can be trained to high accuracy.
no code implementations • 5 Jun 2021 • Jay Whang, Alliot Nagle, Anish Acharya, Hyeji Kim, Alexandros G. Dimakis
Distributed source coding (DSC) is the task of encoding an input without access to correlated side information that is available only to the decoder.
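A minimal structural sketch of this asymmetry, assuming a simple learned encoder/decoder pair (the modules and layer sizes are illustrative, not the paper's architecture): the encoder sees only the input `x`, while the decoder combines the transmitted code with the side information `y`.

```python
# Illustrative DSC setup: side information y is visible to the decoder only.
import torch
import torch.nn as nn

class DSCEncoder(nn.Module):
    def __init__(self, dim=64, code_dim=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, code_dim))

    def forward(self, x):          # encoder has access to x only
        return self.net(x)

class DSCDecoder(nn.Module):
    def __init__(self, dim=64, code_dim=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(code_dim + dim, 32), nn.ReLU(), nn.Linear(32, dim))

    def forward(self, code, y):    # decoder combines the code with side info y
        return self.net(torch.cat([code, y], dim=-1))

x = torch.randn(16, 64)
y = x + 0.1 * torch.randn(16, 64)          # correlated side information
x_hat = DSCDecoder()(DSCEncoder()(x), y)   # train by minimizing, e.g., MSE(x, x_hat)
```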
1 code implementation • NeurIPS 2020 • Ankit Pensia, Shashank Rajput, Alliot Nagle, Harit Vishwakarma, Dimitris Papailiopoulos
We show that any target network of width $d$ and depth $l$ can be approximated by pruning a random network that is a factor $O(\log(dl))$ wider and twice as deep.
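This result rests on a subset-sum argument: each target weight is matched by the sum of a kept subset of i.i.d. random weights, with no retraining. The toy brute-force search below illustrates that idea on a single scalar weight; it is not the paper's algorithm, and the target value and sample count are arbitrary.

```python
# Toy subset-sum illustration: approximate one target weight by keeping a
# subset of random candidate weights (exhaustive search, for illustration only).
import itertools
import numpy as np

rng = np.random.default_rng(0)
target = 0.37
candidates = rng.uniform(-1, 1, size=12)   # random weights available for "pruning"

best_err, best_subset = abs(target), ()
for r in range(1, len(candidates) + 1):
    for subset in itertools.combinations(range(len(candidates)), r):
        err = abs(target - candidates[list(subset)].sum())
        if err < best_err:
            best_err, best_subset = err, subset

print(f"approximated {target} to within {best_err:.4f} using indices {best_subset}")
```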