1 code implementation • 6 Feb 2024 • Ashok Vardhan Makkuva, Marco Bondaschi, Adway Girish, Alliot Nagle, Martin Jaggi, Hyeji Kim, Michael Gastpar
Inspired by the Markovianity of natural languages, we model the data as a Markovian source and utilize this framework to systematically study the interplay between the data-distributional properties, the transformer architecture, the learnt distribution, and the final model performance.
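This setup is easy to picture concretely. Below is a minimal sketch of a first-order Markov source, assuming a binary alphabet and a hypothetical transition matrix; the paper's actual data distributions are not reproduced here.

```python
import numpy as np

def sample_markov_source(P, length, rng=None):
    """Sample a token sequence from a first-order Markov chain with
    row-stochastic transition matrix P over a k-symbol alphabet."""
    rng = np.random.default_rng() if rng is None else rng
    k = P.shape[0]
    seq = np.empty(length, dtype=np.int64)
    seq[0] = rng.integers(k)                  # uniform initial state
    for t in range(1, length):
        seq[t] = rng.choice(k, p=P[seq[t - 1]])
    return seq

# Hypothetical binary "switching" source (illustrative values only).
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
print(sample_markov_source(P, length=32))
```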
no code implementations • 19 Oct 2023 • Ashok Vardhan Makkuva, Marco Bondaschi, Thijs Vogels, Martin Jaggi, Hyeji Kim, Michael C. Gastpar
On the latter, we obtain a $50$-$64\%$ improvement in perplexity over our baselines for noisy channels.
no code implementations • 16 Jan 2023 • Mohammad Vahid Jamali, Xiyang Liu, Ashok Vardhan Makkuva, Hessam Mahdavifar, Sewoong Oh, Pramod Viswanath
Next, we derive the soft-decision based version of our algorithm, called soft-subRPA, that not only improves upon the performance of subRPA but also enables a differentiable decoding algorithm.
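The generic idea behind making a decoder differentiable can be sketched independently of the RPA specifics: replace the hard sign decision, whose gradient is zero almost everywhere, with a smooth surrogate. The illustration below is not the paper's soft-subRPA update; it only uses the standard fact that the posterior mean of a BPSK symbol given its log-likelihood ratio is tanh(LLR/2).

```python
import numpy as np

def hard_decision(llr):
    # Non-differentiable: gradient is zero almost everywhere.
    return np.sign(llr)

def soft_decision(llr):
    # Smooth surrogate: tanh(llr / 2) is the expected BPSK symbol
    # given the log-likelihood ratio, so gradients can flow through.
    return np.tanh(llr / 2.0)

llrs = np.array([-3.2, 0.4, 1.7, -0.1])
print(hard_decision(llrs))   # [-1.  1.  1. -1.]
print(soft_decision(llrs))   # smooth values in (-1, 1)
```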
1 code implementation • 1 Oct 2022 • S Ashwin Hebbar, Viraj Nadkarni, Ashok Vardhan Makkuva, Suma Bhat, Sewoong Oh, Pramod Viswanath
We design a principled curriculum, guided by information-theoretic insights, to train CRISP and show that it outperforms the successive-cancellation (SC) decoder and attains near-optimal reliability performance on the Polar(32, 16) and Polar(64, 22) codes.
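The details of CRISP's curriculum are not given in this excerpt. As a generic illustration of curriculum-style training, the toy sketch below trains a logistic detector for a single BPSK bit in stages of increasing noise; the stage schedule and hyperparameters are chosen arbitrarily and are not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_stage(w, noise_std, steps=2000, lr=0.1):
    """One curriculum stage: SGD on a logistic 'decoder' for a single
    BPSK bit observed at a fixed noise level."""
    for _ in range(steps):
        bit = rng.integers(0, 2)                      # label in {0, 1}
        x = (1 - 2 * bit) + noise_std * rng.normal()  # noisy symbol
        p = sigmoid(-w * x)                           # model P(bit=1 | x)
        w -= lr * (p - bit) * (-x)                    # logistic gradient step
    return w

w = 0.0
for noise_std in [0.3, 0.6, 1.0]:   # easy -> hard schedule
    w = train_stage(w, noise_std)
print(w)
```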
1 code implementation • 29 Aug 2021 • Ashok Vardhan Makkuva, Xiyang Liu, Mohammad Vahid Jamali, Hessam Mahdavifar, Sewoong Oh, Pramod Viswanath
In this paper, we construct KO codes, a computationally efficient family of deep-learning-driven (encoder, decoder) pairs that outperform the state of the art in reliability on the standardized AWGN channel.
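The evaluation setting is standard and easy to reproduce in simulation. Below is a minimal sketch of uncoded BPSK over an AWGN channel under the usual Eb/N0 convention, as a baseline only; the KO codes themselves are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def awgn_bpsk_ber(snr_db, n_bits=100_000):
    """Empirical bit error rate of uncoded BPSK over an AWGN channel."""
    bits = rng.integers(0, 2, n_bits)
    symbols = 1 - 2 * bits                # map 0 -> +1, 1 -> -1
    snr = 10 ** (snr_db / 10)             # Eb/N0 in linear scale
    noise_std = np.sqrt(1 / (2 * snr))    # unit-energy symbols
    received = symbols + noise_std * rng.normal(size=n_bits)
    decoded = (received < 0).astype(int)  # hard threshold at zero
    return np.mean(decoded != bits)

for snr_db in [0, 2, 4, 6]:
    print(snr_db, awgn_bpsk_ber(snr_db))
```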
no code implementations • 2 Feb 2021 • Mohammad Vahid Jamali, Xiyang Liu, Ashok Vardhan Makkuva, Hessam Mahdavifar, Sewoong Oh, Pramod Viswanath
To lower the complexity of our decoding algorithm, referred to as subRPA in this paper, we investigate different ways of pruning the projections.
Information Theory
no code implementations • 25 Sep 2019 • Anwesa Choudhuri, Ashok Vardhan Makkuva, Ranvir Rana, Sewoong Oh, Girish Chowdhary, Alexander Schwing
In fact, contrastive disentanglement and unsupervised recovery are often combined in that we seek additional variations that exhibit salient factors/properties.
2 code implementations • ICML 2020 • Ashok Vardhan Makkuva, Amirhossein Taghvaei, Sewoong Oh, Jason D. Lee
Building upon recent advances in the field of input convex neural networks, we propose a new framework where the gradient of one convex function represents the optimal transport mapping.
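A rough sketch of the input-convexity idea follows, assuming a one-hidden-layer architecture with non-negative hidden-to-output weights and a convex, non-decreasing activation; this is an illustration, not the paper's exact parameterization. The gradient of the resulting scalar function is the candidate transport map, approximated below by finite differences.

```python
import numpy as np

def softplus(z):
    return np.logaddexp(0.0, z)  # convex, non-decreasing activation

class ICNN:
    """Scalar f(x), convex in x: a non-negative combination of convex
    functions of x plus an affine term stays convex."""
    def __init__(self, dim, hidden, rng):
        self.A0 = rng.normal(size=(hidden, dim)); self.b0 = np.zeros(hidden)
        self.W1 = np.abs(rng.normal(size=(1, hidden)))  # kept non-negative
        self.A1 = rng.normal(size=(1, dim)); self.b1 = np.zeros(1)

    def __call__(self, x):
        z1 = softplus(self.A0 @ x + self.b0)            # convex in x
        return (self.W1 @ z1 + self.A1 @ x + self.b1).item()

rng = np.random.default_rng(0)
f = ICNN(dim=2, hidden=8, rng=rng)

def grad_f(x, eps=1e-5):
    """Finite-difference gradient of f: the candidate transport map."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

x = np.array([0.3, -1.2])
print(f(x), grad_f(x))
```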
no code implementations • 6 Jun 2019 • Ashok Vardhan Makkuva, Sewoong Oh, Sreeram Kannan, Pramod Viswanath
Gating is a key feature in modern neural networks including LSTMs, GRUs and sparsely-gated deep neural networks.
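Concretely, a gate is an input-dependent multiplicative mask with entries in (0, 1). Below is a minimal GRU-style update sketch with random weights; the reset gate is omitted for brevity, so this is a simplification rather than the full GRU cell.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_update(h_prev, x, Wz, Uz, Wh, Uh):
    """GRU-style update: the gate z decides, per coordinate, how much
    of the candidate state replaces the previous state."""
    z = sigmoid(Wz @ x + Uz @ h_prev)        # gate values in (0, 1)
    h_cand = np.tanh(Wh @ x + Uh @ h_prev)   # candidate state
    return (1 - z) * h_prev + z * h_cand

rng = np.random.default_rng(0)
d, k = 4, 3
h = np.zeros(d)
x = rng.normal(size=k)
Wz, Wh = rng.normal(size=(d, k)), rng.normal(size=(d, k))
Uz, Uh = rng.normal(size=(d, d)), rng.normal(size=(d, d))
print(gated_update(h, x, Wz, Uz, Wh, Uh))
```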
no code implementations • 9 Oct 2018 • Weihao Gao, Ashok Vardhan Makkuva, Sewoong Oh, Pramod Viswanath
Significant advances have been made recently in training neural networks, where the main challenge lies in solving an optimization problem with abundant critical points.
no code implementations • 21 Feb 2018 • Ashok Vardhan Makkuva, Sewoong Oh, Sreeram Kannan, Pramod Viswanath
Once the experts are known, the recovery of gating parameters still requires an EM algorithm; however, we show that the EM algorithm for this simplified problem, unlike the joint EM algorithm, converges to the true parameters.
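A toy, finite-sample version of this simplified problem can be sketched directly: with two known linear experts and a known noise level, EM alternates posterior responsibilities (E-step) with a gradient step on the weighted logistic log-likelihood for the gating vector (M-step). All dimensions and constants below are arbitrary, and the paper's guarantees concern the population updates rather than this toy.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, sigma = 2000, 3, 0.3

A = rng.normal(size=(2, d))             # expert weights, assumed known
w_true = rng.normal(size=d)             # gating parameter to recover

X = rng.normal(size=(n, d))
p1 = 1 / (1 + np.exp(-X @ w_true))      # P(expert 1 | x)
z = rng.random(n) < p1                  # latent expert assignments
y = np.where(z, X @ A[1], X @ A[0]) + sigma * rng.normal(size=n)

w = np.zeros(d)                         # EM over the gating vector only
for _ in range(200):
    # E-step: responsibilities for expert 1 under the current gate.
    g = 1 / (1 + np.exp(-X @ w))
    lik0 = np.exp(-(y - X @ A[0]) ** 2 / (2 * sigma ** 2))
    lik1 = np.exp(-(y - X @ A[1]) ** 2 / (2 * sigma ** 2))
    r = g * lik1 / (g * lik1 + (1 - g) * lik0 + 1e-12)
    # M-step: gradient step on the weighted logistic log-likelihood,
    # whose gradient is sum_i (r_i - g_i) x_i.
    w += 0.5 / n * X.T @ (r - g)

print(np.round(w, 2))        # approximately recovers w_true
print(np.round(w_true, 2))
```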