1 code implementation • 3 Oct 2024 • Richard Zhuang, Tianhao Wu, Zhaojin Wen, Andrew Li, Jiantao Jiao, Kannan Ramchandran
Many existing methods repeatedly learn task-specific representations of Large Language Models (LLMs), which leads to inefficiencies in both time and computational resources.
no code implementations • 24 Sep 2024 • Ying Fan, Yilun Du, Kannan Ramchandran, Kangwook Lee
Recent work has shown that Transformers trained from scratch can successfully solve various arithmetic and algorithmic tasks, such as adding numbers and computing parity.
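To make the task concrete, here is a toy generator for the parity task (an illustrative sketch only; the input length and encoding are arbitrary choices, not the paper's setup):

```python
import random

def parity_example(n_bits=8):
    """Generate one (bit-string, parity) training pair for the parity task."""
    bits = [random.randint(0, 1) for _ in range(n_bits)]
    label = sum(bits) % 2  # parity = XOR of all input bits
    return bits, label

# A small synthetic dataset of the kind a Transformer could be trained on.
for x, y in [parity_example() for _ in range(4)]:
    print(x, "->", y)
```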
no code implementations • 25 Jul 2024 • Nived Rajaraman, Marco Bondaschi, Kannan Ramchandran, Michael Gastpar, Ashok Vardhan Makkuva
On the theoretical side, our main result is that a transformer with a single head and three layers can represent the in-context conditional empirical distribution for $k$-th order Markov sources, concurring with our empirical observations.
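For concreteness, the in-context conditional empirical distribution can be computed directly from the context. The brute-force sketch below is illustrative only; the order $k$ and the alphabet are arbitrary choices:

```python
from collections import Counter, defaultdict

def conditional_empirical(context, k):
    """Empirical distribution of the next symbol given the last k symbols,
    estimated from all length-(k+1) windows in the context."""
    counts = defaultdict(Counter)
    for i in range(len(context) - k):
        prefix = tuple(context[i:i + k])
        counts[prefix][context[i + k]] += 1
    last = tuple(context[-k:])
    total = sum(counts[last].values()) or 1
    return {s: c / total for s, c in counts[last].items()}

# Example: order-2 context over a binary alphabet (values are illustrative).
print(conditional_empirical([0, 1, 1, 0, 1, 1, 1, 0, 1, 1], k=2))
```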
no code implementations • 12 Apr 2024 • Nived Rajaraman, Jiantao Jiao, Kannan Ramchandran
In this paper, we investigate tokenization from a theoretical point of view by studying the behavior of transformers on simple data generating processes.
no code implementations • 4 Feb 2024 • Justin S. Kang, Yigit E. Erginbas, Landon Butler, Ramtin Pedarsani, Kannan Ramchandran
This marks the first noise-tolerant algorithm for the Möbius transform with query complexity sub-linear in $n$.
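For reference, the object being computed is the classical Möbius transform of a set function, $a(S) = \sum_{T \subseteq S} (-1)^{|S \setminus T|} f(T)$. The brute-force sketch below enumerates all subsets and is exponential in $n$; it illustrates the transform itself, not the paper's sub-linear, noise-tolerant algorithm:

```python
from itertools import combinations

def mobius_transform(f, n):
    """Naive Mobius transform of a set function f (subsets given as frozensets).
    Returns a(S) = sum over T subseteq S of (-1)^{|S| - |T|} * f(T)."""
    subsets = [frozenset(c) for r in range(n + 1)
               for c in combinations(range(n), r)]
    return {S: sum((-1) ** (len(S) - len(T)) * f(T) for T in subsets if T <= S)
            for S in subsets}

# Example: f counts the elements of a subset; its Mobius coefficients are 1 on singletons.
coeffs = mobius_transform(lambda T: len(T), n=3)
print({tuple(sorted(S)): v for S, v in coeffs.items() if v != 0})
```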
no code implementations • 13 Dec 2023 • Baihe Huang, Hanlin Zhu, Banghua Zhu, Kannan Ramchandran, Michael I. Jordan, Jason D. Lee, Jiantao Jiao
Key to our formulation is a coupling of the output tokens and the rejection region, realized by pseudo-random generators in practice, that allows non-trivial trade-offs between the Type I error and Type II error.
no code implementations • 30 Sep 2023 • Tianhao Wu, Banghua Zhu, Ruoyu Zhang, Zhaojin Wen, Kannan Ramchandran, Jiantao Jiao
In summary, this work introduces a simpler yet effective approach for aligning LLMs to human preferences through relative feedback.
no code implementations • 26 Mar 2023 • Brett Levac, Ajil Jalal, Kannan Ramchandran, Jonathan I. Tamir
This leads to an improvement in image reconstruction fidelity over generative models that rely only on a marginal prior over the image contrast of interest.
no code implementations • 12 Feb 2023 • Nived Rajaraman, Yanjun Han, Jiantao Jiao, Kannan Ramchandran
We consider the sequential decision-making problem where the mean outcome is a non-linear function of the chosen action.
no code implementations • 30 Jan 2023 • Justin Kang, Ramtin Pedarsani, Kannan Ramchandran
We also formulate a heterogeneous federated learning problem for the platform with privacy level options for users.
1 code implementation • 15 Jan 2023 • Yigit Efe Erginbas, Justin Singh Kang, Amirali Aghazadeh, Kannan Ramchandran
Fourier transformations of pseudo-Boolean functions are popular tools for analyzing functions of binary sequences.
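As a reminder of what these coefficients are: for $f$ on $\{-1,+1\}^n$, $\hat{f}(S) = \mathbb{E}_x[f(x)\prod_{i \in S} x_i]$. The brute-force sketch below (exponential in $n$, purely illustrative) computes them by full enumeration, whereas the paper recovers sparse coefficients from far fewer samples:

```python
from itertools import product, combinations
from math import prod

def fourier_coefficients(f, n):
    """Brute-force Fourier (Walsh-Hadamard) coefficients of a function f on
    {-1,+1}^n: fhat(S) = average of f(x) * prod_{i in S} x[i]."""
    points = list(product([-1, 1], repeat=n))
    return {S: sum(f(x) * prod(x[i] for i in S) for x in points) / len(points)
            for r in range(n + 1) for S in combinations(range(n), r)}

# Example: f(x) = x0 * x1 has a single nonzero coefficient on S = (0, 1).
coeffs = fourier_coefficients(lambda x: x[0] * x[1], 3)
print({S: v for S, v in coeffs.items() if abs(v) > 1e-9})
```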
no code implementations • 13 Dec 2022 • Yigit Efe Erginbas, Soham Phade, Kannan Ramchandran
Large-scale online recommendation systems must facilitate the allocation of a limited number of items among competing users while learning their preferences from user feedback.
no code implementations • 5 Oct 2022 • Amirali Aghazadeh, Nived Rajaraman, Tony Tu, Kannan Ramchandran
Data-driven machine learning models are being increasingly employed in several important inference problems in biology, chemistry, and physics which require learning over combinatorial spaces.
no code implementations • 8 Jul 2022 • Yigit Efe Erginbas, Soham Phade, Kannan Ramchandran
Recommendation systems, when employed in markets, play a dual role: they assist users in selecting their most desired items from a large pool, and they help allocate a limited number of items to the users who desire them most.
2 code implementations • 12 Jun 2022 • Zhengming Zhang, Ashwinee Panda, Linyue Song, Yaoqing Yang, Michael W. Mahoney, Joseph E. Gonzalez, Kannan Ramchandran, Prateek Mittal
In this type of attack, the goal of the attacker is to use poisoned updates to implant so-called backdoors into the learned model such that, at test time, the model's outputs can be fixed to a given target for certain inputs.
no code implementations • 31 May 2022 • Avishek Ghosh, Abishek Sankararaman, Kannan Ramchandran, Tara Javidi, Arya Mazumdar
We propose and analyze a decentralized and asynchronous learning algorithm, namely Decentralized Non-stationary Competing Bandits (\texttt{DNCB}), where the agents play (restrictive) successive-elimination-type learning algorithms to learn their preference over the arms.
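For orientation, a minimal single-agent successive-elimination routine is sketched below (Bernoulli arms, Hoeffding-style confidence radii; the decentralized, non-stationary, competing-agent aspects of \texttt{DNCB} are not modeled):

```python
import math, random

def successive_elimination(means, horizon=20000, delta=0.05):
    """Pull every active arm once per round, then eliminate arms whose upper
    confidence bound falls below the best lower confidence bound."""
    k = len(means)
    active, pulls, totals, t = list(range(k)), [0] * k, [0.0] * k, 0
    while t < horizon and len(active) > 1:
        for arm in list(active):
            totals[arm] += 1.0 if random.random() < means[arm] else 0.0
            pulls[arm] += 1
            t += 1
        rad = lambda a: math.sqrt(math.log(2 * k * pulls[a] ** 2 / delta) / (2 * pulls[a]))
        best_lcb = max(totals[a] / pulls[a] - rad(a) for a in active)
        active = [a for a in active if totals[a] / pulls[a] + rad(a) >= best_lcb]
    return active

print(successive_elimination([0.3, 0.5, 0.7]))  # typically keeps only the best arm
```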
1 code implementation • 30 May 2022 • Gokul Swamy, Nived Rajaraman, Matthew Peng, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu, Jiantao Jiao, Kannan Ramchandran
In the tabular setting or with linear function approximation, our meta theorem shows that the performance gap incurred by our approach achieves the optimal $\widetilde{O}\left( \min\left( H^{3/2}/N,\; H/\sqrt{N} \right) \right)$ dependency, under significantly weaker assumptions compared to prior work.
1 code implementation • 6 Feb 2022 • Yaoqing Yang, Ryan Theisen, Liam Hodgkinson, Joseph E. Gonzalez, Kannan Ramchandran, Charles H. Martin, Michael W. Mahoney
Our analyses consider (I) hundreds of Transformers trained in different settings, in which we systematically vary the amount of data, the model size and the optimization hyperparameters, (II) a total of 51 pretrained Transformers from eight families of Huggingface NLP models, including GPT2, BERT, etc., and (III) a total of 28 existing and novel generalization metrics.
no code implementations • NeurIPS 2021 • Nived Rajaraman, Yanjun Han, Lin Yang, Jingbo Liu, Jiantao Jiao, Kannan Ramchandran
In contrast, when the MDP transition structure is known to the learner, as in the case of simulators, we demonstrate fundamental differences from the tabular setting in the performance of an optimal algorithm, Mimic-MD (Rajaraman et al., 2020), when extended to the function approximation setting.
1 code implementation • NeurIPS 2021 • Yaoqing Yang, Liam Hodgkinson, Ryan Theisen, Joe Zou, Joseph E. Gonzalez, Kannan Ramchandran, Michael W. Mahoney
Viewing neural network models in terms of their loss landscapes has a long history in the statistical mechanics approach to learning, and in recent years it has received attention within machine learning proper.
no code implementations • 13 Jul 2021 • Avishek Ghosh, Sayak Ray Chowdhury, Kannan Ramchandran
We propose and analyze a novel algorithm, namely \emph{Adaptive Reinforcement Learning (General)} (\texttt{ARL-GEN}) that adapts to the smallest such family where the true transition kernel $P^*$ lies.
no code implementations • 7 Jul 2021 • Avishek Ghosh, Abishek Sankararaman, Kannan Ramchandran
We consider the problem of model selection for the general stochastic contextual bandits under the realizability assumption.
no code implementations • 15 Jun 2021 • Avishek Ghosh, Abishek Sankararaman, Kannan Ramchandran
We show that, for any agent, the regret scales as $\mathcal{O}(\sqrt{T/N})$ if the agent is in a `well separated' cluster, or as $\mathcal{O}(T^{\frac{1}{2} + \varepsilon}/N^{\frac{1}{2} - \varepsilon})$ if its cluster is not well separated, where $\varepsilon$ is positive and arbitrarily close to $0$.
no code implementations • 16 May 2021 • Vipul Gupta, Avishek Ghosh, Michal Derezinski, Rajiv Khanna, Kannan Ramchandran, Michael Mahoney
To enhance practicality, we devise an adaptive scheme to choose $L$, and we show that this reduces the number of local iterations on worker machines between two model synchronizations as training proceeds, successively refining the model quality at the master.
no code implementations • 17 Mar 2021 • Avishek Ghosh, Raj Kumar Maity, Arya Mazumdar, Kannan Ramchandran
Moreover, we validate our theoretical findings with experiments using standard datasets and several types of Byzantine attacks, and obtain an improvement of $25\%$ with respect to first order methods in iteration complexity.
no code implementations • 25 Feb 2021 • Nived Rajaraman, Yanjun Han, Lin F. Yang, Kannan Ramchandran, Jiantao Jiao
We establish an upper bound $O(|\mathcal{S}|H^{3/2}/N)$ for the suboptimality using the Mimic-MD algorithm in Rajaraman et al. (2020), which we prove to be computationally efficient.
1 code implementation • 26 Oct 2020 • Amirali Aghazadeh, Vipul Gupta, Alex DeWeese, O. Ozan Koyluoglu, Kannan Ramchandran
We consider feature selection for applications in machine learning where the dimensionality of the data is so large that it exceeds the working memory of the (local) computing machine.
no code implementations • 18 Oct 2020 • Vipul Gupta, Dhruv Choudhary, Ping Tak Peter Tang, Xiaohan Wei, Xing Wang, Yuzhen Huang, Arun Kejariwal, Kannan Ramchandran, Michael W. Mahoney
This is done by identifying and updating only the most relevant neurons of the neural network for each training sample in the data.
no code implementations • 23 Sep 2020 • Swanand Kadhe, Nived Rajaraman, O. Ozan Koyluoglu, Kannan Ramchandran
In this paper, we propose a secure aggregation protocol, FastSecAgg, that is efficient in terms of computation and communication, and robust to client dropouts.
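As a toy illustration of the aggregation primitive, the sketch below uses pairwise additive masking so the server only learns the sum of the updates; this is a generic secure-aggregation cartoon, not FastSecAgg itself (which uses a multi-secret-sharing construction), and it ignores client dropouts:

```python
import random

def masked_updates(updates, modulus=2 ** 16):
    """Each pair of clients (i, j) agrees on a random mask; client i adds it
    and client j subtracts it, so the masks cancel in the server's sum."""
    n = len(updates)
    masked = [list(u) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            for d in range(len(updates[0])):
                m = random.randrange(modulus)
                masked[i][d] = (masked[i][d] + m) % modulus
                masked[j][d] = (masked[j][d] - m) % modulus
    return masked

updates = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
server_sum = [sum(col) % 2 ** 16 for col in zip(*masked_updates(updates))]
print(server_sum)  # equals the plain sum [12, 15, 18], yet each masked update looks random
```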
1 code implementation • 18 Aug 2020 • Vipul Gupta, Soham Phade, Thomas Courtade, Kannan Ramchandran
As one of the fastest-growing cloud services, serverless computing provides an opportunity to better serve both users and providers through the incorporation of market-based strategies for pricing and resource allocation.
Distributed, Parallel, and Cluster Computing • Computer Science and Game Theory
1 code implementation • NeurIPS 2020 • Yaoqing Yang, Rajiv Khanna, Yaodong Yu, Amir Gholami, Kurt Keutzer, Joseph E. Gonzalez, Kannan Ramchandran, Michael W. Mahoney
Using these observations, we show that noise-augmentation on mixup training further increases boundary thickness, thereby combating vulnerability to various forms of adversarial attacks and OOD transforms.
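A minimal sketch of noise-augmented mixup in the spirit described above (the Beta parameter and the Gaussian noise scale are arbitrary illustrative choices, not the paper's settings):

```python
import numpy as np

def noisy_mixup(x1, y1, x2, y2, alpha=1.0, noise_std=0.1, rng=np.random.default_rng()):
    """Mix two training examples with a Beta-distributed weight, then add
    Gaussian noise to the mixed input (one way to 'noise-augment' mixup)."""
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1 - lam) * x2 + noise_std * rng.normal(size=x1.shape)
    y = lam * y1 + (1 - lam) * y2  # soft label
    return x, y

x, y = noisy_mixup(np.ones(4), np.array([1.0, 0.0]), np.zeros(4), np.array([0.0, 1.0]))
print(x, y)
```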
3 code implementations • NeurIPS 2020 • Avishek Ghosh, Jichan Chung, Dong Yin, Kannan Ramchandran
We address the problem of federated learning (FL) where users are distributed and partitioned into clusters.
no code implementations • 4 Jun 2020 • Avishek Ghosh, Abishek Sankararaman, Kannan Ramchandran
This is the first algorithm that achieves such model selection guarantees.
no code implementations • 14 May 2020 • Swanand Kadhe, O. Ozan Koyluoglu, Kannan Ramchandran
When a particular code is used in this framework, its block-length determines the computation load, dimension determines the communication overhead, and minimum distance determines the straggler tolerance.
no code implementations • 23 Apr 2020 • Avishek Ghosh, Kannan Ramchandran
Furthermore, we compare AM with a gradient based heuristic algorithm empirically and show that AM dominates in iteration complexity as well as wall-clock time.
1 code implementation • 21 Jan 2020 • Vipul Gupta, Dominic Carrano, Yaoqing Yang, Vaishaal Shankar, Thomas Courtade, Kannan Ramchandran
Inexpensive cloud services, such as serverless computing, are often vulnerable to straggling nodes that increase end-to-end latency for distributed computation.
Distributed, Parallel, and Cluster Computing • Information Theory
no code implementations • 21 Nov 2019 • Avishek Ghosh, Raj Kumar Maity, Swanand Kadhe, Arya Mazumdar, Kannan Ramchandran
Moreover, we analyze the compressed gradient descent algorithm with error feedback (proposed in \cite{errorfeed}) in a distributed setting and in the presence of Byzantine worker machines.
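A minimal sketch of the error-feedback mechanism with top-$k$ gradient compression (the generic compressor analyzed in this line of work; the Byzantine-robust aggregation step studied in the paper is not shown):

```python
import numpy as np

def topk_compress(g, k):
    """Keep only the k largest-magnitude coordinates of the gradient."""
    out = np.zeros_like(g)
    idx = np.argsort(np.abs(g))[-k:]
    out[idx] = g[idx]
    return out

def worker_step(grad, error, k=2):
    """Error feedback: compress (gradient + accumulated error) and carry the
    part dropped by compression into the next round."""
    corrected = grad + error
    compressed = topk_compress(corrected, k)
    return compressed, corrected - compressed

error = np.zeros(5)
for g in [np.array([0.9, -0.1, 0.05, 0.4, -0.3]), np.array([0.2, 0.8, -0.6, 0.1, 0.0])]:
    sent, error = worker_step(g, error)
    print(sent, error)
```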
no code implementations • 28 Jun 2019 • Swanand Kadhe, Jichan Chung, Kannan Ramchandran
In this paper, we propose an architecture based on 'fountain codes', a class of erasure codes, that enables any full node to 'encode' validated blocks into a small number of 'coded blocks', thereby reducing its storage costs by orders of magnitude.
Cryptography and Security • Distributed, Parallel, and Cluster Computing • Information Theory
no code implementations • 21 Jun 2019 • Avishek Ghosh, Ashwin Pananjady, Adityanand Guntuboyina, Kannan Ramchandran
Max-affine regression refers to a model where the unknown regression function is modeled as a maximum of $k$ unknown affine functions for a fixed $k \geq 1$.
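Written out, the model is $y = \max_{1 \le j \le k} (\langle \theta_j, x \rangle + b_j) + \text{noise}$. The data-generation sketch below is purely illustrative (the dimension, $k$, and noise level are arbitrary):

```python
import numpy as np

def sample_max_affine(n=200, d=3, k=2, noise=0.1, rng=np.random.default_rng(0)):
    """Generate (X, y) from y = max_j (<theta_j, x> + b_j) + noise,
    the max-affine regression model with k unknown affine pieces."""
    theta = rng.normal(size=(k, d))
    b = rng.normal(size=k)
    X = rng.normal(size=(n, d))
    y = (X @ theta.T + b).max(axis=1) + noise * rng.normal(size=n)
    return X, y, theta, b

X, y, theta, b = sample_max_affine()
print(X.shape, y.shape)
```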
no code implementations • 16 Jun 2019 • Avishek Ghosh, Justin Hong, Dong Yin, Kannan Ramchandran
Then, leveraging the statistical model, we solve the robust heterogeneous Federated Learning problem \emph{optimally}; in particular our algorithm matches the lower bound on the estimation error in dimension and the number of data points.
no code implementations • 9 May 2019 • Orhan Ocal, Oguz H. Elibol, Gokce Keskin, Cory Stephenson, Anil Thomas, Kannan Ramchandran
Due to the use of a single encoder, our method can generalize to converting the voice of out-of-training speakers to speakers in the training dataset.
no code implementations • ICLR 2019 • Kamil Nar, Orhan Ocal, S. Shankar Sastry, Kannan Ramchandran
In this work, we study the binary classification of linearly separable datasets and show that linear classifiers could also have decision boundaries that lie close to their training dataset if cross-entropy loss is used for training.
no code implementations • 30 Apr 2019 • Swanand Kadhe, O. Ozan Koyluoglu, Kannan Ramchandran
In this work, our goal is to construct approximate gradient codes that are resilient to stragglers selected by a computationally unbounded adversary.
1 code implementation • 21 Mar 2019 • Vipul Gupta, Swanand Kadhe, Thomas Courtade, Michael W. Mahoney, Kannan Ramchandran
Motivated by recent developments in serverless systems for large-scale computation as well as improvements in scalable randomized matrix algorithms, we develop OverSketched Newton, a randomized Hessian-based optimization algorithm to solve large-scale convex optimization problems in serverless systems.
no code implementations • 24 Jan 2019 • Kamil Nar, Orhan Ocal, S. Shankar Sastry, Kannan Ramchandran
We show that differential training can ensure a large margin between the decision boundary of the neural network and the points in the training dataset.
1 code implementation • 6 Nov 2018 • Vipul Gupta, Shusen Wang, Thomas Courtade, Kannan Ramchandran
We propose OverSketch, an approximate algorithm for distributed matrix multiplication in serverless computing.
Distributed, Parallel, and Cluster Computing • Information Theory
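A toy sketched matrix multiplication is shown below for intuition; it uses a plain Gaussian sketch of the inner dimension, whereas OverSketch uses a structured count-sketch with extra redundancy so that straggling blocks can simply be ignored:

```python
import numpy as np

def sketched_matmul(A, B, sketch_dim=200, rng=np.random.default_rng(0)):
    """Approximate A @ B by sketching the inner dimension: A @ S and S.T @ B
    are smaller products, and E[S @ S.T] = I, so their product approximates A @ B."""
    inner = A.shape[1]
    S = rng.normal(scale=1.0 / np.sqrt(sketch_dim), size=(inner, sketch_dim))
    return (A @ S) @ (S.T @ B)

A = np.random.default_rng(1).normal(size=(50, 400))
B = np.random.default_rng(2).normal(size=(400, 30))
err = np.linalg.norm(sketched_matmul(A, B) - A @ B) / np.linalg.norm(A @ B)
print("relative error:", round(err, 2))
```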
2 code implementations • 6 Nov 2018 • Gary Cheng, Armin Askari, Kannan Ramchandran, Laurent El Ghaoui
In this paper, we consider the problem of selecting representatives from a data set for arbitrary supervised/unsupervised learning tasks.
1 code implementation • 29 Oct 2018 • Dong Yin, Kannan Ramchandran, Peter Bartlett
For binary linear classifiers, we prove tight bounds on the adversarial Rademacher complexity and show that it is never smaller than its natural counterpart and that it has an unavoidable dimension dependence unless the weight vector has bounded $\ell_1$ norm.
no code implementations • 9 Jul 2018 • Avishek Ghosh, Kannan Ramchandran
We argue that the error in the score estimate accumulated over $T$ iterations is small if the regret of the online convex game is small.
no code implementations • 14 Jun 2018 • Dong Yin, Yudong Chen, Kannan Ramchandran, Peter Bartlett
In this setting, the Byzantine machines may create fake local minima near a saddle point that is far away from any true local minimum, even when robust gradient estimators are used.
2 code implementations • ICML 2018 • Dong Yin, Yudong Chen, Kannan Ramchandran, Peter Bartlett
In particular, these algorithms are shown to achieve order-optimal statistical error rates for strongly convex losses.
no code implementations • 4 Jan 2018 • Reinhard Heckel, Max Simchowitz, Kannan Ramchandran, Martin J. Wainwright
Accordingly, we study the problem of finding approximate rankings from pairwise comparisons.
no code implementations • 24 Oct 2017 • Jingge Zhu, Ye Pu, Vipul Gupta, Claire Tomlin, Kannan Ramchandran
As an application of the results, we demonstrate solving optimization problems using a sequential approximation approach, which accelerates the algorithm in a distributed system with stragglers.
no code implementations • 18 Jun 2017 • Dong Yin, Ashwin Pananjady, Max Lam, Dimitris Papailiopoulos, Kannan Ramchandran, Peter Bartlett
It has been experimentally observed that distributed implementations of mini-batch stochastic gradient descent (SGD) algorithms exhibit speedup saturation and decaying generalization ability beyond a particular batch-size.
1 code implementation • ICML 2017 • Reinhard Heckel, Kannan Ramchandran
We consider the online one-class collaborative filtering (CF) problem that consists of recommending items to users over time in an online fashion based on positive ratings only.
no code implementations • 28 Jun 2016 • Reinhard Heckel, Nihar B. Shah, Kannan Ramchandran, Martin J. Wainwright
We first analyze a sequential ranking algorithm that counts the number of comparisons won, and uses these counts to decide whether to stop, or to compare another pair of items, chosen based on confidence intervals specified by the data collected up to that point.
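A toy version of the counting-based sequential procedure (Hoeffding confidence intervals, comparisons against uniformly random opponents; the paper's adaptive pair selection and formal guarantees are not reproduced):

```python
import math, random

def active_top1(win_prob, budget=50000, delta=0.05):
    """Each active item is repeatedly compared against a random opponent; items
    whose win-rate confidence interval falls below the leader's are dropped.
    win_prob[i][j] = P(item i beats item j)."""
    n = len(win_prob)
    active, wins, plays, t = list(range(n)), [0] * n, [0] * n, 0
    while len(active) > 1 and t < budget:
        for i in list(active):
            j = random.choice([x for x in range(n) if x != i])
            wins[i] += random.random() < win_prob[i][j]
            plays[i] += 1
            t += 1
        rad = lambda i: math.sqrt(math.log(4 * n * plays[i] ** 2 / delta) / (2 * plays[i]))
        lead = max(wins[i] / plays[i] - rad(i) for i in active)
        active = [i for i in active if wins[i] / plays[i] + rad(i) >= lead]
    return active

P = [[0.5, 0.8, 0.9], [0.2, 0.5, 0.7], [0.1, 0.3, 0.5]]
print(active_top1(P))  # typically returns [0], the item winning most comparisons
```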
1 code implementation • NeurIPS 2016 • Xinghao Pan, Maximilian Lam, Stephen Tu, Dimitris Papailiopoulos, Ce Zhang, Michael. I. Jordan, Kannan Ramchandran, Chris Re, Benjamin Recht
We present CYCLADES, a general framework for parallelizing stochastic optimization algorithms in a shared memory setting.
no code implementations • 8 Dec 2015 • Kangwook Lee, Maximilian Lam, Ramtin Pedarsani, Dimitris Papailiopoulos, Kannan Ramchandran
We focus on two of the most basic building blocks of distributed learning algorithms: matrix multiplication and data shuffling.
no code implementations • NeurIPS 2015 • Xiao Li, Kannan Ramchandran
By writing the cut function as a polynomial and exploiting the graph structure, we propose a sketching algorithm to learn an arbitrary unknown $n$-node graph using only a few cut queries, which scales {\it almost linearly} in the number of edges and {\it sub-linearly} in the graph size $n$.
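To make the polynomial view concrete: with $x_i \in \{0,1\}$ indicating membership in one side of the cut, each edge $(i,j)$ contributes $x_i + x_j - 2x_ix_j$, so the cut value is a degree-2 polynomial in $x$. A brute-force illustration (not the sketching algorithm itself):

```python
def cut_value(edges, x):
    """Cut value written as the degree-2 polynomial
    sum over edges (i, j) of x_i + x_j - 2*x_i*x_j,
    where x is a 0/1 membership vector for one side of the cut."""
    return sum(x[i] + x[j] - 2 * x[i] * x[j] for i, j in edges)

# Triangle graph: putting node 0 alone on one side cuts the two edges incident to it.
edges = [(0, 1), (1, 2), (0, 2)]
print(cut_value(edges, [1, 0, 0]))  # -> 2
```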
no code implementations • 19 Sep 2015 • Frank Ong, Sameer Pawar, Kannan Ramchandran
For the case when the spatial-domain measurements are corrupted by additive noise, our 2D-FFAST framework extends to a noise-robust version in sub-linear time of $O(k \log^4 N)$ using $O(k \log^3 N)$ measurements.
Information Theory • Multimedia • Systems and Control
3 code implementations • 26 Aug 2015 • Xiao Li, Joseph K. Bradley, Sameer Pawar, Kannan Ramchandran
We consider the problem of computing the Walsh-Hadamard Transform (WHT) of some $N$-length input vector in the presence of noise, where the $N$-point Walsh spectrum is $K$-sparse with $K = {O}(N^{\delta})$ scaling sub-linearly in the input dimension $N$ for some $0<\delta<1$.
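For reference, the dense $O(N \log N)$ fast Walsh-Hadamard transform is sketched below; the paper's point is to avoid this full computation and recover the $K$-sparse spectrum from sub-linearly many noisy samples:

```python
def fwht(a):
    """Fast Walsh-Hadamard transform of a length-2^m list (unnormalized).
    Touches every sample and costs O(N log N); the paper's sparse algorithm
    instead uses only a sub-linear number of samples, which this does not do."""
    a = list(a)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

# A signal whose Walsh spectrum is 1-sparse: x[n] = (-1)^{<k, n>} for k = 0b101.
x = [(-1) ** bin(5 & n).count("1") for n in range(8)]
print(fwht(x))  # a single nonzero entry of size 8 at index 5
```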
no code implementations • 24 Jul 2015 • Horia Mania, Xinghao Pan, Dimitris Papailiopoulos, Benjamin Recht, Kannan Ramchandran, Michael. I. Jordan
We demonstrate experimentally on a 16-core machine that the sparse and parallel version of SVRG is in some cases more than four orders of magnitude faster than the standard SVRG algorithm.
no code implementations • NeurIPS 2015 • Xinghao Pan, Dimitris Papailiopoulos, Samet Oymak, Benjamin Recht, Kannan Ramchandran, Michael. I. Jordan
We present C4 and ClusterWild!, two algorithms for parallel correlation clustering that run in a polylogarithmic number of rounds and achieve nearly linear speedups, provably.
no code implementations • 6 May 2015 • Nihar B. Shah, Sivaraman Balakrishnan, Joseph Bradley, Abhay Parekh, Kannan Ramchandran, Martin J. Wainwright
Data in the form of pairwise comparisons arises in many domains, including preference elicitation, sporting competitions, and peer grading among others.
no code implementations • 1 Jan 2015 • Sameer Pawar, Kannan Ramchandran
If the DFT $X$ of the signal $x$ has only $k$ non-zero coefficients (where $k < n$), can we do better?
no code implementations • 25 Jun 2014 • Nihar B. Shah, Sivaraman Balakrishnan, Joseph Bradley, Abhay Parekh, Kannan Ramchandran, Martin Wainwright
When eliciting judgements from humans for an unknown quantity, one often has the choice of making direct-scoring (cardinal) or comparative (ordinal) measurements.