no code implementations • 16 Apr 2024 • Haozheng Fan, Hao Zhou, Guangtai Huang, Parameswaran Raman, Xinwei Fu, Gaurav Gupta, Dhananjay Ram, Yida Wang, Jun Huan
In this paper, we showcase HLAT: a 7 billion parameter decoder-only LLM pre-trained using trn1 instances over 1.8 trillion tokens.
no code implementations • 16 Apr 2024 • Chung-Yiu Yau, Hoi-To Wai, Parameswaran Raman, Soumajyoti Sarkar, Mingyi Hong
We follow the global contrastive learning loss introduced in SogCLR, and propose EMC$^2$, which utilizes an adaptive Metropolis-Hastings subroutine to generate hardness-aware negative samples in an online fashion during the optimization.
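For intuition, a minimal numpy sketch of such an online Metropolis-Hastings negative sampler follows. The function and parameter names (`mh_negative_sampling`, `beta`) are hypothetical, and the acceptance rule shown, which favours high-similarity (hard) negatives, is an illustrative instance of the idea rather than the authors' EMC$^2$ implementation.

```python
import numpy as np

def mh_negative_sampling(anchor, candidates, current_idx, beta=1.0, rng=None):
    """One Metropolis-Hastings step over negative examples.

    Proposes a uniformly random candidate and accepts it with the Metropolis
    ratio for a target distribution proportional to exp(beta * similarity),
    so harder (more similar) negatives are visited more often.
    """
    rng = rng or np.random.default_rng()
    proposal_idx = rng.integers(len(candidates))
    # Unnormalized log target density: beta * <anchor, candidate>
    cur_score = beta * anchor @ candidates[current_idx]
    new_score = beta * anchor @ candidates[proposal_idx]
    accept_prob = min(1.0, np.exp(new_score - cur_score))
    return proposal_idx if rng.random() < accept_prob else current_idx

# Toy usage: keep one chain state per anchor and advance it each training step.
rng = np.random.default_rng(0)
anchor = rng.standard_normal(16)
candidates = rng.standard_normal((100, 16))
state = 0
for _ in range(50):
    state = mh_negative_sampling(anchor, candidates, state, beta=2.0, rng=rng)
```

Because the proposal is uniform (and hence symmetric), the acceptance probability reduces to the ratio of unnormalized target densities, and maintaining one chain per anchor amortizes the sampling cost across training steps.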
no code implementations • 11 Apr 2024 • Tanmay Gautam, Youngsuk Park, Hao Zhou, Parameswaran Raman, Wooseok Ha
Evaluated across a range of both masked and autoregressive LMs on benchmark GLUE tasks, MeZO-SVRG outperforms MeZO with up to a 20% increase in test accuracy in both full- and partial-parameter fine-tuning settings.
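For intuition, here is a minimal sketch of combining a two-point zeroth-order gradient estimate with an SVRG-style control variate, the general mechanism behind such variance-reduced zeroth-order methods. The toy least-squares objective, step sizes, and all names are illustrative assumptions, not the paper's implementation or its LM fine-tuning setup.

```python
import numpy as np

def zo_grad(loss, theta, batch, u, eps=1e-3):
    """Two-point zeroth-order (finite-difference) gradient estimate along u."""
    return (loss(theta + eps * u, batch) - loss(theta - eps * u, batch)) / (2 * eps) * u

def mezo_svrg_step(loss, theta, anchor, full_grad_anchor, batch, lr, rng):
    """One SVRG-style variance-reduced step built from zeroth-order estimates:
    g = g_batch(theta) - g_batch(anchor) + g_full(anchor).
    The two minibatch estimates share the same perturbation direction u so
    their common noise cancels."""
    u = rng.standard_normal(theta.shape)
    g = zo_grad(loss, theta, batch, u) - zo_grad(loss, anchor, batch, u) + full_grad_anchor
    return theta - lr * g

# Toy usage on least squares (illustrative; not an LM fine-tuning setup).
rng = np.random.default_rng(0)
X, y = rng.standard_normal((256, 8)), rng.standard_normal(256)
loss = lambda w, idx: 0.5 * np.mean((X[idx] @ w - y[idx]) ** 2)
theta = np.zeros(8)
for _ in range(20):                      # outer loop: refresh the anchor point
    anchor = theta.copy()
    full_grad_anchor = zo_grad(loss, anchor, np.arange(256), rng.standard_normal(8))
    for _ in range(16):                  # inner loop: cheap minibatch steps
        batch = rng.integers(256, size=32)
        theta = mezo_svrg_step(loss, theta, anchor, full_grad_anchor, batch, 0.05, rng)
```

The appeal in the fine-tuning setting is that every term is a forward-pass finite difference, so no backpropagation memory is needed, while the anchor-based correction reduces the variance of the minibatch estimates.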
no code implementations • 17 Jan 2024 • Kaan Ozkara, Can Karakus, Parameswaran Raman, Mingyi Hong, Shoham Sabach, Branislav Kveton, Volkan Cevher
Since Adam was introduced, several novel adaptive optimizers for deep learning have been proposed.
no code implementations • 5 Jan 2024 • Ruichen Jiang, Parameswaran Raman, Shoham Sabach, Aryan Mokhtari, Mingyi Hong, Volkan Cevher
In this paper, we introduce a novel subspace cubic regularized Newton method that achieves a dimension-independent global convergence rate of ${O}\left(\frac{1}{mk}+\frac{1}{k^2}\right)$ for solving convex optimization problems.
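A generic subspace step of this kind can be written as follows; this is an illustrative form (assuming $S_k$ has orthonormal columns), and the paper's exact subproblem and subspace construction may differ.

```latex
% Illustrative subspace cubic-regularized Newton step:
% S_k \in R^{d \times m} spans the chosen subspace, M is the cubic-regularization
% parameter, and the subproblem is only m-dimensional.
z_k \in \arg\min_{z \in \mathbb{R}^{m}}
      \langle S_k^{\top} \nabla f(x_k),\, z \rangle
    + \tfrac{1}{2}\, z^{\top} S_k^{\top} \nabla^{2} f(x_k)\, S_k\, z
    + \tfrac{M}{6}\, \lVert z \rVert^{3},
\qquad
x_{k+1} = x_k + S_k z_k .
```

Each iteration therefore solves only an $m$-dimensional model rather than forming the full Hessian, and, reading the stated rate directly, the $\frac{1}{mk}$ term improves as the subspace dimension $m$ grows.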
no code implementations • 13 Dec 2023 • Bingcong Li, Shuai Zheng, Parameswaran Raman, Anshumali Shrivastava, Georgios B. Giannakis
On-device memory concerns in distributed deep learning have become severe due to (i) the growth of model size in multi-GPU training, and (ii) the wide adoption of deep neural networks for federated learning on IoT devices with limited storage.
no code implementations • 29 Apr 2020 • Parameswaran Raman, S. V. N. Vishwanathan
Traditional algorithms for factorization machines (FM) that work on a single machine are not equipped to handle this scale; using a distributed algorithm to parallelize the computation across a cluster is therefore inevitable.
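For context, the computation being parallelized is the standard second-order FM score. The sketch below (with illustrative variable names) shows the usual $O(nk)$ identity for the pairwise term; it is not the distributed algorithm proposed in the paper.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order Factorization Machine score for one example.

    Uses the standard O(n*k) identity for the pairwise term:
    sum_{i<j} <v_i, v_j> x_i x_j
      = 0.5 * sum_f [ (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 ].
    """
    linear = w0 + w @ x
    s = V.T @ x                                        # shape (k,)
    pairwise = 0.5 * (np.sum(s ** 2) - np.sum((V ** 2).T @ (x ** 2)))
    return linear + pairwise

# Toy usage with random parameters.
rng = np.random.default_rng(0)
n, k = 10, 4
x = rng.random(n)
print(fm_predict(x, 0.1, rng.standard_normal(n), rng.standard_normal((n, k))))
```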
no code implementations • 13 Sep 2019 • Parameswaran Raman, Jiasen Yang
The Thomson problem is a classical problem in physics that studies how $n$ charged particles distribute themselves on the surface of a $k$-dimensional sphere.
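A minimal numpy sketch of that setup, minimizing the Coulomb energy with naive projected gradient descent on the sphere, is shown below; the particle count, dimension, and step size are illustrative choices, not the method studied in the paper.

```python
import numpy as np

def thomson_energy(X):
    """Coulomb energy sum_{i<j} 1 / ||x_i - x_j|| for points X of shape (n, k)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    iu = np.triu_indices(len(X), k=1)
    return np.sum(1.0 / dist[iu])

def thomson_descent(n=12, k=3, steps=500, lr=0.01, seed=0):
    """Naive projected gradient descent on the unit sphere in R^k."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, k))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    for _ in range(steps):
        diff = X[:, None, :] - X[None, :, :]
        dist = np.linalg.norm(diff, axis=-1)
        np.fill_diagonal(dist, np.inf)
        # d/dx_i of sum_j 1/||x_i - x_j|| is -sum_j (x_i - x_j) / ||x_i - x_j||^3
        grad = -np.sum(diff / dist[..., None] ** 3, axis=1)
        X -= lr * grad
        X /= np.linalg.norm(X, axis=1, keepdims=True)   # project back to the sphere
    return X, thomson_energy(X)

X, E = thomson_descent()
print(f"final energy for 12 points on S^2: {E:.4f}")  # known optimum is about 49.165
```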
no code implementations • 31 May 2016 • Jiong Zhang, Parameswaran Raman, Shihao Ji, Hsiang-Fu Yu, S. V. N. Vishwanathan, Inderjit S. Dhillon
Moreover, it requires the parameters to fit in the memory of a single processor; this is problematic when the number of parameters is in the billions.
1 code implementation • 16 Apr 2016 • Parameswaran Raman, Sriram Srinivasan, Shin Matsushima, Xinhua Zhang, Hyokun Yun, S. V. N. Vishwanathan
Scaling multinomial logistic regression to datasets with a very large number of data points and classes is challenging.
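To see where the difficulty comes from, here is a minimal numpy sketch of the objective itself (not the distributed method proposed in the paper): the log-partition term couples all $K$ class weight vectors for every example, which is what makes naive splits over data and classes hard.

```python
import numpy as np

def mlr_loss(W, X, y):
    """Multinomial logistic regression negative log-likelihood.

    The log-partition (log-sum-exp) term touches every class weight vector
    for every example, so the objective does not decompose over classes.
    """
    scores = X @ W                                   # shape (N, K)
    m = scores.max(axis=1, keepdims=True)            # for numerical stability
    log_Z = np.log(np.sum(np.exp(scores - m), axis=1)) + m[:, 0]
    return np.mean(log_Z - scores[np.arange(len(y)), y])

# Toy usage with random data.
rng = np.random.default_rng(0)
N, D, K = 100, 20, 5
X, y = rng.standard_normal((N, D)), rng.integers(K, size=N)
print(mlr_loss(0.1 * rng.standard_normal((D, K)), X, y))
```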
no code implementations • NeurIPS 2014 • Hyokun Yun, Parameswaran Raman, S. Vishwanathan
We propose RoBiRank, a ranking algorithm that is motivated by observing a close connection between evaluation metrics for learning to rank and loss functions for robust classification.
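As a hedged illustration of that connection (the exact transformation used by RoBiRank may differ), the sketch below contrasts a standard pairwise logistic loss with a log-transformed variant whose growth is damped, mimicking a robust classification loss.

```python
import numpy as np

def pairwise_logistic(score_diff):
    """Standard pairwise logistic loss on the score difference of a pair."""
    return np.log1p(np.exp(-score_diff))

def robust_pairwise(score_diff):
    """Illustrative 'robust' variant: a log transform on top of the logistic
    loss, so the penalty grows only logarithmically and a single badly-ranked
    pair cannot dominate the objective."""
    return np.log1p(pairwise_logistic(score_diff))

diffs = np.linspace(-10, 10, 5)
print(pairwise_logistic(diffs))   # grows roughly linearly as the pair gets worse
print(robust_pairwise(diffs))     # grows only logarithmically
```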
no code implementations • 11 Feb 2014 • Hyokun Yun, Parameswaran Raman, S. V. N. Vishwanathan
We propose RoBiRank, a ranking algorithm that is motivated by observing a close connection between evaluation metrics for learning to rank and loss functions for robust classification.