no code implementations • 16 Apr 2024 • Haozheng Fan, Hao Zhou, Guangtai Huang, Parameswaran Raman, Xinwei Fu, Gaurav Gupta, Dhananjay Ram, Yida Wang, Jun Huan
In this paper, we showcase HLAT: a 7 billion parameter decoder-only LLM pre-trained using trn1 instances over 1.8 trillion tokens.
no code implementations • 16 Apr 2024 • Chung-Yiu Yau, Hoi-To Wai, Parameswaran Raman, Soumajyoti Sarkar, Mingyi Hong
We follow the global contrastive learning loss introduced in SogCLR, and propose EMC$^2$, which utilizes an adaptive Metropolis-Hastings subroutine to generate hardness-aware negative samples in an online fashion during the optimization.
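For intuition, a minimal numpy sketch of such an online Metropolis-Hastings negative sampler follows. The function and parameter names (`mh_negative_sampling`, `beta`) are hypothetical, and the acceptance rule shown, which favours high-similarity (hard) negatives, is an illustrative instance of the idea rather than the authors' EMC$^2$ implementation.

```python
import numpy as np

def mh_negative_sampling(anchor, candidates, current_idx, beta=1.0, rng=None):
    """One Metropolis-Hastings step over negative examples.

    Proposes a uniformly random candidate and accepts it with the Metropolis
    ratio for a target distribution proportional to exp(beta * similarity),
    so harder (more similar) negatives are visited more often.
    """
    rng = rng or np.random.default_rng()
    proposal_idx = rng.integers(len(candidates))
    # Unnormalized log target density: beta * <anchor, candidate>
    cur_score = beta * anchor @ candidates[current_idx]
    new_score = beta * anchor @ candidates[proposal_idx]
    accept_prob = min(1.0, np.exp(new_score - cur_score))
    return proposal_idx if rng.random() < accept_prob else current_idx

# Toy usage: keep one chain state per anchor and advance it each training step.
rng = np.random.default_rng(0)
anchor = rng.standard_normal(16)
candidates = rng.standard_normal((100, 16))
state = 0
for _ in range(50):
    state = mh_negative_sampling(anchor, candidates, state, beta=2.0, rng=rng)
```

Because the proposal is uniform (and hence symmetric), the acceptance probability reduces to the ratio of unnormalized target densities, and maintaining one chain per anchor amortizes the sampling cost across training steps.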
no code implementations • 11 Apr 2024 • Tanmay Gautam, Youngsuk Park, Hao Zhou, Parameswaran Raman, Wooseok Ha
Evaluated across a range of both masked and autoregressive LMs on benchmark GLUE tasks, MeZO-SVRG outperforms MeZO with up to a 20% increase in test accuracy in both full- and partial-parameter fine-tuning settings.
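For intuition, here is a minimal sketch of combining a two-point zeroth-order gradient estimate with an SVRG-style control variate, the general mechanism behind such variance-reduced zeroth-order methods. The toy least-squares objective, step sizes, and all names are illustrative assumptions, not the paper's implementation or its LM fine-tuning setup.

```python
import numpy as np

def zo_grad(loss, theta, batch, u, eps=1e-3):
    """Two-point zeroth-order (finite-difference) gradient estimate along u."""
    return (loss(theta + eps * u, batch) - loss(theta - eps * u, batch)) / (2 * eps) * u

def mezo_svrg_step(loss, theta, anchor, full_grad_anchor, batch, lr, rng):
    """One SVRG-style variance-reduced step built from zeroth-order estimates:
    g = g_batch(theta) - g_batch(anchor) + g_full(anchor).
    The two minibatch estimates share the same perturbation direction u so
    their common noise cancels."""
    u = rng.standard_normal(theta.shape)
    g = zo_grad(loss, theta, batch, u) - zo_grad(loss, anchor, batch, u) + full_grad_anchor
    return theta - lr * g

# Toy usage on least squares (illustrative; not an LM fine-tuning setup).
rng = np.random.default_rng(0)
X, y = rng.standard_normal((256, 8)), rng.standard_normal(256)
loss = lambda w, idx: 0.5 * np.mean((X[idx] @ w - y[idx]) ** 2)
theta = np.zeros(8)
for _ in range(20):                      # outer loop: refresh the anchor point
    anchor = theta.copy()
    full_grad_anchor = zo_grad(loss, anchor, np.arange(256), rng.standard_normal(8))
    for _ in range(16):                  # inner loop: cheap minibatch steps
        batch = rng.integers(256, size=32)
        theta = mezo_svrg_step(loss, theta, anchor, full_grad_anchor, batch, 0.05, rng)
```

The appeal in the fine-tuning setting is that every term is a forward-pass finite difference, so no backpropagation memory is needed, while the anchor-based correction reduces the variance of the minibatch estimates.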
no code implementations • 17 Jan 2024 • Kaan Ozkara, Can Karakus, Parameswaran Raman, Mingyi Hong, Shoham Sabach, Branislav Kveton, Volkan Cevher
Since Adam was introduced, several novel adaptive optimizers for deep learning have been proposed.
no code implementations • 5 Jan 2024 • Ruichen Jiang, Parameswaran Raman, Shoham Sabach, Aryan Mokhtari, Mingyi Hong, Volkan Cevher
In this paper, we introduce a novel subspace cubic regularized Newton method that achieves a dimension-independent global convergence rate of ${O}\left(\frac{1}{mk}+\frac{1}{k^2}\right)$ for solving convex optimization problems.
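A generic subspace step of this kind can be written as follows; this is an illustrative form (assuming $S_k$ has orthonormal columns), and the paper's exact subproblem and subspace construction may differ.

```latex
% Illustrative subspace cubic-regularized Newton step:
% S_k \in R^{d \times m} spans the chosen subspace, M is the cubic-regularization
% parameter, and the subproblem is only m-dimensional.
z_k \in \arg\min_{z \in \mathbb{R}^{m}}
      \langle S_k^{\top} \nabla f(x_k),\, z \rangle
    + \tfrac{1}{2}\, z^{\top} S_k^{\top} \nabla^{2} f(x_k)\, S_k\, z
    + \tfrac{M}{6}\, \lVert z \rVert^{3},
\qquad
x_{k+1} = x_k + S_k z_k .
```

Each iteration therefore solves only an $m$-dimensional model rather than forming the full Hessian, and, reading the stated rate directly, the $\frac{1}{mk}$ term improves as the subspace dimension $m$ grows.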
no code implementations • 13 Dec 2023 • Bingcong Li, Shuai Zheng, Parameswaran Raman, Anshumali Shrivastava, Georgios B. Giannakis
On-device memory concerns in distributed deep learning have become severe due to (i) the growth of model size in multi-GPU training, and (ii) the wide adoption of deep neural networks for federated learning on IoT devices with limited storage.
no code implementations • 29 Apr 2020 • Parameswaran Raman, S. V. N. Vishwanathan
Traditional algorithms for factorization machines (FM) that work on a single machine are not equipped to handle this scale; using a distributed algorithm to parallelize the computation across a cluster is therefore inevitable.
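For context, the computation being parallelized is the standard second-order FM score. The sketch below (with illustrative variable names) shows the usual $O(nk)$ identity for the pairwise term; it is not the distributed algorithm proposed in the paper.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order Factorization Machine score for one example.

    Uses the standard O(n*k) identity for the pairwise term:
    sum_{i<j} <v_i, v_j> x_i x_j
      = 0.5 * sum_f [ (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 ].
    """
    linear = w0 + w @ x
    s = V.T @ x                                        # shape (k,)
    pairwise = 0.5 * (np.sum(s ** 2) - np.sum((V ** 2).T @ (x ** 2)))
    return linear + pairwise

# Toy usage with random parameters.
rng = np.random.default_rng(0)
n, k = 10, 4
x = rng.random(n)
print(fm_predict(x, 0.1, rng.standard_normal(n), rng.standard_normal((n, k))))
```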
no code implementations • 13 Sep 2019 • Parameswaran Raman, Jiasen Yang
The Thomson problem is a classical problem in physics that studies how $n$ charged particles distribute themselves on the surface of a $k$-dimensional sphere.
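A minimal numpy sketch of that setup, minimizing the Coulomb energy with naive projected gradient descent on the sphere, is shown below; the particle count, dimension, and step size are illustrative choices, not the method studied in the paper.

```python
import numpy as np

def thomson_energy(X):
    """Coulomb energy sum_{i<j} 1 / ||x_i - x_j|| for points X of shape (n, k)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    iu = np.triu_indices(len(X), k=1)
    return np.sum(1.0 / dist[iu])

def thomson_descent(n=12, k=3, steps=500, lr=0.01, seed=0):
    """Naive projected gradient descent on the unit sphere in R^k."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, k))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    for _ in range(steps):
        diff = X[:, None, :] - X[None, :, :]
        dist = np.linalg.norm(diff, axis=-1)
        np.fill_diagonal(dist, np.inf)
        # d/dx_i of sum_j 1/||x_i - x_j|| is -sum_j (x_i - x_j) / ||x_i - x_j||^3
        grad = -np.sum(diff / dist[..., None] ** 3, axis=1)
        X -= lr * grad
        X /= np.linalg.norm(X, axis=1, keepdims=True)   # project back to the sphere
    return X, thomson_energy(X)

X, E = thomson_descent()
print(f"final energy for 12 points on S^2: {E:.4f}")  # known optimum is about 49.165
```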
no code implementations • 31 May 2016 • Jiong Zhang, Parameswaran Raman, Shihao Ji, Hsiang-Fu Yu, S. V. N. Vishwanathan, Inderjit S. Dhillon
Moreover, it requires the parameters to fit in the memory of a single processor; this is problematic when the number of parameters is in the billions.
1 code implementation • 16 Apr 2016 • Parameswaran Raman, Sriram Srinivasan, Shin Matsushima, Xinhua Zhang, Hyokun Yun, S. V. N. Vishwanathan
Scaling multinomial logistic regression to datasets with a very large number of data points and classes is challenging.
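To see where the difficulty comes from, here is a minimal numpy sketch of the objective itself (not the distributed method proposed in the paper): the log-partition term couples all $K$ class weight vectors for every example, which is what makes naive splits over data and classes hard.

```python
import numpy as np

def mlr_loss(W, X, y):
    """Multinomial logistic regression negative log-likelihood.

    The log-partition (log-sum-exp) term touches every class weight vector
    for every example, so the objective does not decompose over classes.
    """
    scores = X @ W                                   # shape (N, K)
    m = scores.max(axis=1, keepdims=True)            # for numerical stability
    log_Z = np.log(np.sum(np.exp(scores - m), axis=1)) + m[:, 0]
    return np.mean(log_Z - scores[np.arange(len(y)), y])

# Toy usage with random data.
rng = np.random.default_rng(0)
N, D, K = 100, 20, 5
X, y = rng.standard_normal((N, D)), rng.integers(K, size=N)
print(mlr_loss(0.1 * rng.standard_normal((D, K)), X, y))
```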
no code implementations • NeurIPS 2014 • Hyokun Yun, Parameswaran Raman, S. Vishwanathan
We propose RoBiRank, a ranking algorithm that is motivated by observing a close connection between evaluation metrics for learning to rank and loss functions for robust classification.
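As a hedged illustration of that connection (the exact transformation used by RoBiRank may differ), the sketch below contrasts a standard pairwise logistic loss with a log-transformed variant whose growth is damped, mimicking a robust classification loss.

```python
import numpy as np

def pairwise_logistic(score_diff):
    """Standard pairwise logistic loss on the score difference of a pair."""
    return np.log1p(np.exp(-score_diff))

def robust_pairwise(score_diff):
    """Illustrative 'robust' variant: a log transform on top of the logistic
    loss, so the penalty grows only logarithmically and a single badly-ranked
    pair cannot dominate the objective."""
    return np.log1p(pairwise_logistic(score_diff))

diffs = np.linspace(-10, 10, 5)
print(pairwise_logistic(diffs))   # grows roughly linearly as the pair gets worse
print(robust_pairwise(diffs))     # grows only logarithmically
```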
no code implementations • 11 Feb 2014 • Hyokun Yun, Parameswaran Raman, S. V. N. Vishwanathan
We propose RoBiRank, a ranking algorithm that is motivated by observing a close connection between evaluation metrics for learning to rank and loss functions for robust classification.