Search Results for author: Parameswaran Raman

Found 12 papers, 1 paper with code

HLAT: High-quality Large Language Model Pre-trained on AWS Trainium

no code implementations 16 Apr 2024 Haozheng Fan, Hao Zhou, Guangtai Huang, Parameswaran Raman, Xinwei Fu, Gaurav Gupta, Dhananjay Ram, Yida Wang, Jun Huan

In this paper, we showcase HLAT: a 7 billion parameter decoder-only LLM pre-trained using trn1 instances over 1.8 trillion tokens.

Language Modelling Large Language Model

EMC$^2$: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence

no code implementations 16 Apr 2024 Chung-Yiu Yau, Hoi-To Wai, Parameswaran Raman, Soumajyoti Sarkar, Mingyi Hong

We follow the global contrastive learning loss as introduced in SogCLR, and propose EMC$^2$, which utilizes an adaptive Metropolis-Hastings subroutine to generate hardness-aware negative samples in an online fashion during the optimization.

Contrastive Learning
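As an illustration of the idea the EMC$^2$ abstract describes, the sketch below runs one Metropolis-Hastings step per anchor to sample hardness-aware negatives; the proposal, temperature, and function names are illustrative assumptions, not the paper's exact algorithm.

```python
# Minimal sketch of MCMC (Metropolis-Hastings) negative sampling for
# contrastive learning. Targets p(j) proportional to exp(sim(anchor, x_j) / T),
# so harder (more similar) negatives are drawn more often.
# Illustrative only; not the exact EMC^2 procedure.
import numpy as np

def mh_negative_sample(anchor_emb, candidate_embs, state_idx, temperature=0.1, rng=None):
    """anchor_emb: (d,) anchor embedding; candidate_embs: (N, d) candidates;
    state_idx: current index of this anchor's Markov chain. Returns the new
    chain state, which is used as the negative sample for this step."""
    rng = np.random.default_rng() if rng is None else rng
    proposal = int(rng.integers(len(candidate_embs)))               # uniform symmetric proposal
    score_cur = anchor_emb @ candidate_embs[state_idx] / temperature
    score_prop = anchor_emb @ candidate_embs[proposal] / temperature
    accept_prob = min(1.0, float(np.exp(score_prop - score_cur)))   # MH acceptance ratio
    return proposal if rng.random() < accept_prob else state_idx
```

Keeping the chain state per anchor across training steps is what makes the sampling "online": the sampler adapts as the embeddings change during optimization.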

Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models

no code implementations 11 Apr 2024 Tanmay Gautam, Youngsuk Park, Hao Zhou, Parameswaran Raman, Wooseok Ha

Evaluated across a range of both masked and autoregressive LMs on benchmark GLUE tasks, MeZO-SVRG outperforms MeZO with up to a 20% increase in test accuracy in both full- and partial-parameter fine-tuning settings.

In-Context Learning
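To make the MeZO-SVRG abstract concrete, the sketch below shows the two ingredients it combines: a memory-efficient two-point zeroth-order gradient estimate and an SVRG-style control variate. Function names, signatures, and the way the pieces are wired together are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch of a variance-reduced zeroth-order gradient estimator.
# zo_grad: SPSA-style two-point estimate along one random direction z.
# svrg_direction: SVRG-style correction of a minibatch estimate using
# anchor-point estimates. Illustrative only; not the exact MeZO-SVRG update.
import torch

def zo_grad(loss_fn, params, batch, eps=1e-3, seed=0):
    """g ~ (L(theta + eps*z, batch) - L(theta - eps*z, batch)) / (2*eps) * z,
    where loss_fn(params, batch) is a user-supplied (hypothetical) loss."""
    gen = torch.Generator().manual_seed(seed)            # fixed seed lets z be regenerated
    z = [torch.randn(p.shape, generator=gen) for p in params]
    def perturbed_loss(sign):
        shifted = [p + sign * eps * zi for p, zi in zip(params, z)]
        return loss_fn(shifted, batch)
    scale = (perturbed_loss(+1.0) - perturbed_loss(-1.0)) / (2.0 * eps)
    return [scale * zi for zi in z]

def svrg_direction(g_mini, g_mini_anchor, g_full_anchor):
    """Minibatch estimate at the current point, corrected by the difference
    between the anchor-point minibatch and full-batch estimates."""
    return [gm - ga + gf for gm, ga, gf in zip(g_mini, g_mini_anchor, g_full_anchor)]
```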

MADA: Meta-Adaptive Optimizers through hyper-gradient Descent

no code implementations 17 Jan 2024 Kaan Ozkara, Can Karakus, Parameswaran Raman, Mingyi Hong, Shoham Sabach, Branislav Kveton, Volkan Cevher

Since Adam was introduced, several novel adaptive optimizers for deep learning have been proposed.

Krylov Cubic Regularized Newton: A Subspace Second-Order Method with Dimension-Free Convergence Rate

no code implementations 5 Jan 2024 Ruichen Jiang, Parameswaran Raman, Shoham Sabach, Aryan Mokhtari, Mingyi Hong, Volkan Cevher

In this paper, we introduce a novel subspace cubic regularized Newton method that achieves a dimension-independent global convergence rate of ${O}\left(\frac{1}{mk}+\frac{1}{k^2}\right)$ for solving convex optimization problems.

Second-order methods
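For readers parsing the rate above: a generic Krylov-subspace cubic-regularized Newton step has the following form, where $g_k$ and $H_k$ are the gradient and Hessian at $x_k$, $m$ is the subspace dimension, and $M$ is the cubic-regularization parameter. This is the standard textbook formulation and is only assumed to match the paper's method.

```latex
% One generic subspace cubic-regularized Newton iteration (illustrative).
\mathcal{K}_m(H_k, g_k) = \operatorname{span}\{g_k,\, H_k g_k,\, \dots,\, H_k^{m-1} g_k\},
\qquad V_m \ \text{an orthonormal basis of } \mathcal{K}_m,
\\
z_k = \arg\min_{z \in \mathbb{R}^m}\ g_k^{\top} V_m z
      + \tfrac{1}{2}\, z^{\top} \big(V_m^{\top} H_k V_m\big) z
      + \tfrac{M}{6}\, \lVert V_m z \rVert^{3},
\qquad x_{k+1} = x_k + V_m z_k .
```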

Contractive error feedback for gradient compression

no code implementations 13 Dec 2023 Bingcong Li, Shuai Zheng, Parameswaran Raman, Anshumali Shrivastava, Georgios B. Giannakis

On-device memory concerns in distributed deep learning have become severe due to (i) the growth of model size in multi-GPU training, and (ii) the wide adoption of deep neural networks for federated learning on IoT devices which have limited storage.

Federated Learning Image Classification +2
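The compression setting the abstract above refers to is usually built on an error-feedback loop around a contractive compressor; a minimal generic sketch (with top-$k$ as the compressor) is below. The paper's specific contractive error-feedback scheme and its memory savings are not reproduced here.

```python
# Minimal sketch of error feedback around a contractive (top-k) compressor,
# applied to a flattened 1-D gradient. Generic template, not the paper's
# specific mechanism.
import torch

def topk_compress(x, k):
    """Keep the k largest-magnitude entries of x; a standard contractive compressor."""
    out = torch.zeros_like(x)
    idx = x.abs().topk(k).indices
    out[idx] = x[idx]
    return out

def error_feedback_step(grad, error, k, lr):
    """Compress (gradient + accumulated error), apply the compressed update,
    and carry the residual forward so compressed-away mass is not lost."""
    corrected = grad + error                # re-inject past compression error
    message = topk_compress(corrected, k)   # what would actually be communicated
    new_error = corrected - message         # residual kept locally on the device
    update = -lr * message                  # update applied to the parameters
    return update, new_error
```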

DS-FACTO: Doubly Separable Factorization Machines

no code implementations 29 Apr 2020 Parameswaran Raman, S. V. N. Vishwanathan

Traditional algorithms for FM, which work on a single machine, are not equipped to handle this scale; therefore, using a distributed algorithm to parallelize the computation across a cluster is inevitable.

Recommendation Systems Stochastic Optimization
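For context on the model being distributed: a factorization machine scores an example with a bias, linear terms, and factorized pairwise interactions, computable in $O(nk)$ time via a standard identity. The sketch below shows that single-machine prediction only; the doubly-separable partitioning of data and parameters in DS-FACTO is not shown.

```python
# Minimal sketch of a second-order factorization machine prediction using the
# standard O(n*k) reformulation of the pairwise interaction term. This is the
# single-machine model that DS-FACTO distributes, not the distributed algorithm.
import numpy as np

def fm_predict(x, w0, w, V):
    """x: (n,) features; w0: bias; w: (n,) linear weights; V: (n, k) factors."""
    linear = w0 + w @ x
    # sum_{i<j} <V_i, V_j> x_i x_j
    #   = 0.5 * sum_f [ (sum_i V_{i,f} x_i)^2 - sum_i V_{i,f}^2 x_i^2 ]
    s = V.T @ x                                                   # (k,)
    pairwise = 0.5 * (np.sum(s ** 2) - np.sum((V ** 2).T @ (x ** 2)))
    return linear + pairwise
```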

Optimization on the Surface of the (Hyper)-Sphere

no code implementations 13 Sep 2019 Parameswaran Raman, Jiasen Yang

The Thomson problem is a classical problem in physics that studies how $n$ charged particles distribute themselves on the surface of a $k$-dimensional sphere.
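
As a generic illustration of optimization constrained to the sphere (not necessarily the method the paper proposes), one projected Riemannian gradient step looks like this: project the Euclidean gradient onto the tangent space at the current point, take a step, then renormalize back onto the sphere.

```python
# Minimal sketch of one Riemannian gradient step on the unit sphere.
# Generic illustration; not the specific algorithm studied in the paper.
import numpy as np

def sphere_step(x, euclid_grad, lr):
    """x: current point with ||x|| = 1; euclid_grad: Euclidean gradient at x."""
    riem_grad = euclid_grad - (x @ euclid_grad) * x   # project onto tangent space at x
    x_new = x - lr * riem_grad                        # gradient step in the tangent space
    return x_new / np.linalg.norm(x_new)              # retract back onto the sphere
```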

Extreme Stochastic Variational Inference: Distributed and Asynchronous

no code implementations 31 May 2016 Jiong Zhang, Parameswaran Raman, Shihao Ji, Hsiang-Fu Yu, S. V. N. Vishwanathan, Inderjit S. Dhillon

Moreover, it requires the parameters to fit in the memory of a single processor; this is problematic when the number of parameters is in the billions.

Variational Inference

Ranking via Robust Binary Classification

no code implementations NeurIPS 2014 Hyokun Yun, Parameswaran Raman, S. Vishwanathan

We propose RoBiRank, a ranking algorithm that is motivated by observing a close connection between evaluation metrics for learning to rank and loss functions for robust classification.

Binary Classification Classification +3
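The connection the abstract describes can be illustrated with a small sketch: start from a convex classification surrogate for each (relevant, irrelevant) pair and dampen it with a robust log-type transformation so that a few badly ranked pairs cannot dominate. The specific surrogate and transform below are generic choices, not necessarily the exact RoBiRank loss.

```python
# Minimal sketch of a pairwise ranking loss obtained by applying a robust
# log-type transformation to a logistic classification surrogate. Generic
# illustration of the ranking <-> robust classification connection only.
import numpy as np

def logistic_surrogate(margin):
    """Convex upper bound on the pairwise 0-1 ranking error."""
    return np.log1p(np.exp(-margin))

def robust_pairwise_loss(score_relevant, score_irrelevant):
    """log(1 + surrogate) grows slowly, so individual badly ranked pairs
    contribute a bounded-rate penalty instead of a linear one."""
    margin = score_relevant - score_irrelevant
    return np.log1p(logistic_surrogate(margin))
```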

Ranking via Robust Binary Classification and Parallel Parameter Estimation in Large-Scale Data

no code implementations 11 Feb 2014 Hyokun Yun, Parameswaran Raman, S. V. N. Vishwanathan

We propose RoBiRank, a ranking algorithm that is motivated by observing a close connection between evaluation metrics for learning to rank and loss functions for robust classification.

Binary Classification General Classification +2
