Search Results for author: Matthew Stallone

Found 2 papers, 0 papers with code

Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler

no code implementations · 23 Aug 2024 · Yikang Shen, Matthew Stallone, Mayank Mishra, Gaoyuan Zhang, Shawn Tan, Aditya Prasad, Adriana Meza Soria, David D. Cox, Rameswar Panda

This is not only because of the complicated correlation between learning rate, batch size, number of training tokens, model size, and other hyperparameters, but also because it is prohibitively expensive to perform a hyperparameter search for large language models with billions or trillions of parameters.

Diversity Measurement and Subset Selection for Instruction Tuning Datasets

no code implementations · 4 Feb 2024 · Peiqi Wang, Yikang Shen, Zhen Guo, Matthew Stallone, Yoon Kim, Polina Golland, Rameswar Panda

Our experiments demonstrate that the proposed diversity measure in the normalized weight gradient space is correlated with downstream instruction-following performance.

Tasks: Diversity, Instruction Following (+1 more)
