no code implementations • 4 Feb 2025 • Dylan Sam, Ayan Chakrabarti, Afshin Rostamizadeh, Srikumar Ramalingam, Gui Citovsky, Sanjiv Kumar
We analyze a variety of embedding models in our framework, with experiments using the Pile dataset for pretraining a 1.7B parameter decoder-only language model.
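As a concrete illustration of embedding-based data selection (a minimal sketch only, not the paper's actual framework), one simple baseline scores each candidate example by the cosine similarity of its embedding to the centroid of a small reference set and keeps the top-k. The embedding model and the selection rule below are assumptions made for the sketch.

```python
# Hypothetical sketch: score candidate pretraining examples by embedding
# similarity to a reference set and keep the highest-scoring ones.
# The model name and the centroid-similarity rule are assumptions,
# not the paper's method.
import numpy as np
from sentence_transformers import SentenceTransformer

def select_by_embedding(candidates, reference, k, model_name="all-MiniLM-L6-v2"):
    model = SentenceTransformer(model_name)
    cand_emb = model.encode(candidates, normalize_embeddings=True)
    ref_emb = model.encode(reference, normalize_embeddings=True)
    # Cosine similarity of each candidate to the reference centroid.
    centroid = ref_emb.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    scores = cand_emb @ centroid
    top = np.argsort(-scores)[:k]
    return [candidates[i] for i in top]
```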
no code implementations • 29 May 2024 • Matthew Fahrbach, Srikumar Ramalingam, Morteza Zadimoghaddam, Sara Ahmadian, Gui Citovsky, Giulia Desalvo
We introduce a novel subset selection problem called min-distance diversification with monotone submodular utility ($\textsf{MDMS}$), which has a wide variety of applications in machine learning, e.g., data sampling and feature selection.
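A natural greedy heuristic for this kind of problem (illustrative only; not the $\textsf{MDMS}$ algorithm from the paper) repeatedly adds the feasible point with the largest marginal utility, where feasibility means staying at least a distance d from everything already selected:

```python
# Hedged sketch of a greedy heuristic combining a min-distance diversity
# constraint with a monotone submodular utility, in the spirit of MDMS.
# The marginal-gain greedy and the threshold d are illustrative choices.
import numpy as np

def greedy_mdms(points, utility, k, d):
    """points: (n, dim) array; utility: set function over index lists,
    assumed monotone submodular; k: budget; d: min pairwise distance."""
    selected = []
    for _ in range(k):
        best, best_gain = None, -np.inf
        base = utility(selected)
        for i in range(len(points)):
            if i in selected:
                continue
            # Enforce diversity: candidate must be >= d from all chosen points.
            if any(np.linalg.norm(points[i] - points[j]) < d for j in selected):
                continue
            gain = utility(selected + [i]) - base
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no feasible candidate remains
            break
        selected.append(best)
    return selected
```

Here `utility` could be, for instance, a facility-location function; the paper's guarantees concern its specific formulation of the problem, not this heuristic.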
no code implementations • 24 Jan 2024 • Ke Ye, Heinrich Jiang, Afshin Rostamizadeh, Ayan Chakrabarti, Giulia Desalvo, Jean-François Kagy, Lazaros Karydas, Gui Citovsky, Sanjiv Kumar
In this paper, we present SpacTor, a new training procedure consisting of (1) a hybrid objective combining span corruption (SC) and replaced token detection (RTD), and (2) a two-stage curriculum that optimizes the hybrid objective over the initial $\tau$ iterations, then transitions to the standard SC loss.
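The two-stage curriculum itself is easy to sketch. In the snippet below, `span_corruption_loss` and `replaced_token_detection_loss` are hypothetical placeholders and the weight `lam` is an assumed hyperparameter; the actual objectives and schedule are defined in the paper.

```python
# Minimal sketch of the two-stage curriculum described above.
# span_corruption_loss, replaced_token_detection_loss, and lam are
# placeholders, not SpacTor's actual implementation.
def training_loss(batch, model, step, tau, lam=1.0):
    sc = span_corruption_loss(model, batch)  # standard T5-style SC loss
    if step < tau:
        # Stage 1: hybrid objective combining SC with replaced token detection.
        rtd = replaced_token_detection_loss(model, batch)
        return sc + lam * rtd
    # Stage 2: fall back to span corruption alone.
    return sc
```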
no code implementations • 28 Jan 2023 • Gui Citovsky, Giulia Desalvo, Sanjiv Kumar, Srikumar Ramalingam, Afshin Rostamizadeh, Yunjuan Wang
In such a setting, an algorithm can sample examples one at a time but, in order to limit overhead costs, is only able to update its state (i.e., further train model weights) once a large enough batch of examples is selected.
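A minimal sketch of this interaction pattern, with a hypothetical `accept` rule and `model.train_on` update standing in for the algorithm's actual per-example scoring and training step:

```python
# Illustrative sketch of the stream-based batch setting: examples arrive
# one at a time, a cheap rule decides whether to keep each one, and the
# expensive model-state update happens only once a full batch is collected.
# accept() and model.train_on() are hypothetical stand-ins.
def stream_select_and_train(stream, model, batch_size, accept):
    batch = []
    for example in stream:
        if accept(model, example):   # cheap per-example decision
            batch.append(example)
        if len(batch) == batch_size:
            model.train_on(batch)    # costly state update, done rarely
            batch = []
    return model
```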
1 code implementation • NeurIPS 2021 • Gui Citovsky, Giulia Desalvo, Claudio Gentile, Lazaros Karydas, Anand Rajagopalan, Afshin Rostamizadeh, Sanjiv Kumar
The ability to train complex and highly effective models often requires an abundance of training data, which can easily become a bottleneck in cost, time, and computational resources.
1 code implementation • 25 May 2021 • Baris Sumengen, Anand Rajagopalan, Gui Citovsky, David Simcha, Olivier Bachem, Pradipta Mitra, Sam Blasiak, Mason Liang, Sanjiv Kumar
Hierarchical Agglomerative Clustering (HAC) is one of the oldest but still most widely used clustering methods.
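For reference, here is a textbook HAC run using SciPy's standard implementation; the point of the paper above is scaling HAC well beyond what this quadratic-cost routine can handle.

```python
# Standard HAC with SciPy: build the merge tree, then cut it into clusters.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))            # 100 points in 8 dimensions
Z = linkage(X, method="average")         # average-linkage agglomerative merges
labels = fcluster(Z, t=5, criterion="maxclust")  # cut the tree into 5 clusters
```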
no code implementations • 20 Sep 2019 • Aditya Krishna Menon, Anand Rajagopalan, Baris Sumengen, Gui Citovsky, Qin Cao, Sanjiv Kumar
The second algorithm, OHAC, is an online counterpart to offline HAC, which is known to yield a 1/3-approximation to the Moseley-Wang (MW) revenue and to produce good-quality clusters in practice.
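For context, the MW revenue that this approximation guarantee refers to is commonly defined, for a cluster tree $T$ on $n$ points with pairwise similarities $w_{ij}$, as:

```latex
% Moseley-Wang revenue of a cluster tree T on n points with pairwise
% similarities w_{ij}; i \vee j denotes the least common ancestor of
% leaves i and j, and |leaves(.)| counts the leaves below that node.
\mathrm{rev}(T) \;=\; \sum_{i<j} w_{ij}\,\bigl(n - \lvert \mathrm{leaves}(i \vee j)\rvert\bigr)
```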