no code implementations • 5 Aug 2020 • Nicholas Meisburger, Anshumali Shrivastava
Finally, we show how our system scales to the tera-scale Criteo dataset with more than 4 billion samples.
no code implementations • 31 Dec 2020 • Shabnam Daghaghi, Tharun Medini, Nicholas Meisburger, Beidi Chen, Mengnan Zhao, Anshumali Shrivastava
Unfortunately, because the parameters and data samples are updated dynamically during training, no existing sampling scheme is both provably adaptive and efficient at sampling the negative classes.
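As a rough illustration of adaptive negative sampling with locality-sensitive hashing (a minimal sketch only; the signed-random-projection hash family, table layout, and all sizes below are illustrative assumptions, not the paper's exact scheme):

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, dim, num_bits = 100_000, 128, 12

class_weights = rng.standard_normal((num_classes, dim))
projections = rng.standard_normal((num_bits, dim))

def srp_hash(vec):
    """Signed-random-projection hash: one bit per random hyperplane."""
    bits = (projections @ vec) > 0
    return int(bits @ (1 << np.arange(num_bits)))

# Index every class weight vector into hash buckets. Because buckets are keyed
# on the weights themselves, re-hashing after updates keeps the sampler
# adaptive to the current parameters.
buckets = {}
for c in range(num_classes):
    buckets.setdefault(srp_hash(class_weights[c]), []).append(c)

def sample_negatives(query_emb, true_class, k=50):
    """Classes colliding with the query act as 'hard' negatives under this hash."""
    candidates = [c for c in buckets.get(srp_hash(query_emb), []) if c != true_class]
    if len(candidates) >= k:
        return rng.choice(candidates, size=k, replace=False)
    # Fall back to uniform sampling when the bucket is too sparse.
    fill = rng.choice(num_classes, size=k - len(candidates), replace=False)
    return np.concatenate([np.array(candidates, dtype=int), fill])

negatives = sample_negatives(rng.standard_normal(dim), true_class=42)
```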
2 code implementations • 6 Mar 2021 • Shabnam Daghaghi, Nicholas Meisburger, Mengnan Zhao, Yong Wu, Sameh Gobriel, Charlie Tai, Anshumali Shrivastava
Our work highlights several novel perspectives and opportunities for implementing randomized algorithms for deep learning on modern CPUs.
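For intuition, the randomized algorithm at the core of SLIDE replaces a dense matrix multiply with a hash-table lookup that activates only a handful of neurons. The sketch below conveys that idea only; the hash family and layer sizes are assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
dim, num_neurons, num_bits = 64, 4096, 10

W = rng.standard_normal((num_neurons, dim))
b = np.zeros(num_neurons)
planes = rng.standard_normal((num_bits, dim))

def lsh(vec):
    bits = (planes @ vec) > 0
    return int(bits @ (1 << np.arange(num_bits)))

# Pre-index neurons by the hash of their weight vectors.
table = {}
for n in range(num_neurons):
    table.setdefault(lsh(W[n]), []).append(n)

def sparse_forward(x):
    """Compute activations only for neurons colliding with the input's bucket."""
    active = table.get(lsh(x), [])
    out = np.zeros(num_neurons)
    if active:
        out[active] = np.maximum(W[active] @ x + b[active], 0.0)  # ReLU
    return out, active

y, active_set = sparse_forward(rng.standard_normal(dim))
print(f"computed {len(active_set)} of {num_neurons} neurons")
```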
no code implementations • 29 Jan 2022 • Minghao Yan, Nicholas Meisburger, Tharun Medini, Anshumali Shrivastava
We show that, with communication reduced by sparsity, we can train a model with close to a billion parameters on simple 4-16 core CPU nodes connected by a basic low-bandwidth interconnect.
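A back-of-the-envelope sketch of why activation sparsity cuts communication: each node ships only (index, value) pairs for neurons that actually fired, rather than the full dense vector. The layer width and sparsity level below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
layer_width = 1_000_000
active_fraction = 0.001  # ~0.1% of neurons active, typical of hash-based sparse nets

activations = np.zeros(layer_width, dtype=np.float32)
active_idx = rng.choice(layer_width, size=int(layer_width * active_fraction), replace=False)
activations[active_idx] = rng.standard_normal(active_idx.size).astype(np.float32)

# Dense message: every float crosses the wire.
dense_bytes = activations.nbytes

# Sparse message: a 4-byte index plus a 4-byte value per active neuron.
sparse_payload = np.concatenate([
    active_idx.astype(np.uint32).view(np.uint8),
    activations[active_idx].view(np.uint8),
])
print(f"dense: {dense_bytes / 1e6:.1f} MB, sparse: {sparse_payload.nbytes / 1e6:.3f} MB")
```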
2 code implementations • 30 Mar 2023 • Nicholas Meisburger, Vihan Lakshman, Benito Geordie, Joshua Engels, David Torres Ramos, Pratik Pranav, Benjamin Coleman, Benjamin Meisburger, Shubh Gupta, Yashwanth Adunukota, Tharun Medini, Anshumali Shrivastava
Efficient large-scale neural network training and inference on commodity CPU hardware is of immense practical significance in democratizing deep learning (DL) capabilities.
Ranked #2 on Node Classification on Yelp-Fraud