no code implementations • 4 Mar 2024 • Cameron R. Wolfe, Anastasios Kyrillidis
From these experiments, we discover alternative CPT schedules that offer further improvements in training efficiency and model performance, and we derive a set of best practices for choosing CPT schedules.
no code implementations • 9 Nov 2022 • Cameron R. Wolfe, Anastasios Kyrillidis
To mitigate these shortcomings, we propose Cold Start Streaming Learning (CSSL), a simple, end-to-end approach for streaming learning with deep networks that uses a combination of replay and data augmentation to avoid catastrophic forgetting.
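As a rough illustration of the replay-plus-augmentation idea behind CSSL (not the paper's exact procedure), the sketch below maintains a reservoir-sampled replay buffer and trains a generic PyTorch classifier on a mix of incoming and replayed examples; the buffer size, augmentation, and sampling policy are illustrative assumptions.

```python
# Minimal sketch of a replay-plus-augmentation streaming update, assuming a
# generic PyTorch classifier; buffer size, augmentation, and sampling policy
# are illustrative choices, not the exact CSSL procedure.
import random
import torch
import torch.nn.functional as F

class ReplayBuffer:
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.data = []          # list of (x, y) tensors
        self.seen = 0           # number of stream examples observed so far

    def add(self, x, y):
        # Reservoir sampling keeps a uniform sample of the stream.
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append((x, y))
        else:
            idx = random.randint(0, self.seen - 1)
            if idx < self.capacity:
                self.data[idx] = (x, y)

    def sample(self, batch_size):
        batch = random.sample(self.data, min(batch_size, len(self.data)))
        xs, ys = zip(*batch)
        return torch.stack(xs), torch.stack(ys)

def augment(x):
    # Placeholder augmentation: random horizontal flip of the last dimension.
    if random.random() < 0.5:
        x = torch.flip(x, dims=[-1])
    return x

def streaming_step(model, optimizer, buffer, x_new, y_new):
    # Mix the incoming examples with replayed examples and train on both.
    for xi, yi in zip(x_new, y_new):
        buffer.add(xi, yi)
    x_replay, y_replay = buffer.sample(batch_size=32)
    x = augment(torch.cat([x_new, x_replay]))
    y = torch.cat([y_new, y_replay])
    loss = F.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```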
1 code implementation • ICLR 2022 • Cheng Wan, Youjie Li, Cameron R. Wolfe, Anastasios Kyrillidis, Nam Sung Kim, Yingyan Lin
Notably, little is known regarding the convergence rate of GCN training with both stale features and stale feature gradients.
1 code implementation • 7 Dec 2021 • Cameron R. Wolfe, Anastasios Kyrillidis
We propose a novel structured pruning algorithm for neural networks -- the iterative, Sparse Structured Pruning algorithm, dubbed i-SpaSP.
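The snippet below is a generic stand-in for iterative structured pruning: it repeatedly scores and removes output channels of a convolutional layer by filter L1 norm. The actual i-SpaSP algorithm selects structures via a sparse-recovery-style criterion, so treat this only as a simplified illustration.

```python
# Illustrative sketch of iterative, importance-based structured (channel)
# pruning; the real i-SpaSP selection rule differs, so this is a stand-in.
import torch
import torch.nn as nn

def channel_importance(conv: nn.Conv2d) -> torch.Tensor:
    # Score each output channel by the L1 norm of its filter weights.
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))

def prune_conv_channels(conv: nn.Conv2d, keep_ratio: float) -> nn.Conv2d:
    scores = channel_importance(conv)
    k = max(1, int(keep_ratio * conv.out_channels))
    keep = torch.topk(scores, k).indices.sort().values
    pruned = nn.Conv2d(conv.in_channels, k, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    return pruned

# Iteratively shrink a layer over several pruning rounds, e.g. 3 rounds
# that each keep 80% of the remaining channels.
layer = nn.Conv2d(64, 128, kernel_size=3, padding=1)
for _ in range(3):
    layer = prune_conv_channels(layer, keep_ratio=0.8)
print(layer.out_channels)  # 128 -> 102 -> 81 -> 64
```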
no code implementations • 31 Jul 2021 • Cameron R. Wolfe, Fangshuo Liao, Qihan Wang, Junhyung Lyle Kim, Anastasios Kyrillidis
Aiming to mathematically analyze the amount of dense network pre-training needed for a pruned network to perform well, we discover a simple theoretical bound on the number of gradient descent pre-training iterations on a two-layer, fully-connected network, beyond which pruning via greedy forward selection [61] yields a subnetwork that achieves good training error.
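For intuition, here is a simplified sketch of greedy forward selection pruning on a fixed, pre-trained two-layer network: starting from an empty subnetwork, it repeatedly adds the hidden neuron whose inclusion most reduces training loss. The network sizes, toy data, and plain (unweighted) selection rule are assumptions for illustration, not the exact procedure of [61].

```python
# Simplified sketch of greedy forward selection pruning on a two-layer net;
# sizes, data, and the unweighted selection rule are illustrative assumptions.
import torch

d, hidden, n, k = 8, 32, 128, 8           # input dim, neurons, samples, budget
W1 = torch.randn(hidden, d)               # pre-trained first layer (fixed)
W2 = torch.randn(1, hidden)               # pre-trained second layer (fixed)
x = torch.randn(n, d)
y = torch.randn(n, 1)

def loss_with(neurons):
    # Training loss of the subnetwork that keeps only the listed neurons.
    if not neurons:
        return ((0 - y) ** 2).mean()
    idx = torch.tensor(neurons)
    pred = torch.relu(x @ W1[idx].t()) @ W2[:, idx].t()
    return ((pred - y) ** 2).mean()

selected = []
for _ in range(k):
    # Greedily pick the neuron whose addition yields the lowest loss.
    candidates = [j for j in range(hidden) if j not in selected]
    best = min(candidates, key=lambda j: loss_with(selected + [j]).item())
    selected.append(best)
print(selected, loss_with(selected).item())
```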
no code implementations • 27 Jul 2021 • Cameron R. Wolfe, Keld T. Lundgaard
By leveraging large amounts of product data collected across hundreds of live e-commerce websites, we construct 1000 unique classification tasks that share similarly-structured input data, composed of both text and images.
no code implementations • 2 Jul 2021 • Chen Dun, Cameron R. Wolfe, Christopher M. Jermaine, Anastasios Kyrillidis
Thus, ResIST reduces the per-iteration communication, memory, and time requirements of ResNet training to only a fraction of those of full-model training.
1 code implementation • 20 Feb 2021 • Cameron R. Wolfe, Jingkang Yang, Arindam Chowdhury, Chen Dun, Artun Bayer, Santiago Segarra, Anastasios Kyrillidis
The graph convolutional network (GCN) is a go-to solution for machine learning on graphs, but its training is notoriously difficult to scale both in terms of graph size and the number of model parameters.
no code implementations • 28 Nov 2019 • Cameron R. Wolfe, Keld T. Lundgaard
In this work, we propose data augmentation methods for embeddings from pre-trained deep learning models: inspired by Mixup, we take a weighted combination of pairs of input embeddings and combine this augmentation with extra label softening.
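A minimal sketch of this style of embedding-level Mixup with label softening is shown below; the Beta mixing distribution and softening amount are illustrative hyperparameters, not the paper's exact settings.

```python
# Minimal sketch of Mixup-style augmentation on pre-computed embeddings with
# extra label softening; hyperparameters here are illustrative assumptions.
import torch

def mixup_embeddings(emb, labels, num_classes, alpha=0.2, softening=0.1):
    """emb: (batch, dim) embeddings, labels: (batch,) integer class ids."""
    # One-hot labels, then apply label softening (label smoothing).
    y = torch.nn.functional.one_hot(labels, num_classes).float()
    y = y * (1.0 - softening) + softening / num_classes

    # Sample mixing weights and a random pairing of examples.
    lam = torch.distributions.Beta(alpha, alpha).sample((emb.size(0), 1))
    perm = torch.randperm(emb.size(0))

    mixed_emb = lam * emb + (1.0 - lam) * emb[perm]
    mixed_y = lam * y + (1.0 - lam) * y[perm]
    return mixed_emb, mixed_y

# Example: augment a batch of 768-d embeddings for a 10-class task.
emb = torch.randn(16, 768)
labels = torch.randint(0, 10, (16,))
aug_emb, soft_targets = mixup_embeddings(emb, labels, num_classes=10)
```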
2 code implementations • 4 Oct 2019 • Binhang Yuan, Cameron R. Wolfe, Chen Dun, Yuxin Tang, Anastasios Kyrillidis, Christopher M. Jermaine
These properties allow IST to cope with issues arising from distributed data, slow interconnects, or limited device memory, making IST a suitable approach for cases of mandatory distribution.
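The following toy sketch simulates the core independent subnetwork training idea for one hidden layer on a single machine: hidden units are partitioned across "workers", each subnetwork is updated locally for several steps, and the updated slices are written back into the full model. Partition sizes, local step counts, and the toy regression task are assumptions for illustration, not the paper's exact setup.

```python
# Conceptual single-machine simulation of independent subnetwork training:
# partition hidden units, train each subnetwork locally, then reassemble.
import torch

hidden, num_workers, local_steps = 64, 4, 10
W1 = torch.randn(hidden, 8) * 0.1   # input -> hidden weights
W2 = torch.randn(1, hidden) * 0.1   # hidden -> output weights

# Toy regression data.
x = torch.randn(256, 8)
y = x.sum(dim=1, keepdim=True)

# Randomly partition hidden units across workers.
parts = torch.randperm(hidden).chunk(num_workers)

for idx in parts:
    # Each worker only holds (and communicates) its slice of the parameters.
    w1 = W1[idx].clone().requires_grad_(True)
    w2 = W2[:, idx].clone().requires_grad_(True)
    opt = torch.optim.SGD([w1, w2], lr=0.1)
    for _ in range(local_steps):
        pred = torch.relu(x @ w1.t()) @ w2.t()
        loss = ((pred - y) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Write the locally updated slice back into the full model.
    W1[idx] = w1.detach()
    W2[:, idx] = w2.detach()
```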
no code implementations • 25 Mar 2019 • Cameron R. Wolfe, Cem C. Tutum, Risto Miikkulainen
However, while static designs are easily produced with 3D printing, functional designs with moving parts are more difficult to generate: the search space is too high-dimensional, the resolution of the 3D-printed parts is not adequate, and it is difficult to predict the physical behavior of imperfect 3D-printed mechanisms.