no code implementations • 4 Oct 2022 • Yijun Dong, Yuege Xie, Rachel Ward
At the saddle point of the underlying objective, the weights assign label-dense samples to the supervised loss and label-sparse samples to the unsupervised consistency regularization.
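A minimal sketch of the kind of weighted objective this describes, on toy per-sample losses; the weighting rule, loss values, and the `combined_objective` helper below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def combined_objective(w, sup_losses, cons_losses):
    """Weighted semi-supervised objective: each sample's weight w_i in [0, 1]
    splits its contribution between the supervised loss and the unsupervised
    consistency regularization (illustrative form only)."""
    return np.sum(w * sup_losses + (1.0 - w) * cons_losses)

# Toy per-sample losses: the first two samples are "label-dense" (reliable
# labels), the last two "label-sparse" (labels scarce or unreliable).
sup_losses = np.array([0.2, 0.3, 1.5, 1.8])    # supervised loss per sample
cons_losses = np.array([0.9, 1.0, 0.4, 0.3])   # consistency loss per sample

# A hypothetical equilibrium weighting: label-dense samples are routed to the
# supervised term, label-sparse samples to the consistency term.
w = (sup_losses < cons_losses).astype(float)
print(w)                                        # [1. 1. 0. 0.]
print(combined_objective(w, sup_losses, cons_losses))
```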
1 code implementation • 7 Dec 2021 • Yuege Xie, Bobby Shi, Hayden Schaeffer, Rachel Ward
Inspired by the success of the iterative magnitude pruning technique in finding lottery tickets of neural networks, we propose a new method -- Sparser Random Feature Models via IMP (ShRIMP) -- to efficiently fit high-dimensional data with inherent low-dimensional structure in the form of sparse variable dependencies.
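A rough sketch of iterative magnitude pruning applied to a random feature regression, in the spirit of the method described; the cosine feature map, pruning schedule, and the `shrimp_sketch` helper are illustrative choices, not the authors' exact algorithm:

```python
import numpy as np

def shrimp_sketch(X, y, n_features=200, n_rounds=5, keep_frac=0.5, reg=1e-6):
    """Illustrative iterative-magnitude-pruning loop for a random feature model:
    fit, prune the smallest-magnitude coefficients, refit on the survivors."""
    rng = np.random.default_rng(0)
    W = rng.standard_normal((X.shape[1], n_features))  # random feature weights
    active = np.arange(n_features)                      # surviving feature indices
    for _ in range(n_rounds):
        Phi = np.cos(X @ W[:, active])                  # random (cosine) features
        # Ridge-regularized least squares fit on the currently active features.
        c = np.linalg.solve(Phi.T @ Phi + reg * np.eye(active.size), Phi.T @ y)
        # Magnitude pruning step: keep only the largest coefficients.
        n_keep = max(1, int(keep_frac * active.size))
        keep = np.argsort(np.abs(c))[-n_keep:]
        active = active[keep]
    return W, active, c[keep]

# Toy data with sparse variable dependence: y depends only on the first coordinate.
X = np.random.default_rng(1).standard_normal((100, 10))
y = np.sin(X[:, 0])
W, active, coef = shrimp_sketch(X, y)
print(active.size, "features survive pruning")
```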
no code implementations • 17 Sep 2021 • Xiaoxia Wu, Yuege Xie, Simon Du, Rachel Ward
We propose a computationally friendly adaptive learning rate schedule, "AdaLoss", which directly uses the information of the loss function to adjust the stepsize in gradient descent methods.
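One plausible reading of such a loss-driven schedule, sketched below on a least-squares toy problem: the step-size denominator accumulates loss values rather than gradient norms, so the stepsize shrinks while the loss stays large. The constants and the exact update rule here are assumptions for illustration, not the paper's precise scheme:

```python
import numpy as np

def gd_adaloss(grad, loss, x0, alpha=1.0, b0=0.1, n_iters=500):
    """Gradient descent with a loss-adaptive step size: the denominator
    accumulates loss values (illustrative reading of the AdaLoss idea)."""
    x, b2 = np.array(x0, dtype=float), b0 ** 2
    for _ in range(n_iters):
        b2 += loss(x)                     # accumulate the loss, not gradient norms
        x -= (alpha / np.sqrt(b2)) * grad(x)
    return x

# Toy least-squares problem: minimize 0.5 * ||A x - y||^2.
A = np.random.default_rng(0).standard_normal((20, 5))
y = A @ np.ones(5)
loss = lambda x: 0.5 * np.sum((A @ x - y) ** 2)
grad = lambda x: A.T @ (A @ x - y)
print(loss(gd_adaloss(grad, loss, np.zeros(5))))   # final loss should be small
```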
no code implementations • 15 Jun 2020 • Yuege Xie, Hung-Hsu Chou, Holger Rauhut, Rachel Ward
Motivated by surprisingly good generalization properties of learned deep neural networks in overparameterized scenarios and by the related double descent phenomenon, this paper analyzes the relation between smoothness and low generalization error in an overparameterized linear learning problem.
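A small illustration of this theme: in an overparameterized linear problem, the minimum-norm (smoothest) interpolant tends to generalize better than another interpolant with the same zero training error but larger norm. The data model and constants below are toy assumptions, not the paper's setting:

```python
import numpy as np

rng = np.random.default_rng(0)

# Overparameterized linear regression: more features (d) than samples (n),
# data generated from a low-norm ground-truth weight vector.
n, d = 30, 200
w_true = rng.standard_normal(d) / np.sqrt(d)
X_train, X_test = rng.standard_normal((n, d)), rng.standard_normal((1000, d))
y_train, y_test = X_train @ w_true, X_test @ w_true

# Minimum-norm interpolating solution (what gradient descent from zero reaches).
w_min = np.linalg.pinv(X_train) @ y_train

# Another interpolant: add a null-space component of X_train, which fits the
# training data equally well but has larger norm.
null_dir = np.linalg.svd(X_train)[2][-1]
w_rough = w_min + 3.0 * null_dir

for name, w in [("min-norm", w_min), ("rough", w_rough)]:
    tr = np.mean((X_train @ w - y_train) ** 2)
    te = np.mean((X_test @ w - y_test) ** 2)
    print(f"{name:8s} train MSE {tr:.2e}  test MSE {te:.3f}  norm {np.linalg.norm(w):.2f}")
```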
no code implementations • 28 Aug 2019 • Yuege Xie, Xiaoxia Wu, Rachel Ward
We prove that the norm version of the adaptive stochastic gradient method (AdaGrad-Norm) achieves a linear convergence rate for a subset of either strongly convex functions or non-convex functions that satisfy the Polyak-Łojasiewicz (PL) inequality.
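For reference, AdaGrad-Norm uses a single step size driven by the accumulated squared gradient norms; a short sketch on a strongly convex (hence PL) toy problem follows, with illustrative default constants:

```python
import numpy as np

def adagrad_norm(grad, x0, eta=1.0, b0=0.1, n_iters=1000):
    """AdaGrad-Norm: one adaptive step size, shrinking as squared gradient
    norms accumulate (constants here are illustrative defaults)."""
    x, b2 = np.array(x0, dtype=float), b0 ** 2
    for _ in range(n_iters):
        g = grad(x)
        b2 += np.dot(g, g)              # accumulate squared gradient norm
        x -= (eta / np.sqrt(b2)) * g    # single step size for all coordinates
    return x

# Strongly convex toy problem: f(x) = 0.5 * x^T A x - b^T x.
A = np.diag([1.0, 5.0, 10.0])
b = np.array([1.0, 1.0, 1.0])
x_star = np.linalg.solve(A, b)
x_hat = adagrad_norm(lambda x: A @ x - b, np.zeros(3))
print(np.linalg.norm(x_hat - x_star))   # close to zero
```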