1 code implementation • 16 Oct 2023 • Ziniu Li, Tian Xu, Yushun Zhang, Yang Yu, Ruoyu Sun, Zhi-Quan Luo
This is due to the computational overhead of the value model, which does not exist in ReMax.
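For context, a minimal sketch of a REINFORCE-style policy-gradient step that substitutes a simple reward baseline for a learned value model; the function name and baseline choice below are illustrative assumptions, not the paper's exact algorithm.

```python
import torch

def reinforce_step(logprobs, rewards, baseline_reward, optimizer):
    """One REINFORCE-style policy-gradient update that uses a scalar reward
    baseline in place of a learned value model (illustrative sketch only)."""
    advantages = rewards - baseline_reward            # no value network involved
    loss = -(advantages.detach() * logprobs).mean()   # standard policy-gradient surrogate
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```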
no code implementations • 12 Oct 2023 • Yite Wang, Jiahao Su, Hanlin Lu, Cong Xie, Tianyi Liu, Jianbo Yuan, Haibin Lin, Ruoyu Sun, Hongxia Yang
Our empirical results demonstrate that LEMON reduces computational costs by 56.7% for Vision Transformers and 33.2% for BERT when compared to training from scratch.
no code implementations • 8 Oct 2023 • Chenxiao Yang, Qitian Wu, David Wipf, Ruoyu Sun, Junchi Yan
A long-standing goal in deep learning has been to characterize the learning behavior of black-box models in a more interpretable manner.
1 code implementation • 21 Sep 2023 • Huang Huang, Fei Yu, Jianqing Zhu, Xuening Sun, Hao Cheng, Dingjie Song, Zhihong Chen, Abdulmohsen Alharthi, Bang An, Juncai He, Ziche Liu, Zhiyi Zhang, Junying Chen, Jianquan Li, Benyou Wang, Lian Zhang, Ruoyu Sun, Xiang Wan, Haizhou Li, Jinchao Xu
This paper is devoted to the development of a localized Large Language Model (LLM) specifically for Arabic, a language imbued with unique cultural characteristics inadequately addressed by current mainstream models.
no code implementations • 9 Jul 2023 • Feng Xiao, Ruoyu Sun, Jicong Fan
The core idea is to learn a mapping to transform the unknown distribution of training (normal) data to a known target distribution.
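As a hedged illustration of that idea (not the paper's specific model), one could train a mapping that pushes normal data toward a standard Gaussian target and score test points by their distance from that target; the layer sizes and the moment-matching surrogate loss below are assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical mapping network: transforms inputs toward a target N(0, I).
mapper = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))
opt = torch.optim.Adam(mapper.parameters(), lr=1e-3)
normal_data = torch.randn(512, 16) * 2.0 + 1.0  # stand-in for "normal" training samples

for _ in range(200):
    z = mapper(normal_data)
    # Crude surrogate for matching N(0, I): zero mean and unit variance per dimension.
    loss = z.mean(0).pow(2).sum() + (z.var(0) - 1.0).pow(2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

def anomaly_score(x):
    # Points mapped far from the target's mode are flagged as more anomalous.
    return mapper(x).pow(2).sum(dim=1)
```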
1 code implementation • 6 Apr 2023 • Yite Wang, Dawei Li, Ruoyu Sun
Recent advances in neural tangent kernel (NTK) theory suggest that the training dynamics of sufficiently large neural networks are closely related to the spectrum of the NTK.
1 code implementation • NeurIPS 2023 • Yite Wang, Jing Wu, Naira Hovakimyan, Ruoyu Sun
We also introduce a new method called balanced dynamic sparse training (ADAPT), which seeks to control the BR during GAN training to achieve a good trade-off between performance and computational cost.
no code implementations • 27 Feb 2023 • Dmitry Rybin, Ruoyu Sun, Zhi-Quan Luo
We further narrow the invariant network design space by addressing a question about the sizes of tensor layers necessary for function approximation on graph data.
1 code implementation • 27 Nov 2022 • Jiancong Xiao, Yanbo Fan, Ruoyu Sun, Zhi-Quan Luo
Specifically, we provide the first bound of adversarial Rademacher complexity of deep neural networks.
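For reference, one standard way to write adversarial Rademacher complexity replaces each sample by its worst-case perturbation inside the complexity term; the notation below is a generic textbook form, not necessarily the paper's exact definition.

```latex
\mathcal{R}_S^{\mathrm{adv}}(\mathcal{F})
  = \mathbb{E}_{\boldsymbol{\sigma}}\left[
      \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n}
      \sigma_i \max_{\|\delta_i\| \le \epsilon} \ell\big(f(x_i + \delta_i), y_i\big)
    \right],
\qquad \sigma_i \in \{\pm 1\} \text{ i.i.d.\ uniform.}
```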
1 code implementation • 27 Nov 2022 • Tiantian Fang, Ruoyu Sun, Alex Schwing
In contrast, we propose a Discriminator gradIent Gap regularized GAN (DigGAN) formulation which can be added to any existing GAN.
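Based on the name, such a regularizer plausibly penalizes the gap between the discriminator's gradient norms on real and generated data; the exact form below is an assumption written out for illustration, added on top of a generic GAN loss $L_{\mathrm{GAN}}$.

```latex
R_{\mathrm{gap}}(D) =
\Big(
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\big\|\nabla_x D(x)\big\|_2
  -
  \mathbb{E}_{\hat{x} \sim p_G}\big\|\nabla_{\hat{x}} D(\hat{x})\big\|_2
\Big)^2 ,
\qquad
\min_D \; L_{\mathrm{GAN}}(D) + \lambda\, R_{\mathrm{gap}}(D).
```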
no code implementations • NeurIPS 2021 • Jiawei Zhang, Yushun Zhang, Mingyi Hong, Ruoyu Sun, Zhi-Quan Luo
Third, we consider a constrained optimization formulation where the feasible region is the nice local region, and prove that every KKT point is a nearly global minimizer.
1 code implementation • 3 Oct 2022 • Jiancong Xiao, Yanbo Fan, Ruoyu Sun, Jue Wang, Zhi-Quan Luo
In adversarial machine learning, deep neural networks can fit the adversarial examples on the training dataset but have poor generalization ability on the test set.
no code implementations • 20 Aug 2022 • Yushun Zhang, Congliang Chen, Naichen Shi, Ruoyu Sun, Zhi-Quan Luo
We point out that there is a mismatch between the settings of theory and practice: Reddi et al. (2018) pick the problem after picking the hyperparameters of Adam, i.e., $(\beta_1, \beta_2)$, whereas practical applications often fix the problem first and then tune $(\beta_1, \beta_2)$.
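To make the practical protocol concrete, a minimal sketch of fixing the problem first and then sweeping $(\beta_1, \beta_2)$; the toy least-squares objective and the grid below are illustrative assumptions.

```python
import torch

# Fix the problem first: a simple least-squares objective.
A, b = torch.randn(50, 10), torch.randn(50)

def train(betas, steps=500):
    x = torch.zeros(10, requires_grad=True)
    opt = torch.optim.Adam([x], lr=1e-2, betas=betas)
    for _ in range(steps):
        loss = (A @ x - b).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()

# Then tune (beta1, beta2) for this fixed problem.
grid = [(0.9, 0.999), (0.9, 0.99), (0.5, 0.999), (0.0, 0.99)]
best = min(grid, key=train)
print("best (beta1, beta2):", best)
```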
1 code implementation • CVPR 2022 • Haoxiang Wang, Yite Wang, Ruoyu Sun, Bo Li
We show that the performance of MetaNTK-NAS is comparable to or better than that of the state-of-the-art NAS method designed for few-shot learning, while enjoying more than 100x speedup.
no code implementations • NeurIPS 2021 • Dachao Lin, Ruoyu Sun, Zhihua Zhang
In this paper, we study gradient methods for training deep linear neural networks with binary cross-entropy loss.
no code implementations • 27 Nov 2021 • Yinchen Shen, Zhiguo Wang, Ruoyu Sun, Xiaojing Shen
Then we propose a feature selection method to reduce the size of the model, based on a new metric that trades off classification accuracy and privacy preservation.
no code implementations • 8 Nov 2021 • Nuerxiati Abudurexiti, Kai He, Dongdong Hu, Svetlozar T. Rachev, Hasanjan Sayit, Ruoyu Sun
In this note, we give approximate closed-form expressions for the VaR and CVaR of portfolios of returns with NMVM distributions.
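For reference, the standard definitions of the two risk measures, stated for a loss variable $L$ at level $\alpha$; this is textbook notation, not the paper's NMVM-specific formulas.

```latex
\mathrm{VaR}_{\alpha}(L) = \inf\{\, \ell \in \mathbb{R} : \Pr(L \le \ell) \ge \alpha \,\},
\qquad
\mathrm{CVaR}_{\alpha}(L) = \frac{1}{1-\alpha} \int_{\alpha}^{1} \mathrm{VaR}_{u}(L)\, \mathrm{d}u .
```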
no code implementations • 29 Oct 2021 • Zhiguo Wang, Xintong Wang, Ruoyu Sun, Tsung-Hui Chang
As in federated supervised learning, the class distribution of the labeled/unlabeled data can be non-i.i.d.
no code implementations • 8 Oct 2021 • Bohan Wang, Qi Meng, Huishuai Zhang, Ruoyu Sun, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
The momentum acceleration technique is widely adopted in many optimization algorithms.
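As a reminder of the technique being analyzed, the classical heavy-ball (momentum) update with step size $\eta$ and momentum coefficient $\mu$:

```latex
v_{t+1} = \mu\, v_t + \nabla f(x_t),
\qquad
x_{t+1} = x_t - \eta\, v_{t+1}.
```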
no code implementations • 29 Sep 2021 • Yinchen Shen, Zhiguo Wang, Ruoyu Sun, Xiaojing Shen
Differential privacy (DP) is an essential technique for privacy preservation, which works by adding random noise to the data.
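A minimal sketch of the standard Gaussian mechanism, which realizes this noise-adding idea for a numeric query; the sensitivity and privacy parameters below are illustrative.

```python
import numpy as np

def gaussian_mechanism(true_value, sensitivity, epsilon, delta, rng=None):
    """Release a noisy answer satisfying (epsilon, delta)-DP for a query
    with the given L2 sensitivity (standard calibration, illustrative values)."""
    rng = rng or np.random.default_rng()
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return true_value + rng.normal(0.0, sigma)

# Example: privately release the mean of a dataset with entries in [0, 1].
data = np.clip(np.random.rand(1000), 0.0, 1.0)
noisy_mean = gaussian_mechanism(data.mean(), sensitivity=1.0 / len(data),
                                epsilon=1.0, delta=1e-5)
```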
no code implementations • 24 Apr 2021 • Shiyu Liang, Ruoyu Sun, R. Srikant
Recent theoretical works on over-parameterized neural nets have focused on two aspects: optimization and generalization.
no code implementations • 1 Jan 2021 • Dachao Lin, Ruoyu Sun, Zhihua Zhang
Network pruning, which yields sparse networks, has a long history and practical significance in modern applications.
no code implementations • 1 Jan 2021 • Tiantian Fang, Alex Schwing, Ruoyu Sun
We use this PC-layer in two ways: 1) fixed preconditioning (FPC) adds a fixed PC-layer to all layers, and 2) adaptive preconditioning (APC) adaptively controls the strength of preconditioning.
no code implementations • 1 Jan 2021 • Dawei Li, Ruoyu Sun
The Barzilai-Borwein (BB) method has demonstrated great empirical success in nonlinear optimization.
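For readers unfamiliar with the method, a minimal sketch of gradient descent with the (long) BB step size on a quadratic; the test problem is illustrative.

```python
import numpy as np

def bb_gradient_descent(grad, x0, iters=100, alpha0=1e-3):
    """Gradient descent with the Barzilai-Borwein (long) step size
    alpha_k = (s^T s) / (s^T y), where s = x_k - x_{k-1}, y = g_k - g_{k-1}."""
    x_prev, g_prev = x0, grad(x0)
    x = x_prev - alpha0 * g_prev          # one plain step to initialize the history
    for _ in range(iters):
        g = grad(x)
        s, y = x - x_prev, g - g_prev
        alpha = (s @ s) / (s @ y)         # BB1 step size
        x_prev, g_prev = x, g
        x = x - alpha * g
    return x

# Illustrative quadratic: f(x) = 0.5 x^T A x - b^T x.
A = np.diag(np.linspace(1.0, 100.0, 50))
b = np.ones(50)
x_star = bb_gradient_descent(lambda x: A @ x - b, np.zeros(50))
```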
no code implementations • ICLR 2021 • Naichen Shi, Dawei Li, Mingyi Hong, Ruoyu Sun
Removing this assumption allows us to establish a phase transition from divergence to non-divergence for RMSProp.
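For reference, the RMSProp update being analyzed, with decay rate $\beta$, step size $\eta$, gradient $g_t$, and small constant $\epsilon$:

```latex
v_t = \beta\, v_{t-1} + (1-\beta)\, g_t^2,
\qquad
x_{t+1} = x_t - \frac{\eta}{\sqrt{v_t} + \epsilon}\, g_t .
```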
1 code implementation • NeurIPS 2020 • Ruoyu Sun, Tiantian Fang, Alex Schwing
We also perform experiments to support our theory that RpGAN has a better landscape than separable-GAN.
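For context, the relativistic paired (RpGAN) discriminator loss couples a real and a fake sample inside one term, whereas the separable loss scores them independently; the formulation below is the commonly used one and is stated as background rather than quoted from the paper.

```latex
L_{\mathrm{sep}}(D) = \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[f(D(x))\big]
                    + \mathbb{E}_{\hat{x} \sim p_G}\big[f(-D(\hat{x}))\big],
\qquad
L_{\mathrm{Rp}}(D)  = \mathbb{E}_{x \sim p_{\mathrm{data}},\, \hat{x} \sim p_G}
                      \big[f\big(D(x) - D(\hat{x})\big)\big],
\qquad
f(t) = -\log \sigma(t).
```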
no code implementations • NeurIPS 2020 • Jiawei Zhang, Peijun Xiao, Ruoyu Sun, Zhi-Quan Luo
We prove that the stabilized GDA algorithm can achieve an $O(1/\epsilon^2)$ iteration complexity for minimizing the pointwise maximum of a finite collection of nonconvex functions.
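To make the objective concrete, minimizing the pointwise maximum of finitely many nonconvex functions can be written as a nonconvex-concave min-max problem over the probability simplex $\Delta_m$ (a standard reformulation):

```latex
\min_{x} \max_{1 \le i \le m} f_i(x)
\;=\;
\min_{x} \max_{y \in \Delta_m} \sum_{i=1}^{m} y_i f_i(x),
\qquad
\Delta_m = \Big\{ y \ge 0 : \textstyle\sum_{i=1}^{m} y_i = 1 \Big\}.
```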
no code implementations • 16 Sep 2020 • Dachao Lin, Ruoyu Sun, Zhihua Zhang
We show that linear networks can have no spurious valleys under special sparse structures, and that non-linear networks can also admit no spurious valleys when the final layer is wide.
no code implementations • 2 Jul 2020 • Ruoyu Sun, Dawei Li, Shiyu Liang, Tian Ding, R. Srikant
Second, we discuss a few rigorous results on the geometric properties of wide networks such as "no bad basin", and some modifications that eliminate sub-optimal local minima and/or decreasing paths to infinity.
2 code implementations • 25 Jun 2020 • Haoxiang Wang, Ruoyu Sun, Bo Li
Gradient-based meta-learning (GBML) with deep neural nets (DNNs) has become a popular approach for few-shot learning.
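As background, a minimal sketch of the inner/outer loop that characterizes GBML methods such as MAML; the model, losses, and step sizes are illustrative placeholders.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
meta_opt = torch.optim.SGD(model.parameters(), lr=1e-2)
inner_lr = 0.1

def task_loss(params, x, y):
    return ((x @ params["weight"].t() + params["bias"] - y) ** 2).mean()

for _ in range(100):                                    # outer (meta) loop over tasks
    x_s, y_s = torch.randn(5, 10), torch.randn(5, 1)    # support set
    x_q, y_q = torch.randn(5, 10), torch.randn(5, 1)    # query set
    params = dict(model.named_parameters())
    # Inner loop: one gradient step adapted to the current task.
    grads = torch.autograd.grad(task_loss(params, x_s, y_s),
                                tuple(params.values()), create_graph=True)
    adapted = {k: p - inner_lr * g for (k, p), g in zip(params.items(), grads)}
    # Outer loop: meta-update on the query loss of the adapted parameters.
    meta_opt.zero_grad()
    task_loss(adapted, x_q, y_q).backward()
    meta_opt.step()
```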
no code implementations • 23 Jun 2020 • Ruoyu Sun, Fuhui Tang, Xiaopeng Zhang, Hongkai Xiong, Qi Tian
Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization.
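For reference, the classic distillation objective combines the hard-label cross-entropy with a temperature-softened KL term against the teacher's logits; the temperature and mixing weight below are illustrative defaults.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Standard knowledge-distillation loss (illustrative hyperparameters)."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    return alpha * hard + (1.0 - alpha) * soft
```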
no code implementations • 19 Jun 2020 • Tian Ye, Peijun Xiao, Ruoyu Sun
In the infrequent communication setting, DEED combined with Federated Averaging requires a smaller total number of bits than Federated Averaging alone.
no code implementations • 31 Dec 2019 • Shiyu Liang, Ruoyu Sun, R. Srikant
More specifically, for a large class of over-parameterized deep neural networks with appropriate regularizers, the loss function has no bad local minima and no decreasing paths to infinity.
no code implementations • 19 Dec 2019 • Ruoyu Sun
When and why can a neural network be successfully trained?
no code implementations • 4 Nov 2019 • Tian Ding, Dawei Li, Ruoyu Sun
More specifically, we prove that for any multi-layer network with generic input data and non-linear activation functions, sub-optimal local minima can exist, no matter how wide the network is (as long as the last hidden layer has at least two neurons).
no code implementations • 10 Oct 2019 • Peijun Xiao, Zhisheng Xiao, Ruoyu Sun
Recently, Coordinate Descent (CD) with cyclic order was shown to be $O(n^2)$ times slower than randomized versions in the worst case.
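To fix ideas, a minimal sketch of cyclic versus randomized coordinate descent with exact coordinate minimization on a quadratic; the problem instance is illustrative.

```python
import numpy as np

def coordinate_descent(A, b, iters=1000, order="cyclic", seed=0):
    """Minimize 0.5 x^T A x - b^T x by exact coordinate minimization,
    visiting coordinates either cyclically or uniformly at random."""
    n = len(b)
    x = np.zeros(n)
    rng = np.random.default_rng(seed)
    for t in range(iters):
        i = t % n if order == "cyclic" else rng.integers(n)
        # Exact minimization over coordinate i: set the partial derivative to zero.
        x[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]
    return x

A = np.array([[2.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
print(coordinate_descent(A, b, order="cyclic"), coordinate_descent(A, b, order="random"))
```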
no code implementations • 25 Sep 2019 • Ruoyu Sun, Tiantian Fang, Alex Schwing
In this work, we perform a global analysis of GANs from two perspectives: the global landscape of the outer-optimization problem and the global behavior of the gradient descent dynamics.
no code implementations • 16 Sep 2019 • Zeyu Zhu, Nan Li, Ruoyu Sun, Huijing Zhao, Donghao Xu
Different cost functions for traversability analysis are learned and tested in various scenes for their capability to guide the trajectory planning of different behaviors.
no code implementations • CVPR 2019 • Ishan Deshpande, Yuan-Ting Hu, Ruoyu Sun, Ayis Pyrros, Nasir Siddiqui, Sanmi Koyejo, Zhizhen Zhao, David Forsyth, Alexander Schwing
Generative adversarial nets (GANs) and variational auto-encoders have significantly improved our distribution modeling capabilities, showing promise for dataset augmentation, image-to-image translation and feature learning.
no code implementations • 28 Dec 2018 • Dawei Li, Tian Ding, Ruoyu Sun
Wide networks are often believed to have a nice optimization landscape, but what rigorous results can we prove?
no code implementations • ICLR 2019 • Xiangyi Chen, Sijia Liu, Ruoyu Sun, Mingyi Hong
We prove that under our derived conditions, these methods can achieve the convergence rate of order $O(\log{T}/\sqrt{T})$ for nonconvex stochastic optimization.
no code implementations • NeurIPS 2018 • Shiyu Liang, Ruoyu Sun, Jason D. Lee, R. Srikant
One of the main difficulties in analyzing neural networks is the non-convexity of the loss function which may have many bad local minima.
no code implementations • ICML 2018 • Shiyu Liang, Ruoyu Sun, Yixuan Li, R. Srikant
Here we focus on the training performance of single-layered neural networks for binary classification, and provide conditions under which the training error is zero at all local minima of a smooth hinge loss function.
1 code implementation • 15 Feb 2017 • Sam Wiseman, Sumit Chopra, Marc'Aurelio Ranzato, Arthur Szlam, Ruoyu Sun, Soumith Chintala, Nicolas Vasilache
While Truncated Back-Propagation through Time (BPTT) is the most popular approach to training Recurrent Neural Networks (RNNs), it suffers from being inherently sequential (making parallelization difficult) and from truncating gradient flow between distant time-steps.
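For readers unfamiliar with the baseline, a minimal sketch of truncated BPTT: the sequence is processed in chunks and the hidden state is detached between chunks, so gradients do not flow past the truncation boundary; the model, data, and chunk length are illustrative.

```python
import torch
import torch.nn as nn

rnn, head = nn.LSTM(8, 32, batch_first=True), nn.Linear(32, 8)
opt = torch.optim.Adam(list(rnn.parameters()) + list(head.parameters()), lr=1e-3)

seq = torch.randn(4, 200, 8)              # (batch, time, features), illustrative data
chunk, hidden = 20, None
for t in range(0, seq.size(1) - chunk, chunk):
    x = seq[:, t:t + chunk]
    y = seq[:, t + 1:t + chunk + 1]       # next-step prediction target
    out, hidden = rnn(x, hidden)
    loss = ((head(out) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Truncation: keep the hidden state's value but cut the gradient graph.
    hidden = tuple(h.detach() for h in hidden)
```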
no code implementations • 28 Nov 2014 • Ruoyu Sun, Zhi-Quan Luo
In this paper, we establish a theoretical guarantee for the factorization formulation to correctly recover the underlying low-rank matrix.
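For concreteness, one standard way to write the factorization formulation of low-rank matrix completion is given below, where $P_{\Omega}$ keeps only the observed entries; the exact variant studied in the paper may include additional regularization.

```latex
\min_{U \in \mathbb{R}^{m \times r},\; V \in \mathbb{R}^{n \times r}}
\;\; \big\| P_{\Omega}\big( U V^{\top} - M \big) \big\|_F^2 ,
\qquad
\big[P_{\Omega}(X)\big]_{ij} =
\begin{cases} X_{ij}, & (i,j) \in \Omega,\\ 0, & \text{otherwise.} \end{cases}
```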