Search Results for author: Ruoyu Sun

Found 48 papers, 12 papers with code

Global Convergence and Generalization Bound of Gradient-Based Meta-Learning with Deep Neural Nets

2 code implementations25 Jun 2020 Haoxiang Wang, Ruoyu Sun, Bo Li

Gradient-based meta-learning (GBML) with deep neural nets (DNNs) has become a popular approach for few-shot learning.

Few-Shot Learning
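
For readers new to GBML, here is a minimal first-order MAML-style loop on toy 1-D regression tasks; the task family, loss, and step sizes are illustrative assumptions, not the setup analyzed in the paper.

```python
import numpy as np

# First-order MAML-style meta-training on toy 1-D linear regression tasks.
# Each task t: y = a_t * x; the meta-learner seeks an initialization w
# that adapts well to any task after one inner gradient step.

rng = np.random.default_rng(0)
w = 0.0                      # meta-initialization (a single scalar weight)
alpha, beta = 0.1, 0.01      # inner / outer step sizes (illustrative)

def grad(w, a, x):
    """Gradient of the squared loss 0.5*(w*x - a*x)^2 w.r.t. w."""
    return (w - a) * x * x

for step in range(1000):
    a = rng.uniform(-2, 2)           # sample a task
    x_s, x_q = rng.normal(size=2)    # support / query points
    w_adapt = w - alpha * grad(w, a, x_s)   # inner adaptation step
    w -= beta * grad(w_adapt, a, x_q)       # first-order outer update

print("meta-learned init:", w)
```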

AceGPT, Localizing Large Language Models in Arabic

1 code implementation21 Sep 2023 Huang Huang, Fei Yu, Jianqing Zhu, Xuening Sun, Hao Cheng, Dingjie Song, Zhihong Chen, Abdulmohsen Alharthi, Bang An, Juncai He, Ziche Liu, Zhiyi Zhang, Junying Chen, Jianquan Li, Benyou Wang, Lian Zhang, Ruoyu Sun, Xiang Wan, Haizhou Li, Jinchao Xu

This paper is devoted to the development of a localized Large Language Model (LLM) specifically for Arabic, a language imbued with unique cultural characteristics inadequately addressed by current mainstream models.

Instruction Following Language Modelling +2

Training Language Models Using Target-Propagation

1 code implementation15 Feb 2017 Sam Wiseman, Sumit Chopra, Marc'Aurelio Ranzato, Arthur Szlam, Ruoyu Sun, Soumith Chintala, Nicolas Vasilache

While Truncated Back-Propagation through Time (BPTT) is the most popular approach to training Recurrent Neural Networks (RNNs), it suffers from being inherently sequential (making parallelization difficult) and from truncating gradient flow between distant time-steps.

Towards a Better Global Loss Landscape of GANs

1 code implementation NeurIPS 2020 Ruoyu Sun, Tiantian Fang, Alex Schwing

We also perform experiments to support our theory that RpGAN has a better landscape than separable-GAN.

NTK-SAP: Improving neural network pruning by aligning training dynamics

1 code implementation6 Apr 2023 Yite Wang, Dawei Li, Ruoyu Sun

Recent advances in neural tangent kernel (NTK) theory suggest that the training dynamics of sufficiently large neural networks are closely related to the spectrum of the NTK.

Network Pruning
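
To make the NTK connection concrete, here is a small sketch that forms the empirical NTK Gram matrix of a one-hidden-layer ReLU network at initialization and reads off its spectrum; the architecture and data are placeholders, and this is not NTK-SAP's pruning criterion itself.

```python
import numpy as np

# Empirical NTK of a one-hidden-layer ReLU net f(x) = w2 @ relu(W1 @ x)
# at initialization: NTK[i, j] = <df(x_i)/dtheta, df(x_j)/dtheta>.

rng = np.random.default_rng(0)
d, h, n = 5, 64, 20                       # input dim, width, sample count
W1 = rng.normal(size=(h, d)) / np.sqrt(d)
w2 = rng.normal(size=h) / np.sqrt(h)
X = rng.normal(size=(n, d))

def param_jacobian(x):
    """Flattened gradient of the scalar output w.r.t. (W1, w2)."""
    pre = W1 @ x                          # pre-activations
    act = np.maximum(pre, 0.0)            # ReLU activations
    dW1 = np.outer(w2 * (pre > 0), x)     # d f / d W1
    return np.concatenate([dW1.ravel(), act])  # d f / d w2 = act

J = np.stack([param_jacobian(x) for x in X])   # (n, num_params)
ntk = J @ J.T                                  # empirical NTK Gram matrix
print("top NTK eigenvalues:", np.linalg.eigvalsh(ntk)[::-1][:5])
```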

Global Convergence of MAML and Theory-Inspired Neural Architecture Search for Few-Shot Learning

1 code implementation CVPR 2022 Haoxiang Wang, Yite Wang, Ruoyu Sun, Bo Li

We show that the performance of MetaNTK-NAS is comparable to or better than the state-of-the-art NAS method designed for few-shot learning, while enjoying more than 100x speedup.

Few-Shot Learning Neural Architecture Search

DigGAN: Discriminator gradIent Gap Regularization for GAN Training with Limited Data

1 code implementation27 Nov 2022 Tiantian Fang, Ruoyu Sun, Alex Schwing

In contrast, we propose a Discriminator gradIent Gap regularized GAN (DigGAN) formulation which can be added to any existing GAN.

Data Augmentation
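
A speculative sketch of a discriminator gradient-gap penalty of the kind the title suggests: penalize the squared difference between the discriminator's average input-gradient norms on real and fake batches. The exact DigGAN regularizer may differ; `disc`, the batching convention, and the weighting `lam` are assumptions for illustration only.

```python
import torch

def gradient_gap_penalty(disc, real, fake):
    """Hypothetical gradient-gap regularizer: squared difference between
    the discriminator's average input-gradient norms on real vs. fake data."""
    def avg_grad_norm(x):
        x = x.clone().requires_grad_(True)
        out = disc(x).sum()
        (g,) = torch.autograd.grad(out, x, create_graph=True)
        return g.flatten(1).norm(dim=1).mean()
    return (avg_grad_norm(real) - avg_grad_norm(fake)) ** 2

# Illustrative usage inside a discriminator update:
#   d_loss = gan_loss + lam * gradient_gap_penalty(D, x_real, x_fake)
```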

Why Transformers Need Adam: A Hessian Perspective

1 code implementation26 Feb 2024 Yushun Zhang, Congliang Chen, Tian Ding, Ziniu Li, Ruoyu Sun, Zhi-Quan Luo

SGD performs worse than Adam by a significant margin on Transformers, but the reason remains unclear.

Balanced Training for Sparse GANs

1 code implementation NeurIPS 2023 Yite Wang, Jing Wu, Naira Hovakimyan, Ruoyu Sun

We also introduce a new method called balanced dynamic sparse training (ADAPT), which seeks to control the balance ratio (BR) during GAN training to achieve a good trade-off between performance and computational cost.

Stability Analysis and Generalization Bounds of Adversarial Training

1 code implementation3 Oct 2022 Jiancong Xiao, Yanbo Fan, Ruoyu Sun, Jue Wang, Zhi-Quan Luo

In adversarial machine learning, deep neural networks can fit the adversarial examples in the training set but generalize poorly on the test set.

Generalization Bounds

Adversarial Rademacher Complexity of Deep Neural Networks

1 code implementation27 Nov 2022 Jiancong Xiao, Yanbo Fan, Ruoyu Sun, Zhi-Quan Luo

Specifically, we provide the first bound of adversarial Rademacher complexity of deep neural networks.

Adding One Neuron Can Eliminate All Bad Local Minima

no code implementations NeurIPS 2018 Shiyu Liang, Ruoyu Sun, Jason D. Lee, R. Srikant

One of the main difficulties in analyzing neural networks is the non-convexity of the loss function which may have many bad local minima.

Binary Classification General Classification

Understanding the Loss Surface of Neural Networks for Binary Classification

no code implementations ICML 2018 Shiyu Liang, Ruoyu Sun, Yixuan Li, R. Srikant

Here we focus on the training performance of single-layered neural networks for binary classification, and provide conditions under which the training error is zero at all local minima of a smooth hinge loss function.

Binary Classification Classification +1

Guaranteed Matrix Completion via Non-convex Factorization

no code implementations28 Nov 2014 Ruoyu Sun, Zhi-Quan Luo

In this paper, we establish a theoretical guarantee for the factorization formulation to correctly recover the underlying low-rank matrix.

Matrix Completion

On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization

no code implementations ICLR 2019 Xiangyi Chen, Sijia Liu, Ruoyu Sun, Mingyi Hong

We prove that under our derived conditions, these methods can achieve the convergence rate of order $O(\log{T}/\sqrt{T})$ for nonconvex stochastic optimization.

Stochastic Optimization
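
For orientation, here is a generic Adam-type update of the kind covered by such analyses, run on a toy nonconvex problem; the $1/\sqrt{t}$ step-size schedule and all constants are illustrative assumptions.

```python
import numpy as np

def adam_type_step(theta, g, m, v, t, beta1=0.9, beta2=0.999, eps=1e-8):
    """One generic Adam-type update: momentum m, second moment v,
    step size decaying like 1/sqrt(t) as in the analyzed family."""
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    lr = 0.1 / np.sqrt(t)                      # illustrative schedule
    theta = theta - lr * m / (np.sqrt(v) + eps)
    return theta, m, v

# Toy usage: noisy gradients of the nonconvex f(theta) = theta^4/4 - theta^2/2.
rng = np.random.default_rng(0)
theta, m, v = 2.0, 0.0, 0.0
for t in range(1, 2001):
    g = theta**3 - theta + 0.1 * rng.normal()  # stochastic gradient
    theta, m, v = adam_type_step(theta, g, m, v, t)
print("iterate near a stationary point:", theta)
```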

On the Benefit of Width for Neural Networks: Disappearance of Bad Basins

no code implementations28 Dec 2018 Dawei Li, Tian Ding, Ruoyu Sun

Wide networks are often believed to have a nice optimization landscape, but what rigorous results can we prove?

Max-Sliced Wasserstein Distance and its use for GANs

no code implementations CVPR 2019 Ishan Deshpande, Yuan-Ting Hu, Ruoyu Sun, Ayis Pyrros, Nasir Siddiqui, Sanmi Koyejo, Zhizhen Zhao, David Forsyth, Alexander Schwing

Generative adversarial nets (GANs) and variational auto-encoders have significantly improved our distribution modeling capabilities, showing promise for dataset augmentation, image-to-image translation and feature learning.

Image-to-Image Translation Translation
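
A minimal sketch of the max-sliced Wasserstein-1 distance between two empirical samples, with the maximizing direction approximated by random search; the paper optimizes over the direction rather than sampling it.

```python
import numpy as np

def w1_1d(u, v):
    """1-D Wasserstein-1 distance between equal-size empirical samples."""
    return np.mean(np.abs(np.sort(u) - np.sort(v)))

def max_sliced_w1(X, Y, n_dirs=500, rng=None):
    """Approximate max-sliced W1: the worst 1-D projection distance,
    with a random search standing in for optimizing the direction."""
    rng = rng or np.random.default_rng(0)
    best = 0.0
    for _ in range(n_dirs):
        theta = rng.normal(size=X.shape[1])
        theta /= np.linalg.norm(theta)         # unit projection direction
        best = max(best, w1_1d(X @ theta, Y @ theta))
    return best

rng = np.random.default_rng(1)
X = rng.normal(size=(256, 8))
Y = rng.normal(size=(256, 8)) + np.array([2] + [0] * 7)  # shifted in one axis
print(max_sliced_w1(X, Y))
```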

Off-road Autonomous Vehicles Traversability Analysis and Trajectory Planning Based on Deep Inverse Reinforcement Learning

no code implementations16 Sep 2019 Zeyu Zhu, Nan Li, Ruoyu Sun, Huijing Zhao, Donghao Xu

Different cost functions for traversability analysis are learned and tested across various scenes for their capability to guide the trajectory planning of different behaviors.

Autonomous Vehicles reinforcement-learning +2

Understanding Limitation of Two Symmetrized Orders by Worst-case Complexity

no code implementations10 Oct 2019 Peijun Xiao, Zhisheng Xiao, Ruoyu Sun

Recently, Coordinate Descent (CD) with cyclic order was shown to be $O(n^2)$ times slower than randomized versions in the worst case.

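A small experiment contrasting cyclic and randomized coordinate descent on a random quadratic; note that the worst-case $O(n^2)$ gap the paper studies appears on adversarially chosen problems, not typically on random instances like this one.

```python
import numpy as np

# Cyclic vs. randomized coordinate descent on the quadratic f(x) = 0.5 x^T A x.
rng = np.random.default_rng(0)
n = 50
M = rng.normal(size=(n, n))
A = M @ M.T + np.eye(n)               # random positive-definite matrix

def cd(order_fn, iters=5000):
    x = np.ones(n)
    for t in range(iters):
        i = order_fn(t)
        x[i] -= (A[i] @ x) / A[i, i]  # exact minimization along coordinate i
    return 0.5 * x @ A @ x

print("cyclic:    ", cd(lambda t: t % n))
print("randomized:", cd(lambda t: rng.integers(n)))
```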

Sub-Optimal Local Minima Exist for Neural Networks with Almost All Non-Linear Activations

no code implementations4 Nov 2019 Tian Ding, Dawei Li, Ruoyu Sun

More specifically, we prove that for any multi-layer network with generic input data and non-linear activation functions, sub-optimal local minima can exist, no matter how wide the network is (as long as the last hidden layer has at least two neurons).

Optimization for deep learning: theory and algorithms

no code implementations19 Dec 2019 Ruoyu Sun

When and why can a neural network be successfully trained?

Learning Theory

Revisiting Landscape Analysis in Deep Neural Networks: Eliminating Decreasing Paths to Infinity

no code implementations31 Dec 2019 Shiyu Liang, Ruoyu Sun, R. Srikant

More specifically, for a large class of over-parameterized deep neural networks with appropriate regularizers, the loss function has no bad local minima and no decreasing paths to infinity.

DEED: A General Quantization Scheme for Communication Efficiency in Bits

no code implementations19 Jun 2020 Tian Ye, Peijun Xiao, Ruoyu Sun

In the infrequent-communication setting, DEED combined with Federated Averaging requires a smaller total number of bits than Federated Averaging alone.

Distributed Optimization Federated Learning +1

Distilling Object Detectors with Task Adaptive Regularization

no code implementations23 Jun 2020 Ruoyu Sun, Fuhui Tang, Xiaopeng Zhang, Hongkai Xiong, Qi Tian

Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization.

Knowledge Distillation Object +1

The Global Landscape of Neural Networks: An Overview

no code implementations2 Jul 2020 Ruoyu Sun, Dawei Li, Shiyu Liang, Tian Ding, R. Srikant

Second, we discuss a few rigorous results on the geometric properties of wide networks such as "no bad basin", and some modifications that eliminate sub-optimal local minima and/or decreasing paths to infinity.

On the Landscape of One-hidden-layer Sparse Networks and Beyond

no code implementations16 Sep 2020 Dachao Lin, Ruoyu Sun, Zhihua Zhang

We show that linear networks can have no spurious valleys under special sparse structures, and non-linear networks could also admit no spurious valleys under a wide final layer.

Network Pruning

RMSprop can converge with proper hyper-parameter

no code implementations ICLR 2021 Naichen Shi, Dawei Li, Mingyi Hong, Ruoyu Sun

Removing this assumption allows us to establish a phase transition from divergence to non-divergence for RMSProp.
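
For reference, here is plain RMSProp on a toy stochastic quadratic, swept over the second-moment parameter ($\beta_2$ below); this benign problem converges for all settings, whereas the paper's phase transition is established on carefully constructed examples.

```python
import numpy as np

def rmsprop(grad_fn, theta, lr=0.01, beta2=0.99, eps=1e-8, steps=2000, seed=0):
    """Plain RMSProp; the paper's result concerns how the second-moment
    parameter (beta2 here) separates divergence from non-divergence."""
    rng = np.random.default_rng(seed)
    v = 0.0
    for _ in range(steps):
        g = grad_fn(theta, rng)
        v = beta2 * v + (1 - beta2) * g * g
        theta -= lr * g / (np.sqrt(v) + eps)
    return theta

# Toy stochastic problem: f(theta) = 0.5 * theta^2 with noisy gradients.
g = lambda th, rng: th + rng.normal()
for b2 in (0.5, 0.9, 0.999):
    print("beta2 =", b2, "final iterate:", rmsprop(g, theta=5.0, beta2=b2))
```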

Precondition Layer and Its Use for GANs

no code implementations1 Jan 2021 Tiantian Fang, Alex Schwing, Ruoyu Sun

We use this PC-layer in two ways: 1) fixed preconditioning (FPC) adds a fixed PC-layer to all layers, and 2) adaptive preconditioning (APC) adaptively controls the strength of preconditioning.

On the Landscape of Sparse Linear Networks

no code implementations1 Jan 2021 Dachao Lin, Ruoyu Sun, Zhihua Zhang

Network pruning, or sparse networks, has a long history and practical significance in modern applications.

Network Pruning

A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems

no code implementations NeurIPS 2020 Jiawei Zhang, Peijun Xiao, Ruoyu Sun, Zhi-Quan Luo

We prove that the stabilized GDA algorithm can achieve an $O(1/\epsilon^2)$ iteration complexity for minimizing the pointwise maximum of a finite collection of nonconvex functions.
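
As a baseline for intuition, here is vanilla single-loop gradient descent-ascent on a toy problem that is concave in $y$; the paper's smoothing/stabilization step is not reproduced here, and the step sizes are illustrative.

```python
import numpy as np

# Baseline single-loop gradient descent-ascent (GDA) for min_x max_y f(x, y).
def gda(grad_x, grad_y, x, y, lr_x=0.05, lr_y=0.05, steps=3000):
    for _ in range(steps):
        x = x - lr_x * grad_x(x, y)   # descent step on x
        y = y + lr_y * grad_y(x, y)   # ascent step on y
    return x, y

# Toy example: f(x, y) = y * x^2 - 0.5 * y^2 (concave in y).
gx = lambda x, y: 2 * y * x
gy = lambda x, y: x**2 - y
print(gda(gx, gy, x=1.0, y=1.0))
```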

On a Faster $R$-Linear Convergence Rate of the Barzilai-Borwein Method

no code implementations1 Jan 2021 Dawei Li, Ruoyu Sun

The Barzilai-Borwein (BB) method has demonstrated great empirical success in nonlinear optimization.
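
A minimal sketch of gradient descent with the classical BB1 step size on a quadratic; the safeguard-free loop below is illustrative, not the paper's analyzed variant.

```python
import numpy as np

# Gradient descent with the Barzilai-Borwein (BB1) step size
# alpha_k = (s^T s) / (s^T y), where s = x_k - x_{k-1}, y = g_k - g_{k-1}.

def bb_descent(grad, x, alpha0=1e-3, iters=100):
    g = grad(x)
    x_new = x - alpha0 * g           # bootstrap with a fixed first step
    for _ in range(iters):
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        alpha = (s @ s) / (s @ y)    # BB1 step size
        x, g = x_new, g_new
        x_new = x - alpha * g
    return x_new

# Strongly convex quadratic test problem: f(x) = 0.5 x^T A x.
A = np.diag([1.0, 10.0, 100.0])
grad = lambda x: A @ x
print(bb_descent(grad, np.ones(3)))
```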

Achieving Small Test Error in Mildly Overparameterized Neural Networks

no code implementations24 Apr 2021 Shiyu Liang, Ruoyu Sun, R. Srikant

Recent theoretical works on over-parameterized neural nets have focused on two aspects: optimization and generalization.

Binary Classification

Does Momentum Change the Implicit Regularization on Separable Data?

no code implementations8 Oct 2021 Bohan Wang, Qi Meng, Huishuai Zhang, Ruoyu Sun, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

The momentum acceleration technique is widely adopted in many optimization algorithms.

Larger Model Causes Lower Classification Accuracy Under Differential Privacy: Reason and Solution

no code implementations29 Sep 2021 Yinchen Shen, Zhiguo Wang, Ruoyu Sun, Xiaojing Shen

Differential privacy (DP) is an essential technique for privacy preservation, which works by adding random noise to the data.

Privacy Preserving
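
To make concrete where the model-size dependence enters, here is a standard DP-SGD-style step (per-sample clipping plus Gaussian noise, after Abadi et al.): noise is added to every coordinate, so larger models accumulate more total noise. All hyperparameters below are illustrative, and this is not the paper's proposed method.

```python
import numpy as np

def dp_sgd_step(theta, per_sample_grads, lr=0.1, clip=1.0, sigma=1.0, rng=None):
    """One DP-SGD-style step: clip each per-sample gradient to norm <= clip,
    average, then add Gaussian noise scaled to the clipping bound."""
    rng = rng or np.random.default_rng(0)
    clipped = [g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
               for g in per_sample_grads]
    noisy = np.mean(clipped, axis=0) + \
        rng.normal(scale=sigma * clip / len(clipped), size=theta.shape)
    return theta - lr * noisy

# Illustrative usage with a batch of 4 per-sample gradients in R^10:
rng = np.random.default_rng(1)
theta = np.zeros(10)
grads = [rng.normal(size=10) for _ in range(4)]
print(dp_sgd_step(theta, grads, rng=rng))
```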

Federated Semi-Supervised Learning with Class Distribution Mismatch

no code implementations29 Oct 2021 Zhiguo Wang, Xintong Wang, Ruoyu Sun, Tsung-Hui Chang

Similar to what is encountered in federated supervised learning, the class distribution of labeled/unlabeled data could be non-i.i.d.

Federated Learning

Faster Directional Convergence of Linear Neural Networks under Spherically Symmetric Data

no code implementations NeurIPS 2021 Dachao Lin, Ruoyu Sun, Zhihua Zhang

In this paper, we study gradient methods for training deep linear neural networks with binary cross-entropy loss.

CP-GAN: Towards a Better Global Landscape of GANs

no code implementations25 Sep 2019 Ruoyu Sun, Tiantian Fang, Alex Schwing

In this work, we perform a global analysis of GANs from two perspectives: the global landscape of the outer-optimization problem and the global behavior of the gradient descent dynamics.

Towards Understanding the Impact of Model Size on Differential Private Classification

no code implementations27 Nov 2021 Yinchen Shen, Zhiguo Wang, Ruoyu Sun, Xiaojing Shen

Then we propose a feature selection method to reduce the size of the model, based on a new metric which trades off classification accuracy against privacy preservation.

feature selection Privacy Preserving

Adam Can Converge Without Any Modification On Update Rules

no code implementations20 Aug 2022 Yushun Zhang, Congliang Chen, Naichen Shi, Ruoyu Sun, Zhi-Quan Luo

We point out a mismatch between the settings of theory and practice: Reddi et al. (2018) pick the problem after picking the hyperparameters of Adam, i.e., $(\beta_1, \beta_2)$, while practical applications often fix the problem first and then tune $(\beta_1, \beta_2)$.

When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work

no code implementations NeurIPS 2021 Jiawei Zhang, Yushun Zhang, Mingyi Hong, Ruoyu Sun, Zhi-Quan Luo

Third, we consider a constrained optimization formulation where the feasible region is the nice local region, and prove that every KKT point is a nearly global minimizer.

Invariant Layers for Graphs with Nodes of Different Types

no code implementations27 Feb 2023 Dmitry Rybin, Ruoyu Sun, Zhi-Quan Luo

We further narrow the invariant network design space by addressing a question about the sizes of tensor layers necessary for function approximation on graph data.

Restricted Generative Projection for One-Class Classification and Anomaly Detection

no code implementations9 Jul 2023 Feng Xiao, Ruoyu Sun, Jicong Fan

The core idea is to learn a mapping to transform the unknown distribution of training (normal) data to a known target distribution.

Informativeness One-Class Classification

How Graph Neural Networks Learn: Lessons from Training Dynamics

no code implementations8 Oct 2023 Chenxiao Yang, Qitian Wu, David Wipf, Ruoyu Sun, Junchi Yan

A long-standing goal in deep learning has been to characterize the learning behavior of black-box models in a more interpretable manner.

Inductive Bias

LEMON: Lossless model expansion

no code implementations12 Oct 2023 Yite Wang, Jiahao Su, Hanlin Lu, Cong Xie, Tianyi Liu, Jianbo Yuan, Haibin Lin, Ruoyu Sun, Hongxia Yang

Our empirical results demonstrate that LEMON reduces computational costs by 56.7% for Vision Transformers and 33.2% for BERT when compared to training from scratch.
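
To illustrate the "lossless" idea, here is a Net2Net-style function-preserving width expansion: duplicate a hidden neuron and split its outgoing weights, leaving the network's function exactly unchanged. LEMON's actual operators (which cover Transformers) are more general; this sketch is only an assumption-level analogy.

```python
import numpy as np

def expand_hidden(W1, b1, W2, idx):
    """Widen a 2-layer MLP by duplicating hidden neuron idx and splitting
    its outgoing weights 50/50, so the expanded net computes the same function."""
    W1e = np.vstack([W1, W1[idx:idx + 1]])       # copy incoming weights
    b1e = np.append(b1, b1[idx])
    W2e = np.hstack([W2, W2[:, idx:idx + 1]])    # copy outgoing weights...
    W2e[:, idx] *= 0.5                           # ...then split them between
    W2e[:, -1] *= 0.5                            # the original and the copy
    return W1e, b1e, W2e

rng = np.random.default_rng(0)
W1, b1, W2 = rng.normal(size=(4, 3)), rng.normal(size=4), rng.normal(size=(2, 4))
x = rng.normal(size=3)
f = lambda W1, b1, W2: W2 @ np.maximum(W1 @ x + b1, 0)   # ReLU MLP forward
W1e, b1e, W2e = expand_hidden(W1, b1, W2, idx=1)
print(np.allclose(f(W1, b1, W2), f(W1e, b1e, W2e)))      # True: same function
```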

Combining Transformer based Deep Reinforcement Learning with Black-Litterman Model for Portfolio Optimization

no code implementations23 Feb 2024 Ruoyu Sun, Angelos Stefanidis, Zhengyong Jiang, Jionglong Su

However, typical DRL agents for portfolio optimization cannot learn a policy that is aware of the dynamic correlation between portfolio asset returns.

Portfolio Optimization

On the Convergence of Adam under Non-uniform Smoothness: Separability from SGDM and Beyond

no code implementations22 Mar 2024 Bohan Wang, Huishuai Zhang, Qi Meng, Ruoyu Sun, Zhi-Ming Ma, Wei Chen

This paper aims to clearly distinguish between Stochastic Gradient Descent with Momentum (SGDM) and Adam in terms of their convergence rates.
