Search Results for author: Ruoyu Sun

Found 61 papers, 19 papers with code

When GNNs meet symmetry in ILPs: an orbit-based feature augmentation approach

1 code implementation24 Jan 2025 Qian Chen, Lei LI, Qian Li, Jianghua Wu, Akang Wang, Ruoyu Sun, Xiaodong Luo, Tsung-Hui Chang, Qingjiang Shi

In this work, we investigate the properties of permutation equivariance and invariance in GNNs, particularly in relation to the inherent symmetry of ILP formulations.

A novel multi-agent dynamic portfolio optimization learning system based on hierarchical deep reinforcement learning

no code implementations12 Jan 2025 Ruoyu Sun, Yue Xi, Angelos Stefanidis, Zhengyong Jiang, Jionglong Su

As a result, the DRL agents cannot explore dynamic portfolio optimization policies that improve risk-adjusted profitability during training.

Deep Reinforcement Learning Portfolio Optimization

Enabling Scalable Oversight via Self-Evolving Critic

no code implementations10 Jan 2025 Zhengyang Tang, Ziniu Li, Zhenyang Xiao, Tian Ding, Ruoyu Sun, Benyou Wang, Dayiheng Liu, Fei Huang, Tianyu Liu, Bowen Yu, Junyang Lin

Despite their remarkable performance, the development of Large Language Models (LLMs) faces a critical challenge in scalable oversight: providing effective feedback for tasks where human evaluation is difficult or where LLMs outperform humans.

Second Language (Arabic) Acquisition of LLMs via Progressive Vocabulary Expansion

no code implementations16 Dec 2024 Jianqing Zhu, Huang Huang, Zhihang Lin, Juhao Liang, Zhengyang Tang, Khalid Almubarak, Abdulmohsen Alharthik, Bang An, Juncai He, Xiangbo Wu, Fei Yu, Junying Chen, Zhuoheng Ma, Yuhao Du, He Zhang, Emad A. Alghamdi, Lian Zhang, Ruoyu Sun, Haizhou Li, Benyou Wang, Jinchao Xu

This paper addresses the critical need for democratizing large language models (LLMs) in the Arab world, a region that has seen slower progress in developing models comparable to state-of-the-art offerings like GPT-4 or ChatGPT-3.5, due to a predominant focus on mainstream languages (e.g., English and Chinese).

An Efficient Unsupervised Framework for Convex Quadratic Programs via Deep Unrolling

no code implementations2 Dec 2024 Linxin Yang, Bingheng Li, Tian Ding, Jianghua Wu, Akang Wang, Yuyi Wang, Jiliang Tang, Ruoyu Sun, Xiaodong Luo

Unlike the standard learning-to-optimize framework that requires optimization solutions generated by solvers, our unsupervised method adjusts the network weights directly from the evaluation of the primal-dual gap.
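
For intuition, the primal-dual gap objective can serve as an unsupervised training loss; below is a toy sketch for a convex QP min ½xᵀQx + cᵀx s.t. Ax ≥ b, assuming Q is positive definite and using a simple infeasibility penalty (the weight rho and the plain penalty are illustrative assumptions; the paper's deep-unrolling architecture is more involved).

```python
import torch

def primal_dual_gap_loss(x, y, Q, c, A, b, rho=10.0):
    """Unsupervised loss sketch for a convex QP  min 0.5 x'Qx + c'x  s.t. Ax >= b.

    x and y are network outputs (y is passed through ReLU to enforce y >= 0);
    rho penalizes primal infeasibility. A simplified stand-in for the paper's
    unrolled architecture, assuming Q is positive definite.
    """
    y = torch.relu(y)                                    # dual feasibility y >= 0
    primal = 0.5 * x @ Q @ x + c @ x                     # primal objective
    r = A.T @ y - c
    dual = -0.5 * r @ torch.linalg.solve(Q, r) + b @ y   # dual objective value
    violation = torch.relu(b - A @ x)                    # primal constraint violation
    return (primal - dual) + rho * violation.pow(2).sum()
```

A network predicting (x, y) from the problem data (Q, c, A, b) can then be trained by minimizing this loss without any solver-generated labels.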

Position: On-Premises LLM Deployment Demands a Middle Path: Preserving Privacy Without Sacrificing Model Confidentiality

1 code implementation15 Oct 2024 Hanbo Huang, Yihan Li, Bowen Jiang, Lin Liu, Bo Jiang, Ruoyu Sun, Zhuotao Liu, Shiyu Liang

Current LLM customization typically relies on two deployment strategies: closed-source APIs, which require users to upload private data to external servers, and open-weight models, which allow local fine-tuning but pose misuse risks.

Position Privacy Preserving

Adam-mini: Use Fewer Learning Rates To Gain More

1 code implementation24 Jun 2024 Yushun Zhang, Congliang Chen, Ziniu Li, Tian Ding, Chenwei Wu, Diederik P. Kingma, Yinyu Ye, Zhi-Quan Luo, Ruoyu Sun

Adam-mini reduces memory by cutting down the learning rate resources in Adam (i.e., $1/\sqrt{v}$).
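
A minimal sketch of this idea follows: keep a single second-moment scalar per parameter block instead of a per-coordinate $v$. Treating each parameter tensor as one block and the hyperparameters shown are simplifying assumptions for illustration; the released Adam-mini partitions parameters by the model's Hessian structure.

```python
import torch

@torch.no_grad()
def adam_mini_step(params, grads, states, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """One optimizer step with a single 1/sqrt(v) factor per parameter block.

    `params`/`grads` are lists of tensors; each tensor is treated as one block
    here (a simplifying assumption -- Adam-mini chooses blocks more carefully).
    """
    for i, (p, g) in enumerate(zip(params, grads)):
        st = states.setdefault(i, {"m": torch.zeros_like(p), "v": 0.0, "t": 0})
        st["t"] += 1
        st["m"].mul_(betas[0]).add_(g, alpha=1 - betas[0])  # per-coordinate first moment
        st["v"] = betas[1] * st["v"] + (1 - betas[1]) * g.pow(2).mean().item()  # one scalar per block
        m_hat = st["m"] / (1 - betas[0] ** st["t"])
        v_hat = st["v"] / (1 - betas[1] ** st["t"])
        p.add_(m_hat, alpha=-lr / (v_hat ** 0.5 + eps))      # shared learning rate for the block
```

Per-coordinate momentum is kept, but the $1/\sqrt{v}$ factor is shared within each block, which is where the memory saving comes from.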

Bridging the Gap: Rademacher Complexity in Robust and Standard Generalization

no code implementations8 Jun 2024 Jiancong Xiao, Ruoyu Sun, Qi Long, Weijie J. Su

We aim to construct a new cover that possesses two properties: 1) compatibility with adversarial examples, and 2) precision comparable to covers used in standard settings.

PDHG-Unrolled Learning-to-Optimize Method for Large-Scale Linear Programming

1 code implementation4 Jun 2024 Bingheng Li, Linxin Yang, Yupeng Chen, Senmiao Wang, Qian Chen, Haitao Mao, Yao Ma, Akang Wang, Tian Ding, Jiliang Tang, Ruoyu Sun

In this work, we propose an FOM-unrolled neural network (NN) called PDHG-Net and a two-stage L2O method to solve large-scale LP problems.
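
For intuition, a single unrolled PDHG step for the LP min cᵀx s.t. Ax = b, x ≥ 0 might look like the sketch below, with only the step sizes learnable; this is an assumed simplification, whereas PDHG-Net expands the iterates into multiple channels with learnable weight matrices.

```python
import torch
import torch.nn as nn

class PDHGLayer(nn.Module):
    """One unrolled primal-dual hybrid gradient step for  min c'x  s.t. Ax = b, x >= 0.

    Only the step sizes tau/sigma are learnable here -- a simplified sketch,
    not the channel-expanded layers of PDHG-Net itself.
    """
    def __init__(self):
        super().__init__()
        self.log_tau = nn.Parameter(torch.zeros(()))    # primal step size (log-parameterized)
        self.log_sigma = nn.Parameter(torch.zeros(()))  # dual step size

    def forward(self, x, y, A, b, c):
        tau, sigma = self.log_tau.exp(), self.log_sigma.exp()
        x_new = torch.relu(x - tau * (c - A.T @ y))     # projected primal descent
        y_new = y + sigma * (b - A @ (2 * x_new - x))   # dual ascent with extrapolation
        return x_new, y_new
```

Stacking several such layers and training them end-to-end gives a learned LP solver in the two-stage L2O spirit described above.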

On the Convergence of Adam under Non-uniform Smoothness: Separability from SGDM and Beyond

no code implementations22 Mar 2024 Bohan Wang, Huishuai Zhang, Qi Meng, Ruoyu Sun, Zhi-Ming Ma, Wei Chen

This paper aims to clearly distinguish between Stochastic Gradient Descent with Momentum (SGDM) and Adam in terms of their convergence rates.

Why Transformers Need Adam: A Hessian Perspective

2 code implementations26 Feb 2024 Yushun Zhang, Congliang Chen, Tian Ding, Ziniu Li, Ruoyu Sun, Zhi-Quan Luo

SGD performs worse than Adam by a significant margin on Transformers, but the reason remains unclear.

Combining Transformer based Deep Reinforcement Learning with Black-Litterman Model for Portfolio Optimization

no code implementations23 Feb 2024 Ruoyu Sun, Angelos Stefanidis, Zhengyong Jiang, Jionglong Su

However, typical DRL agents for portfolio optimization cannot learn a policy that is aware of the dynamic correlation between portfolio asset returns.

Deep Reinforcement Learning Portfolio Optimization

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models

2 code implementations16 Oct 2023 Ziniu Li, Tian Xu, Yushun Zhang, Zhihang Lin, Yang Yu, Ruoyu Sun, Zhi-Quan Luo

ReMax can save about 46% of the GPU memory used by PPO when training a 7B model, and it enables training on A800-80GB GPUs without the memory-saving offloading technique that PPO requires.

General Reinforcement Learning reinforcement-learning
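
The memory saving largely comes from ReMax not training a separate value network, using a baseline computed from a greedy response instead. Below is a rough REINFORCE-style sketch of such an update; the helper interfaces `sample`, `greedy`, `logprob`, and `reward_model` are hypothetical names assumed for illustration, not the authors' implementation.

```python
def remax_loss(policy, reward_model, prompts):
    """REINFORCE-style loss with a greedy-decoding baseline (ReMax-flavored sketch).

    `policy.sample`, `policy.greedy`, `policy.logprob`, and `reward_model`
    are hypothetical interfaces assumed for illustration only.
    """
    loss = 0.0
    for x in prompts:
        y = policy.sample(x)                      # stochastic response
        y_greedy = policy.greedy(x)               # greedy response used as baseline
        advantage = reward_model(x, y) - reward_model(x, y_greedy)
        loss = loss - advantage.detach() * policy.logprob(x, y)  # no value network needed
    return loss / len(prompts)
```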

LEMON: Lossless model expansion

1 code implementation12 Oct 2023 Yite Wang, Jiahao Su, Hanlin Lu, Cong Xie, Tianyi Liu, Jianbo Yuan, Haibin Lin, Ruoyu Sun, Hongxia Yang

Our empirical results demonstrate that LEMON reduces computational costs by 56.7% for Vision Transformers and 33.2% for BERT when compared to training from scratch.

model

How Graph Neural Networks Learn: Lessons from Training Dynamics

1 code implementation8 Oct 2023 Chenxiao Yang, Qitian Wu, David Wipf, Ruoyu Sun, Junchi Yan

In particular, we find that the gradient descent optimization of GNNs implicitly leverages the graph structure to update the learned function, as can be quantified by a phenomenon which we call \emph{kernel-graph alignment}.

Inductive Bias

AceGPT, Localizing Large Language Models in Arabic

1 code implementation21 Sep 2023 Huang Huang, Fei Yu, Jianqing Zhu, Xuening Sun, Hao Cheng, Dingjie Song, Zhihong Chen, Abdulmohsen Alharthi, Bang An, Juncai He, Ziche Liu, Zhiyi Zhang, Junying Chen, Jianquan Li, Benyou Wang, Lian Zhang, Ruoyu Sun, Xiang Wan, Haizhou Li, Jinchao Xu

This paper is devoted to the development of a localized Large Language Model (LLM) specifically for Arabic, a language imbued with unique cultural characteristics inadequately addressed by current mainstream models.

Instruction Following Language Modeling +3

Restricted Generative Projection for One-Class Classification and Anomaly Detection

no code implementations9 Jul 2023 Feng Xiao, Ruoyu Sun, Jicong Fan

The core idea is to learn a mapping to transform the unknown distribution of training (normal) data to a known target distribution.

Informativeness One-Class Classification

NTK-SAP: Improving neural network pruning by aligning training dynamics

1 code implementation6 Apr 2023 Yite Wang, Dawei Li, Ruoyu Sun

Recent advances in neural tangent kernel (NTK) theory suggest that the training dynamics of large enough neural networks is closely related to the spectrum of the NTK.

Network Pruning

Balanced Training for Sparse GANs

1 code implementation NeurIPS 2023 Yite Wang, Jing Wu, Naira Hovakimyan, Ruoyu Sun

We also introduce a new method called balanced dynamic sparse training (ADAPT), which seeks to control the BR during GAN training to achieve a good trade-off between performance and computational cost.

Invariant Layers for Graphs with Nodes of Different Types

no code implementations27 Feb 2023 Dmitry Rybin, Ruoyu Sun, Zhi-Quan Luo

We further narrow the invariant network design space by addressing a question about the sizes of tensor layers necessary for function approximation on graph data.

Graph Neural Network

Adversarial Rademacher Complexity of Deep Neural Networks

1 code implementation27 Nov 2022 Jiancong Xiao, Yanbo Fan, Ruoyu Sun, Zhi-Quan Luo

Specifically, we provide the first bound of adversarial Rademacher complexity of deep neural networks.

ARC

DigGAN: Discriminator gradIent Gap Regularization for GAN Training with Limited Data

1 code implementation27 Nov 2022 Tiantian Fang, Ruoyu Sun, Alex Schwing

In contrast, we propose a Discriminator gradIent Gap regularized GAN (DigGAN) formulation which can be added to any existing GAN.

Data Augmentation

When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work

no code implementations NeurIPS 2021 Jiawei Zhang, Yushun Zhang, Mingyi Hong, Ruoyu Sun, Zhi-Quan Luo

Third, we consider a constrained optimization formulation where the feasible region is the nice local region, and prove that every KKT point is a nearly global minimizer.

Stability Analysis and Generalization Bounds of Adversarial Training

1 code implementation3 Oct 2022 Jiancong Xiao, Yanbo Fan, Ruoyu Sun, Jue Wang, Zhi-Quan Luo

In adversarial machine learning, deep neural networks can fit the adversarial examples on the training dataset but have poor generalization ability on the test set.

Generalization Bounds

Provable Adaptivity of Adam under Non-uniform Smoothness

no code implementations21 Aug 2022 Bohan Wang, Yushun Zhang, Huishuai Zhang, Qi Meng, Ruoyu Sun, Zhi-Ming Ma, Tie-Yan Liu, Zhi-Quan Luo, Wei Chen

We present the first convergence analysis of RR Adam without the bounded smoothness assumption.

Attribute

Adam Can Converge Without Any Modification On Update Rules

no code implementations20 Aug 2022 Yushun Zhang, Congliang Chen, Naichen Shi, Ruoyu Sun, Zhi-Quan Luo

We point out that there is a mismatch between the settings of theory and practice: Reddi et al. (2018) pick the problem after picking the hyperparameters of Adam, i.e., $(\beta_1, \beta_2)$, whereas practical applications often fix the problem first and then tune $(\beta_1, \beta_2)$.

Global Convergence of MAML and Theory-Inspired Neural Architecture Search for Few-Shot Learning

1 code implementation CVPR 2022 Haoxiang Wang, Yite Wang, Ruoyu Sun, Bo Li

We show that the performance of MetaNTK-NAS is comparable to or better than the state-of-the-art NAS method designed for few-shot learning, while enjoying more than 100x speedup.

Few-Shot Learning Neural Architecture Search

Faster Directional Convergence of Linear Neural Networks under Spherically Symmetric Data

no code implementations NeurIPS 2021 Dachao Lin, Ruoyu Sun, Zhihua Zhang

In this paper, we study gradient methods for training deep linear neural networks with binary cross-entropy loss.

Towards Understanding the Impact of Model Size on Differential Private Classification

no code implementations27 Nov 2021 Yinchen Shen, Zhiguo Wang, Ruoyu Sun, Xiaojing Shen

Then we propose a feature selection method to reduce the size of the model, based on a new metric that trades off classification accuracy and privacy preservation.

feature selection Privacy Preserving

Federated Semi-Supervised Learning with Class Distribution Mismatch

no code implementations29 Oct 2021 Zhiguo Wang, Xintong Wang, Ruoyu Sun, Tsung-Hui Chang

Similar to the situation in federated supervised learning, the class distribution of labeled/unlabeled data could be non-i.i.d.

Federated Learning

Does Momentum Change the Implicit Regularization on Separable Data?

no code implementations8 Oct 2021 Bohan Wang, Qi Meng, Huishuai Zhang, Ruoyu Sun, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

The momentum acceleration technique is widely adopted in many optimization algorithms.

Larger Model Causes Lower Classification Accuracy Under Differential Privacy: Reason and Solution

no code implementations29 Sep 2021 Yinchen Shen, Zhiguo Wang, Ruoyu Sun, Xiaojing Shen

Differential privacy (DP) is an essential technique for privacy preservation, which works by adding random noise to the data.

Privacy Preserving

Achieving Small Test Error in Mildly Overparameterized Neural Networks

no code implementations24 Apr 2021 Shiyu Liang, Ruoyu Sun, R. Srikant

Recent theoretical works on over-parameterized neural nets have focused on two aspects: optimization and generalization.

Binary Classification

Precondition Layer and Its Use for GANs

no code implementations1 Jan 2021 Tiantian Fang, Alex Schwing, Ruoyu Sun

We use this PC-layer in two ways: 1) fixed preconditioning (FPC) adds a fixed PC-layer to all layers, and 2) adaptive preconditioning (APC) adaptively controls the strength of preconditioning.

On the Landscape of Sparse Linear Networks

no code implementations1 Jan 2021 Dachao Lin, Ruoyu Sun, Zhihua Zhang

Network pruning, or the study of sparse networks, has a long history and practical significance in modern applications.

Network Pruning

RMSprop can converge with proper hyper-parameter

no code implementations ICLR 2021 Naichen Shi, Dawei Li, Mingyi Hong, Ruoyu Sun

Removing this assumption allows us to establish a phase transition from divergence to non-divergence for RMSProp.

On a Faster $R$-Linear Convergence Rate of the Barzilai-Borwein Method

no code implementations1 Jan 2021 Dawei Li, Ruoyu Sun

The Barzilai-Borwein (BB) method has demonstrated great empirical success in nonlinear optimization.

Towards a Better Global Loss Landscape of GANs

1 code implementation NeurIPS 2020 Ruoyu Sun, Tiantian Fang, Alex Schwing

We also perform experiments to support our theory that RpGAN has a better landscape than separable-GAN.

A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems

no code implementations NeurIPS 2020 Jiawei Zhang, Peijun Xiao, Ruoyu Sun, Zhi-Quan Luo

We prove that the stabilized GDA algorithm can achieve an $O(1/\epsilon^2)$ iteration complexity for minimizing the pointwise maximum of a finite collection of nonconvex functions.
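
The stabilized (smoothed) GDA step augments the primal update with a quadratic proximal term toward an auxiliary averaging sequence. A hedged sketch of one such iteration is below; the step sizes c, alpha, beta and the smoothing weight p are illustrative values, not the paper's prescribed choices.

```python
import torch

def smoothed_gda_step(f, x, y, z, c=0.01, alpha=0.01, beta=0.9, p=1.0):
    """One smoothed GDA iteration for min_x max_y f(x, y) (illustrative step sizes).

    The primal step descends f(x, y) + (p/2)||x - z||^2, the dual step ascends f,
    and z tracks an exponential average of the primal iterates.
    """
    x = x.detach().requires_grad_(True)
    y = y.detach().requires_grad_(True)
    K = f(x, y) + 0.5 * p * (x - z).pow(2).sum()
    gx, = torch.autograd.grad(K, x)
    x_new = (x - c * gx).detach()                    # smoothed primal descent
    gy, = torch.autograd.grad(f(x_new, y), y)
    y_new = (y + alpha * gy).detach()                # dual ascent (project onto Y if constrained)
    z_new = z + beta * (x_new - z)                   # proximal center update
    return x_new, y_new, z_new
```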

On the Landscape of One-hidden-layer Sparse Networks and Beyond

no code implementations16 Sep 2020 Dachao Lin, Ruoyu Sun, Zhihua Zhang

We show that linear networks can have no spurious valleys under special sparse structures, and non-linear networks could also admit no spurious valleys under a wide final layer.

Network Pruning

The Global Landscape of Neural Networks: An Overview

no code implementations2 Jul 2020 Ruoyu Sun, Dawei Li, Shiyu Liang, Tian Ding, R. Srikant

Second, we discuss a few rigorous results on the geometric properties of wide networks such as "no bad basin", and some modifications that eliminate sub-optimal local minima and/or decreasing paths to infinity.

Global Convergence and Generalization Bound of Gradient-Based Meta-Learning with Deep Neural Nets

2 code implementations25 Jun 2020 Haoxiang Wang, Ruoyu Sun, Bo Li

Gradient-based meta-learning (GBML) with deep neural nets (DNNs) has become a popular approach for few-shot learning.

Few-Shot Learning

Distilling Object Detectors with Task Adaptive Regularization

no code implementations23 Jun 2020 Ruoyu Sun, Fuhui Tang, Xiaopeng Zhang, Hongkai Xiong, Qi Tian

Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization.

Knowledge Distillation Object +1

DEED: A General Quantization Scheme for Communication Efficiency in Bits

no code implementations19 Jun 2020 Tian Ye, Peijun Xiao, Ruoyu Sun

In the infrequent communication setting, DEED combined with Federated Averaging requires a smaller total number of bits than Federated Averaging alone.

Distributed Optimization Federated Learning +1

Revisiting Landscape Analysis in Deep Neural Networks: Eliminating Decreasing Paths to Infinity

no code implementations31 Dec 2019 Shiyu Liang, Ruoyu Sun, R. Srikant

More specifically, for a large class of over-parameterized deep neural networks with appropriate regularizers, the loss function has no bad local minima and no decreasing paths to infinity.

Sub-Optimal Local Minima Exist for Neural Networks with Almost All Non-Linear Activations

no code implementations4 Nov 2019 Tian Ding, Dawei Li, Ruoyu Sun

More specifically, we prove that for any multi-layer network with generic input data and non-linear activation functions, sub-optimal local minima can exist, no matter how wide the network is (as long as the last hidden layer has at least two neurons).

All

Understanding Limitation of Two Symmetrized Orders by Worst-case Complexity

no code implementations10 Oct 2019 Peijun Xiao, Zhisheng Xiao, Ruoyu Sun

Recently, Coordinate Descent (CD) with cyclic order was shown to be $O(n^2)$ times slower than randomized versions in the worst case.

Vocal Bursts Valence Prediction

CP-GAN: Towards a Better Global Landscape of GANs

no code implementations25 Sep 2019 Ruoyu Sun, Tiantian Fang, Alex Schwing

In this work, we perform a global analysis of GANs from two perspectives: the global landscape of the outer-optimization problem and the global behavior of the gradient descent dynamics.

Off-road Autonomous Vehicles Traversability Analysis and Trajectory Planning Based on Deep Inverse Reinforcement Learning

no code implementations16 Sep 2019 Zeyu Zhu, Nan Li, Ruoyu Sun, Huijing Zhao, Donghao Xu

Different cost functions for traversability analysis are learned and tested in various scenes to assess their capability in guiding the trajectory planning of different behaviors.

Autonomous Vehicles reinforcement-learning +3

Max-Sliced Wasserstein Distance and its use for GANs

no code implementations CVPR 2019 Ishan Deshpande, Yuan-Ting Hu, Ruoyu Sun, Ayis Pyrros, Nasir Siddiqui, Sanmi Koyejo, Zhizhen Zhao, David Forsyth, Alexander Schwing

Generative adversarial nets (GANs) and variational auto-encoders have significantly improved our distribution modeling capabilities, showing promise for dataset augmentation, image-to-image translation and feature learning.

Image-to-Image Translation Translation

On the Benefit of Width for Neural Networks: Disappearance of Bad Basins

no code implementations28 Dec 2018 Dawei Li, Tian Ding, Ruoyu Sun

Wide networks are often believed to have a nice optimization landscape, but what rigorous results can we prove?

On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization

no code implementations ICLR 2019 Xiangyi Chen, Sijia Liu, Ruoyu Sun, Mingyi Hong

We prove that under our derived conditions, these methods can achieve the convergence rate of order $O(\log{T}/\sqrt{T})$ for nonconvex stochastic optimization.

Open-Ended Question Answering Stochastic Optimization

Adding One Neuron Can Eliminate All Bad Local Minima

no code implementations NeurIPS 2018 Shiyu Liang, Ruoyu Sun, Jason D. Lee, R. Srikant

One of the main difficulties in analyzing neural networks is the non-convexity of the loss function which may have many bad local minima.

All Binary Classification +1

Understanding the Loss Surface of Neural Networks for Binary Classification

no code implementations ICML 2018 Shiyu Liang, Ruoyu Sun, Yixuan Li, R. Srikant

Here we focus on the training performance of single-layered neural networks for binary classification, and provide conditions under which the training error is zero at all local minima of a smooth hinge loss function.

Binary Classification Classification +1

Training Language Models Using Target-Propagation

1 code implementation15 Feb 2017 Sam Wiseman, Sumit Chopra, Marc'Aurelio Ranzato, Arthur Szlam, Ruoyu Sun, Soumith Chintala, Nicolas Vasilache

While Truncated Back-Propagation through Time (BPTT) is the most popular approach to training Recurrent Neural Networks (RNNs), it suffers from being inherently sequential (making parallelization difficult) and from truncating gradient flow between distant time-steps.

Guaranteed Matrix Completion via Non-convex Factorization

no code implementations28 Nov 2014 Ruoyu Sun, Zhi-Quan Luo

In this paper, we establish a theoretical guarantee for the factorization formulation to correctly recover the underlying low-rank matrix.

Matrix Completion
