1 code implementation • 24 Jan 2025 • Qian Chen, Lei LI, Qian Li, Jianghua Wu, Akang Wang, Ruoyu Sun, Xiaodong Luo, Tsung-Hui Chang, Qingjiang Shi
In this work, we investigate the properties of permutation equivariance and invariance in GNNs, particularly in relation to the inherent symmetry of ILP formulations.
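As a quick illustration of the invariance property in question, the following sketch (a toy sum-pooling readout, not the paper's architecture; all names are illustrative) checks numerically that a permutation-invariant readout returns the same output when the variable nodes of an ILP graph are relabeled:

```python
# A minimal sketch, assuming a sum-pooled readout over an ILP's variable nodes;
# row order in X is irrelevant, so any relabeling P leaves the output unchanged.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))            # hypothetical shared weight matrix

def invariant_readout(X):
    """Sum-pool node features, then apply a shared nonlinear map."""
    return np.tanh(X.sum(axis=0) @ W)  # sum over nodes kills the ordering

X = rng.normal(size=(5, 8))            # 5 variable nodes, 8 features each
P = rng.permutation(5)                 # random relabeling of the variables

assert np.allclose(invariant_readout(X), invariant_readout(X[P]))
```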
1 code implementation • 24 Jan 2025 • Zhengyang Tang, Ziniu Li, Zhenyang Xiao, Tian Ding, Ruoyu Sun, Benyou Wang, Dayiheng Liu, Fei Huang, Tianyu Liu, Bowen Yu, Junyang Lin
In this work, we introduce a new benchmark designed to assess the critique capabilities of LLMs.
no code implementations • 12 Jan 2025 • Ruoyu Sun, Yue Xi, Angelos Stefanidis, Zhengyong Jiang, Jionglong Su
As a result, the DRL agents cannot explore dynamic portfolio optimization policies that improve risk-adjusted profitability during training.
no code implementations • 10 Jan 2025 • Zhengyang Tang, Ziniu Li, Zhenyang Xiao, Tian Ding, Ruoyu Sun, Benyou Wang, Dayiheng Liu, Fei Huang, Tianyu Liu, Bowen Yu, Junyang Lin
Despite their remarkable performance, the development of Large Language Models (LLMs) faces a critical challenge in scalable oversight: providing effective feedback for tasks where human evaluation is difficult or where LLMs outperform humans.
no code implementations • 16 Dec 2024 • Jianqing Zhu, Huang Huang, Zhihang Lin, Juhao Liang, Zhengyang Tang, Khalid Almubarak, Abdulmohsen Alharthi, Bang An, Juncai He, Xiangbo Wu, Fei Yu, Junying Chen, Zhuoheng Ma, Yuhao Du, He Zhang, Emad A. Alghamdi, Lian Zhang, Ruoyu Sun, Haizhou Li, Benyou Wang, Jinchao Xu
This paper addresses the critical need for democratizing large language models (LLMs) in the Arab world, a region that has seen slower progress in developing models comparable to state-of-the-art offerings like GPT-4 or ChatGPT 3.5, due to a predominant focus on mainstream languages (e.g., English and Chinese).
no code implementations • 2 Dec 2024 • Linxin Yang, Bingheng Li, Tian Ding, Jianghua Wu, Akang Wang, Yuyi Wang, Jiliang Tang, Ruoyu Sun, Xiaodong Luo
Unlike the standard learning-to-optimize framework that requires optimization solutions generated by solvers, our unsupervised method adjusts the network weights directly from the evaluation of the primal-dual gap.
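To make that training signal concrete, here is a minimal sketch of a primal-dual-gap-based loss for a standard-form LP ($\min c^\top x$ s.t. $Ax = b$, $x \ge 0$, with dual $\max b^\top y$ s.t. $A^\top y \le c$); the penalty weight `rho` and the treatment of infeasibility are assumptions, and the paper's exact formulation may differ:

```python
# A minimal sketch: the primal-dual gap c^T x - b^T y is zero exactly at an
# optimal feasible pair, so it can serve as an unsupervised, solver-free loss.
import torch

def primal_dual_gap_loss(x, y, A, b, c, rho=10.0):
    gap = c @ x - b @ y                                  # objective gap
    primal_infeas = torch.norm(A @ x - b) ** 2           # penalize Ax != b
    dual_infeas = torch.relu(A.T @ y - c).pow(2).sum()   # penalize A^T y > c
    nonneg = torch.relu(-x).pow(2).sum()                 # penalize x < 0
    return gap + rho * (primal_infeas + dual_infeas + nonneg)
```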
1 code implementation • 15 Oct 2024 • Hanbo Huang, Yihan Li, Bowen Jiang, Lin Liu, Bo Jiang, Ruoyu Sun, Zhuotao Liu, Shiyu Liang
Current LLM customization typically relies on two deployment strategies: closed-source APIs, which require users to upload private data to external servers, and open-weight models, which allow local fine-tuning but pose misuse risks.
no code implementations • 29 Aug 2024 • Ziniu Li, Congliang Chen, Tian Xu, Zeyu Qin, Jiancong Xiao, Ruoyu Sun, Zhi-Quan Luo
For the SFT of Llama-3-8B models, GEM outperforms the cross-entropy (CE) loss in several aspects.
no code implementations • 30 Jul 2024 • Yupeng Chen, Senmiao Wang, Zhihang Lin, Zeyu Qin, Yushun Zhang, Tian Ding, Ruoyu Sun
This helps avoid impairing the model's performance on the fine-tuning tasks.
1 code implementation • 24 Jun 2024 • Yushun Zhang, Congliang Chen, Ziniu Li, Tian Ding, Chenwei Wu, Diederik P. Kingma, Yinyu Ye, Zhi-Quan Luo, Ruoyu Sun
Adam-mini reduces memory by cutting down the learning rate resources in Adam (i.e., $1/\sqrt{v}$).
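The block-wise idea can be sketched as follows: keep Adam's per-coordinate first moment but share a single second-moment scalar across each parameter block. This is a simplified illustration of the principle, not Adam-mini's actual partitioning rule:

```python
# A minimal sketch, assuming ms = [zeros_like(p)] and vs = [a 0-dim tensor]
# per block; only the second moment v is collapsed to one scalar per block.
import torch

@torch.no_grad()
def adam_mini_step(params, ms, vs, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, t=1):
    for p, m, v in zip(params, ms, vs):
        g = p.grad
        m.mul_(b1).add_(g, alpha=1 - b1)                 # per-coordinate momentum
        v.mul_(b2).add_((1 - b2) * g.pow(2).mean())      # ONE scalar v per block
        m_hat = m / (1 - b1 ** t)                        # bias correction
        v_hat = v / (1 - b2 ** t)
        p.add_(-lr * m_hat / (v_hat.sqrt() + eps))       # shared step size in block
```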
no code implementations • 8 Jun 2024 • Jiancong Xiao, Ruoyu Sun, Qi Long, Weijie J. Su
We aim to construct a new cover that possesses two properties: 1) compatibility with adversarial examples, and 2) precision comparable to covers used in standard settings.
1 code implementation • 4 Jun 2024 • Bingheng Li, Linxin Yang, Yupeng Chen, Senmiao Wang, Qian Chen, Haitao Mao, Yao Ma, Akang Wang, Tian Ding, Jiliang Tang, Ruoyu Sun
In this work, we propose an FOM-unrolled neural network (NN) called PDHG-Net, and propose a two-stage L2O method to solve large-scale LP problems.
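For reference, a single iteration of the classical PDHG update that such an unrolled network mimics (a plain solver step, with no learned weights; the step sizes `tau` and `sigma` are placeholders) looks like this for $\min c^\top x$ s.t. $Ax = b$, $x \ge 0$:

```python
# A minimal sketch of one PDHG iteration: a projected primal gradient step
# followed by a dual step on the extrapolated primal point.
import numpy as np

def pdhg_step(x, y, A, b, c, tau=0.1, sigma=0.1):
    x_new = np.maximum(x - tau * (c - A.T @ y), 0.0)   # primal step, projected to x >= 0
    y_new = y + sigma * (b - A @ (2 * x_new - x))      # dual step with extrapolation
    return x_new, y_new
```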
no code implementations • 22 Mar 2024 • Bohan Wang, Huishuai Zhang, Qi Meng, Ruoyu Sun, Zhi-Ming Ma, Wei Chen
This paper aims to clearly distinguish between Stochastic Gradient Descent with Momentum (SGDM) and Adam in terms of their convergence rates.
2 code implementations • 26 Feb 2024 • Yushun Zhang, Congliang Chen, Tian Ding, Ziniu Li, Ruoyu Sun, Zhi-Quan Luo
SGD performs worse than Adam by a significant margin on Transformers, but the reason remains unclear.
no code implementations • 23 Feb 2024 • Ruoyu Sun, Angelos Stefanidis, Zhengyong Jiang, Jionglong Su
However, typical DRL agents for portfolio optimization cannot learn a policy that is aware of the dynamic correlation between portfolio asset returns.
2 code implementations • 16 Oct 2023 • Ziniu Li, Tian Xu, Yushun Zhang, Zhihang Lin, Yang Yu, Ruoyu Sun, Zhi-Quan Luo
ReMax saves about 46% of GPU memory compared with PPO when training a 7B model, and enables training on A800-80GB GPUs without the memory-saving offloading technique that PPO requires.
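A minimal sketch of the ReMax-style estimator: REINFORCE with the reward of the greedy response as a variance-reducing baseline. The `policy.sample_with_logprob` and `policy.greedy_decode` helpers are hypothetical stand-ins for a real decoding API:

```python
# A minimal sketch, assuming `policy` exposes the two (hypothetical) helpers
# below and `reward_fn(prompt, response)` returns a scalar reward.
import torch

def remax_loss(policy, reward_fn, prompt):
    y_sample, logp = policy.sample_with_logprob(prompt)  # assumed helper
    with torch.no_grad():
        y_greedy = policy.greedy_decode(prompt)          # assumed helper
        advantage = reward_fn(prompt, y_sample) - reward_fn(prompt, y_greedy)
    return -(advantage * logp)   # minimizing this ascends the policy gradient
```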
1 code implementation • 12 Oct 2023 • Yite Wang, Jiahao Su, Hanlin Lu, Cong Xie, Tianyi Liu, Jianbo Yuan, Haibin Lin, Ruoyu Sun, Hongxia Yang
Our empirical results demonstrate that LEMON reduces computational costs by 56.7% for Vision Transformers and 33.2% for BERT when compared to training from scratch.
1 code implementation • 8 Oct 2023 • Chenxiao Yang, Qitian Wu, David Wipf, Ruoyu Sun, Junchi Yan
In particular, we find that the gradient descent optimization of GNNs implicitly leverages the graph structure to update the learned function, as can be quantified by a phenomenon which we call kernel-graph alignment.
1 code implementation • 21 Sep 2023 • Huang Huang, Fei Yu, Jianqing Zhu, Xuening Sun, Hao Cheng, Dingjie Song, Zhihong Chen, Abdulmohsen Alharthi, Bang An, Juncai He, Ziche Liu, Zhiyi Zhang, Junying Chen, Jianquan Li, Benyou Wang, Lian Zhang, Ruoyu Sun, Xiang Wan, Haizhou Li, Jinchao Xu
This paper is devoted to the development of a localized Large Language Model (LLM) specifically for Arabic, a language imbued with unique cultural characteristics inadequately addressed by current mainstream models.
no code implementations • 9 Jul 2023 • Feng Xiao, Ruoyu Sun, Jicong Fan
The core idea is to learn a mapping to transform the unknown distribution of training (normal) data to a known target distribution.
1 code implementation • 6 Apr 2023 • Yite Wang, Dawei Li, Ruoyu Sun
Recent advances in neural tangent kernel (NTK) theory suggest that the training dynamics of sufficiently large neural networks are closely related to the spectrum of the NTK.
1 code implementation • NeurIPS 2023 • Yite Wang, Jing Wu, Naira Hovakimyan, Ruoyu Sun
We also introduce a new method called balanced dynamic sparse training (ADAPT), which seeks to control the BR during GAN training to achieve a good trade-off between performance and computational cost.
no code implementations • 27 Feb 2023 • Dmitry Rybin, Ruoyu Sun, Zhi-Quan Luo
We further narrow the invariant network design space by addressing a question about the sizes of tensor layers necessary for function approximation on graph data.
1 code implementation • 27 Nov 2022 • Jiancong Xiao, Yanbo Fan, Ruoyu Sun, Zhi-Quan Luo
Specifically, we provide the first bound of adversarial Rademacher complexity of deep neural networks.
1 code implementation • 27 Nov 2022 • Tiantian Fang, Ruoyu Sun, Alex Schwing
In contrast, we propose a Discriminator gradIent Gap regularized GAN (DigGAN) formulation which can be added to any existing GAN.
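In the spirit of DigGAN, the sketch below penalizes the gap between the discriminator's input-gradient norms on real and fake batches; this is a simplified reading of the idea, and the paper's exact penalty may differ:

```python
# A minimal sketch: compute |grad_x D(x)| on real and fake batches and
# penalize the squared gap, added on top of the usual GAN loss for D.
import torch

def gradient_gap_penalty(D, real, fake):
    def grad_norm(x):
        x = x.detach().requires_grad_(True)
        g, = torch.autograd.grad(D(x).sum(), x, create_graph=True)
        return g.flatten(1).norm(dim=1).mean()
    return (grad_norm(real) - grad_norm(fake)) ** 2
```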
no code implementations • NeurIPS 2021 • Jiawei Zhang, Yushun Zhang, Mingyi Hong, Ruoyu Sun, Zhi-Quan Luo
Third, we consider a constrained optimization formulation where the feasible region is the nice local region, and prove that every KKT point is a nearly global minimizer.
1 code implementation • 3 Oct 2022 • Jiancong Xiao, Yanbo Fan, Ruoyu Sun, Jue Wang, Zhi-Quan Luo
In adversarial machine learning, deep neural networks can fit the adversarial examples on the training dataset but have poor generalization ability on the test set.
no code implementations • 21 Aug 2022 • Bohan Wang, Yushun Zhang, Huishuai Zhang, Qi Meng, Ruoyu Sun, Zhi-Ming Ma, Tie-Yan Liu, Zhi-Quan Luo, Wei Chen
We present the first convergence analysis of RR Adam without the bounded smoothness assumption.
no code implementations • 20 Aug 2022 • Yushun Zhang, Congliang Chen, Naichen Shi, Ruoyu Sun, Zhi-Quan Luo
We point out that there is a mismatch between the settings of theory and practice: Reddi et al. (2018) pick the problem after picking the hyperparameters of Adam, i.e., $(\beta_1, \beta_2)$, while practical applications often fix the problem first and then tune $(\beta_1, \beta_2)$.
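For readers who want those hyperparameters in context, the standard Adam update (Kingma & Ba) reads as follows, with $(\beta_1, \beta_2)$ controlling the exponential moving averages of the gradient and its square:

```latex
% Standard Adam update with bias correction:
m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t
v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2
\theta_t = \theta_{t-1} - \alpha \, \frac{m_t / (1 - \beta_1^t)}{\sqrt{v_t / (1 - \beta_2^t)} + \epsilon}
```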
1 code implementation • CVPR 2022 • Haoxiang Wang, Yite Wang, Ruoyu Sun, Bo Li
We show that the performance of MetaNTK-NAS is comparable to or better than that of the state-of-the-art NAS method designed for few-shot learning, while enjoying more than 100x speedup.
no code implementations • NeurIPS 2021 • Dachao Lin, Ruoyu Sun, Zhihua Zhang
In this paper, we study gradient methods for training deep linear neural networks with binary cross-entropy loss.
no code implementations • 27 Nov 2021 • Yinchen Shen, Zhiguo Wang, Ruoyu Sun, Xiaojing Shen
We then propose a feature selection method to reduce the size of the model, based on a new metric that trades off classification accuracy and privacy preservation.
no code implementations • 8 Nov 2021 • Nuerxiati Abudurexiti, Kai He, Dongdong Hu, Svetlozar T. Rachev, Hasanjan Sayit, Ruoyu Sun
In this note, we give approximate closed form expressions for VaR and CVaR of portfolios of returns with NMVM distributions.
no code implementations • 29 Oct 2021 • Zhiguo Wang, Xintong Wang, Ruoyu Sun, Tsung-Hui Chang
As in federated supervised learning, the class distribution of labeled/unlabeled data can be non-i.i.d.
no code implementations • 8 Oct 2021 • Bohan Wang, Qi Meng, Huishuai Zhang, Ruoyu Sun, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
The momentum acceleration technique is widely adopted in many optimization algorithms.
no code implementations • 29 Sep 2021 • Yinchen Shen, Zhiguo Wang, Ruoyu Sun, Xiaojing Shen
Differential privacy (DP) is an essential technique for privacy preservation, which works by adding random noise to the data.
no code implementations • 24 Apr 2021 • Shiyu Liang, Ruoyu Sun, R. Srikant
Recent theoretical works on over-parameterized neural nets have focused on two aspects: optimization and generalization.
no code implementations • 1 Jan 2021 • Tiantian Fang, Alex Schwing, Ruoyu Sun
We use this PC-layer in two ways: 1) fixed preconditioning (FPC) adds a fixed PC-layer to all layers, and 2) adaptive preconditioning (APC) adaptively controls the strength of preconditioning.
no code implementations • 1 Jan 2021 • Dachao Lin, Ruoyu Sun, Zhihua Zhang
Network pruning, or sparse network has a long history and practical significance in modern applications.
no code implementations • ICLR 2021 • Naichen Shi, Dawei Li, Mingyi Hong, Ruoyu Sun
Removing this assumption allows us to establish a phase transition from divergence to non-divergence for RMSProp.
no code implementations • 1 Jan 2021 • Dawei Li, Ruoyu Sun
The Barzilai-Borwein (BB) method has demonstrated great empirical success in nonlinear optimization.
1 code implementation • NeurIPS 2020 • Ruoyu Sun, Tiantian Fang, Alex Schwing
We also perform experiments to support our theory that RpGAN has a better landscape than separable-GAN.
no code implementations • NeurIPS 2020 • Jiawei Zhang, Peijun Xiao, Ruoyu Sun, Zhi-Quan Luo
We prove that the stabilized GDA algorithm can achieve an $O(1/\epsilon^2)$ iteration complexity for minimizing the pointwise maximum of a finite collection of nonconvex functions.
no code implementations • 16 Sep 2020 • Dachao Lin, Ruoyu Sun, Zhihua Zhang
We show that linear networks can have no spurious valleys under special sparse structures, and that non-linear networks can also admit no spurious valleys when the final layer is wide.
no code implementations • 2 Jul 2020 • Ruoyu Sun, Dawei Li, Shiyu Liang, Tian Ding, R. Srikant
Second, we discuss a few rigorous results on the geometric properties of wide networks such as "no bad basin", and some modifications that eliminate sub-optimal local minima and/or decreasing paths to infinity.
2 code implementations • 25 Jun 2020 • Haoxiang Wang, Ruoyu Sun, Bo Li
Gradient-based meta-learning (GBML) with deep neural nets (DNNs) has become a popular approach for few-shot learning.
no code implementations • 23 Jun 2020 • Ruoyu Sun, Fuhui Tang, Xiaopeng Zhang, Hongkai Xiong, Qi Tian
Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization.
no code implementations • 19 Jun 2020 • Tian Ye, Peijun Xiao, Ruoyu Sun
In the infrequent communication setting, DEED combined with Federated Averaging requires a smaller total number of bits than Federated Averaging alone.
no code implementations • 31 Dec 2019 • Shiyu Liang, Ruoyu Sun, R. Srikant
More specifically, for a large class of over-parameterized deep neural networks with appropriate regularizers, the loss function has no bad local minima and no decreasing paths to infinity.
no code implementations • 19 Dec 2019 • Ruoyu Sun
When and why can a neural network be successfully trained?
no code implementations • 4 Nov 2019 • Tian Ding, Dawei Li, Ruoyu Sun
More specifically, we prove that for any multi-layer network with generic input data and non-linear activation functions, sub-optimal local minima can exist, no matter how wide the network is (as long as the last hidden layer has at least two neurons).
no code implementations • 10 Oct 2019 • Peijun Xiao, Zhisheng Xiao, Ruoyu Sun
Recently, Coordinate Descent (CD) with cyclic order was shown to be $O(n^2)$ times slower than randomized versions in the worst case.
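For concreteness, a minimal cyclic coordinate descent loop on a convex quadratic (exact per-coordinate minimization) is sketched below; the randomized variant referenced in the comparison would instead draw the coordinate index uniformly at random at each step:

```python
# A minimal sketch: cyclic CD on f(x) = 1/2 x^T Q x - b^T x with Q positive
# definite; each inner step minimizes f exactly along one coordinate.
import numpy as np

def cyclic_cd(Q, b, x, sweeps=100):
    n = len(b)
    for _ in range(sweeps):
        for i in range(n):                            # fixed cyclic order
            x[i] += (b[i] - Q[i] @ x) / Q[i, i]       # exact coordinate minimizer
    return x
```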
no code implementations • 25 Sep 2019 • Ruoyu Sun, Tiantian Fang, Alex Schwing
In this work, we perform a global analysis of GANs from two perspectives: the global landscape of the outer-optimization problem and the global behavior of the gradient descent dynamics.
no code implementations • 16 Sep 2019 • Zeyu Zhu, Nan Li, Ruoyu Sun, Huijing Zhao, Donghao Xu
Different cost functions for traversability analysis are learned and tested in various scenes to evaluate their capability in guiding the trajectory planning of different behaviors.
no code implementations • CVPR 2019 • Ishan Deshpande, Yuan-Ting Hu, Ruoyu Sun, Ayis Pyrros, Nasir Siddiqui, Sanmi Koyejo, Zhizhen Zhao, David Forsyth, Alexander Schwing
Generative adversarial nets (GANs) and variational auto-encoders have significantly improved our distribution modeling capabilities, showing promise for dataset augmentation, image-to-image translation and feature learning.
no code implementations • 28 Dec 2018 • Dawei Li, Tian Ding, Ruoyu Sun
Wide networks are often believed to have a nice optimization landscape, but what rigorous results can we prove?
no code implementations • ICLR 2019 • Xiangyi Chen, Sijia Liu, Ruoyu Sun, Mingyi Hong
We prove that under our derived conditions, these methods can achieve the convergence rate of order $O(\log{T}/\sqrt{T})$ for nonconvex stochastic optimization.
no code implementations • NeurIPS 2018 • Shiyu Liang, Ruoyu Sun, Jason D. Lee, R. Srikant
One of the main difficulties in analyzing neural networks is the non-convexity of the loss function which may have many bad local minima.
no code implementations • ICML 2018 • Shiyu Liang, Ruoyu Sun, Yixuan Li, R. Srikant
Here we focus on the training performance of single-layered neural networks for binary classification, and provide conditions under which the training error is zero at all local minima of a smooth hinge loss function.
1 code implementation • 15 Feb 2017 • Sam Wiseman, Sumit Chopra, Marc'Aurelio Ranzato, Arthur Szlam, Ruoyu Sun, Soumith Chintala, Nicolas Vasilache
While Truncated Back-Propagation through Time (BPTT) is the most popular approach to training Recurrent Neural Networks (RNNs), it suffers from being inherently sequential (making parallelization difficult) and from truncating gradient flow between distant time-steps.
no code implementations • 28 Nov 2014 • Ruoyu Sun, Zhi-Quan Luo
In this paper, we establish a theoretical guarantee for the factorization formulation to correctly recover the underlying low-rank matrix.