no code implementations • ICML 2020 • Haoran Sun, Songtao Lu, Mingyi Hong
Similarly, for online problems, the proposed method achieves an $\mathcal{O}(m \epsilon^{-3/2})$ sample complexity and an $\mathcal{O}(\epsilon^{-1})$ communication complexity, while the best existing bounds are $\mathcal{O}(m\epsilon^{-2})$ and $\mathcal{O}(\epsilon^{-2})$, respectively.
no code implementations • ICML 2020 • Sijia Liu, Songtao Lu, Xiangyi Chen, Yao Feng, Kaidi Xu, Abdullah Al-Dujaili, Mingyi Hong, Una-May O'Reilly
In this paper, we study the problem of constrained min-max optimization in a black-box setting, where the desired optimizer cannot access the gradients of the objective function but may query its values.
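Since the optimizer can only query function values, the workhorse primitive is a finite-difference (zeroth-order) gradient estimate. Below is a minimal sketch of a two-point ZO estimator driving gradient descent-ascent on a toy saddle problem; the objective, sample count, and step sizes are illustrative assumptions, not the paper's ZO-Min-Max algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, y):
    # Toy saddle objective: strongly convex in x, strongly concave in y.
    return 0.5 * x @ x + x @ y - 0.5 * y @ y

def zo_grad(fun, z, mu=1e-4, n_samples=20):
    """Two-point zeroth-order estimate of grad fun(z) from value queries only."""
    g = np.zeros_like(z)
    for _ in range(n_samples):
        u = rng.normal(size=z.shape)
        g += (fun(z + mu * u) - fun(z - mu * u)) / (2 * mu) * u
    return g / n_samples

x, y = rng.normal(size=3), rng.normal(size=3)
for _ in range(300):
    gx = zo_grad(lambda v: f(v, y), x)   # descent direction from queries only
    gy = zo_grad(lambda v: f(x, v), y)   # ascent direction from queries only
    x, y = x - 0.05 * gx, y + 0.05 * gy
print(x, y)  # both should approach the saddle point at the origin
```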
no code implementations • 21 Nov 2024 • Shuchen Zhu, Boao Kong, Songtao Lu, Xinmeng Huang, Kun Yuan
To address these limitations, this paper proposes SPARKLE, a unified Single-loop Primal-dual AlgoRithm frameworK for decentraLized bilEvel optimization.
no code implementations • 3 Oct 2024 • Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, Pin-Yu Chen
To the best of our knowledge, this work provides the first theoretical study of training Transformers with nonlinear attention to obtain the CoT generalization capability, so that the resulting model can perform inference on unseen tasks when the input is augmented by examples of the new task.
1 code implementation • 25 Jul 2024 • Yujia Wang, Shiqiang Wang, Songtao Lu, Jinghui Chen
Federated learning (FL) has emerged as a widely adopted training paradigm for privacy-preserving machine learning.
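To make the paradigm concrete, here is a hedged sketch of the canonical FedAvg loop (local SGD steps at each client, then server-side averaging); the quadratic client losses, step sizes, and client count are toy assumptions, not the setup of this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 10, 5
targets = rng.normal(size=(m, d))        # heterogeneous client optima

w_global = np.zeros(d)
for rnd in range(50):
    local_models = []
    for i in range(m):
        w = w_global.copy()
        for _ in range(5):               # local SGD on 0.5 * ||w - target_i||^2
            w -= 0.1 * (w - targets[i])
        local_models.append(w)
    w_global = np.mean(local_models, axis=0)   # server aggregation
print(w_global - targets.mean(0))        # tends to the mean of client optima
```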
no code implementations • 14 Jun 2024 • Minghong Fang, Zifan Zhang, Hairi, Prashant Khanduri, Jia Liu, Songtao Lu, Yuchen Liu, Neil Gong
However, due to its fully decentralized nature, DFL is highly vulnerable to poisoning attacks, where malicious clients could manipulate the system by sending carefully-crafted local models to their neighboring clients.
no code implementations • 24 May 2024 • Shuai Zhang, Heshan Devaka Fernando, Miao Liu, Keerthiram Murugesan, Songtao Lu, Pin-Yu Chen, Tianyi Chen, Meng Wang
This paper studies the transfer reinforcement learning (RL) problem where multiple RL problems have different reward functions but share the same underlying transition dynamics.
no code implementations • 23 Feb 2024 • Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, Pin-Yu Chen
Despite the empirical success, the mechanics of how to train a Transformer to achieve ICL and the corresponding ICL capacity are mostly elusive due to the technical challenges of analyzing the nonconvex training problems that result from the nonlinear self-attention and nonlinear activation in Transformers.
no code implementations • 5 Feb 2024 • Boao Kong, Shuchen Zhu, Songtao Lu, Xinmeng Huang, Kun Yuan
In this paper, we introduce a single-loop decentralized SBO (D-SOBA) algorithm and establish its transient iteration complexity, which, for the first time, clarifies the joint influence of network topology and data heterogeneity on decentralized bilevel algorithms.
1 code implementation • 13 Jan 2024 • A F M Saif, Xiaodong Cui, Han Shen, Songtao Lu, Brian Kingsbury, Tianyi Chen
In this paper, we present a novel bilevel optimization-based approach to training acoustic models for automatic speech recognition (ASR) tasks, which we term bi-level joint unsupervised and supervised training (BL-JUST).
Automatic Speech Recognition (ASR) +2
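To make the alternating structure concrete, here is a hedged sketch of interleaving a lower-level unsupervised (reconstruction) update with an upper-level supervised update on a shared encoder; the tiny linear modules, losses, and step sizes are illustrative assumptions, not the BL-JUST recipe itself.

```python
import torch

torch.manual_seed(0)
encoder = torch.nn.Linear(16, 8)       # shared acoustic encoder (toy)
decoder = torch.nn.Linear(8, 16)       # used by the unsupervised loss
head = torch.nn.Linear(8, 4)           # used by the supervised loss
opt_low = torch.optim.SGD(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-2)
opt_up = torch.optim.SGD(list(encoder.parameters()) + list(head.parameters()), lr=1e-2)

x_unlab = torch.randn(128, 16)         # unlabeled features
x_lab, y_lab = torch.randn(32, 16), torch.randint(0, 4, (32,))

for step in range(200):
    # Lower level: unsupervised reconstruction shapes the encoder.
    opt_low.zero_grad()
    low_loss = torch.nn.functional.mse_loss(decoder(encoder(x_unlab)), x_unlab)
    low_loss.backward()
    opt_low.step()

    # Upper level: supervised loss refines the encoder and task head.
    opt_up.zero_grad()
    up_loss = torch.nn.functional.cross_entropy(head(encoder(x_lab)), y_lab)
    up_loss.backward()
    opt_up.step()
```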
no code implementations • 21 Nov 2023 • Xiaodong Cui, Ashish Mittal, Songtao Lu, Wei Zhang, George Saon, Brian Kingsbury
Soft random sampling (SRS) is a simple yet effective approach for efficient training of large-scale deep neural networks when dealing with massive data.
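The mechanism itself is one line per epoch: train on a randomly drawn fraction of the corpus instead of all of it. A minimal sketch follows; the rate r = 0.3 and sampling with replacement are illustrative assumptions rather than the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
n_examples, r = 1_000_000, 0.3

for epoch in range(5):
    # Draw an epoch-level subset at rate r instead of doing a full pass.
    subset = rng.choice(n_examples, size=int(r * n_examples), replace=True)
    # train_one_epoch(dataset, subset)   # hypothetical training call
```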
no code implementations • 27 Oct 2023 • Qiu Ji, Guilin Qi, Yuxin Ye, Jiaye Li, Site Li, Jianjie Ren, Songtao Lu
We conduct experiments over 19 ontology pairs and compare our algorithms and scoring functions with existing ones.
no code implementations • 24 Oct 2023 • Shuai Zhang, Hongkang Li, Meng Wang, Miao Liu, Pin-Yu Chen, Songtao Lu, Sijia Liu, Keerthiram Murugesan, Subhajit Chaudhury
This paper provides the first theoretical convergence and sample complexity analysis of the practical setting of DQNs with $\epsilon$-greedy policy.
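For reference, the $\epsilon$-greedy rule being analyzed is simple to state in code; the Q-values below are placeholders for a DQN's outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon explore uniformly; otherwise act greedily."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))   # explore
    return int(np.argmax(q_values))               # exploit

action = epsilon_greedy(np.array([0.1, 0.7, 0.2]), epsilon=0.1)
```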
no code implementations • 29 Aug 2023 • Pengwei Xing, Songtao Lu, Han Yu
Neuro-symbolic learning (NSL) uses neural networks to model complex symbolic rule patterns as latent variable distributions, which reduces the rule search space and generates unseen rules to improve downstream task performance.
no code implementations • 26 Aug 2023 • Hui Wan, Hongkang Li, Songtao Lu, Xiaodong Cui, Marina Danilevsky
The integration of external personalized context information into document-grounded conversational systems has significant potential business value, but has not been well-studied.
no code implementations • 4 Jun 2023 • Quan Xiao, Songtao Lu, Tianyi Chen
Bilevel optimization has recently regained interest owing to its applications in emerging machine learning fields such as hyperparameter optimization, meta-learning, and reinforcement learning.
no code implementations • 5 Mar 2023 • Zhuqing Liu, Xin Zhang, Songtao Lu, Jia Liu
Decentralized min-max optimization problems with domain constraints underpin many important ML applications, including multi-agent ML fairness assurance and policy evaluation in multi-agent reinforcement learning.
no code implementations • 6 Feb 2023 • Shuai Zhang, Meng Wang, Pin-Yu Chen, Sijia Liu, Songtao Lu, Miao Liu
Due to the significant computational challenge of training large-scale graph neural networks (GNNs), various sparse learning techniques have been exploited to reduce memory and storage costs.
no code implementations • 19 Dec 2022 • Zichong Li, Pin-Yu Chen, Sijia Liu, Songtao Lu, Yangyang Xu
In this paper, we design and analyze stochastic inexact augmented Lagrangian methods (Stoc-iALM) to solve problems involving a nonconvex composite (i.e., smooth + nonsmooth) objective and nonconvex smooth functional constraints.
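A hedged sketch of the augmented-Lagrangian outer loop on a toy equality-constrained problem is below; the inner solver is plain gradient descent and the problem is far simpler than the paper's nonconvex composite setting with functional constraints.

```python
import numpy as np

# Toy problem: min f(x) s.t. c(x) = 0 (illustrative assumptions).
def f(x):      return 0.5 * np.sum(x ** 2) + np.sin(x[0])
def grad_f(x): g = x.copy(); g[0] += np.cos(x[0]); return g
def c(x):      return np.array([x.sum() - 1.0])
def jac_c(x):  return np.ones((1, x.size))

x, lam, rho = np.zeros(3), np.zeros(1), 10.0
for outer in range(20):
    for inner in range(200):   # inexact primal minimization of the AL function
        g = grad_f(x) + jac_c(x).T @ (lam + rho * c(x))
        x -= 1e-2 * g
    lam += rho * c(x)          # dual (multiplier) update
print(x, c(x))                 # constraint residual shrinks across outer loops
```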
no code implementations • 3 Oct 2022 • Zepeng Zhang, Songtao Lu, Zengfeng Huang, Ziping Zhao
In this work, we propose a novel interpretable message passing scheme with adaptive structure (ASMP) to defend against adversarial attacks on graph structure.
no code implementations • 27 Jul 2022 • Zhuqing Liu, Xin Zhang, Prashant Khanduri, Songtao Lu, Jia Liu
Our main contributions in this paper are two-fold: i) We first propose a deterministic algorithm called INTERACT (inner-gradient-descent-outer-tracked-gradient) that requires a sample complexity of $\mathcal{O}(n \epsilon^{-1})$ and a communication complexity of $\mathcal{O}(\epsilon^{-1})$ to solve the bilevel optimization problem, where $n$ and $\epsilon > 0$ are the number of samples at each agent and the desired stationarity gap, respectively.
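The "tracked-gradient" ingredient is the standard gradient-tracking mechanism from decentralized optimization. A minimal single-level sketch on toy quadratic losses is below (INTERACT applies this inside a bilevel loop); the complete-graph mixing matrix and step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 5, 3                                   # agents, dimension
A_list = [rng.normal(size=(d, d)) for _ in range(m)]
Q = [a.T @ a + np.eye(d) for a in A_list]     # local loss 0.5 x^T Q_i x
grad = lambda i, x: Q[i] @ x

W = np.full((m, m), 1.0 / m)                  # doubly stochastic mixing matrix
X = rng.normal(size=(m, d))                   # one row of iterates per agent
Y = np.array([grad(i, X[i]) for i in range(m)])   # gradient trackers
for _ in range(100):
    X_new = W @ X - 0.05 * Y                  # consensus step plus descent
    Y = W @ Y + np.array([grad(i, X_new[i]) - grad(i, X[i]) for i in range(m)])
    X = X_new
print(X)                                      # all rows approach the minimizer 0
```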
no code implementations • 12 Jul 2022 • Songtao Lu
The GDPA is a primal-dual algorithm, which only exploits the first-order information of both the objective and constraint functions to update the primal and dual variables in an alternating way.
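A hedged sketch of a single-loop primal-dual update of this flavor on a toy convex problem is below: gradient descent on the Lagrangian in the primal variable alternates with projected gradient ascent in the dual variable. The toy objective, constraint, and step sizes are illustrative assumptions, not the exact GDPA update.

```python
import numpy as np

# Toy problem: min ||x - (2,0)||^2  s.t.  ||x||^2 <= 1 (optimum near (1,0)).
f_grad = lambda x: 2 * (x - np.array([2.0, 0.0]))
g = lambda x: np.array([x @ x - 1.0])
g_jac = lambda x: 2 * x[None, :]

x, lam = np.zeros(2), np.zeros(1)
for _ in range(2000):
    x -= 1e-2 * (f_grad(x) + g_jac(x).T @ lam)   # primal descent on Lagrangian
    lam = np.maximum(lam + 1e-2 * g(x), 0.0)     # projected dual ascent
print(x, lam)   # x approximately reaches the boundary point (1, 0), lam ~ 1
```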
no code implementations • 27 Jun 2022 • Lisha Chen, Songtao Lu, Tianyi Chen
While the conventional statistical learning theory suggests that overparameterized models tend to overfit, empirical evidence reveals that overparameterized meta learning methods still work well -- a phenomenon often called "benign overfitting."
2 code implementations • 13 Jun 2022 • Gaoyuan Zhang, Songtao Lu, Yihua Zhang, Xiangyi Chen, Pin-Yu Chen, Quanfu Fan, Lee Martie, Lior Horesh, Mingyi Hong, Sijia Liu
Spurred by that, we propose distributed adversarial training (DAT), a large-batch adversarial training framework implemented over multiple machines.
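The per-worker inner step is ordinary PGD-based adversarial training; a hedged sketch is below, with the cross-machine gradient synchronization that DAT actually contributes elided. The model, perturbation radius, and step sizes are toy assumptions.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(20, 10)
loss_fn = torch.nn.CrossEntropyLoss()
x, y = torch.randn(8, 20), torch.randint(0, 10, (8,))

delta = torch.zeros_like(x, requires_grad=True)
for _ in range(7):                                 # PGD: inner maximization
    loss = loss_fn(model(x + delta), y)
    g = torch.autograd.grad(loss, delta)[0]
    with torch.no_grad():
        delta += 0.01 * g.sign()                   # ascent step on the loss
        delta.clamp_(-0.03, 0.03)                  # project to the L-inf ball

model.zero_grad()
loss_fn(model(x + delta.detach()), y).backward()   # outer minimization gradient
```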
1 code implementation • 3 Mar 2022 • Alex Gu, Songtao Lu, Parikshit Ram, Lily Weng
We consider a generic min-max multi-objective bilevel optimization problem with applications in robust machine learning such as representation learning and hyperparameter optimization.
no code implementations • NeurIPS 2021 • Xin Zhang, Zhuqing Liu, Jia Liu, Zhengyuan Zhu, Songtao Lu
To our knowledge, this paper is the first work that achieves both $\mathcal{O}(\epsilon^{-2})$ sample complexity and $\mathcal{O}(\epsilon^{-2})$ communication complexity in decentralized policy evaluation for cooperative MARL.
Multi-agent Reinforcement Learning • Reinforcement Learning (RL) +1
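For context, the single-agent primitive being decentralized is TD learning with linear function approximation. A compact TD(0) sketch on a toy chain is below; the MDP, features, and step size are illustrative, and the multi-agent consensus step is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, d, gamma = 5, 3, 0.9
phi = rng.normal(size=(n_states, d))               # fixed feature map
P = np.full((n_states, n_states), 1.0 / n_states)  # toy transition kernel
r = rng.normal(size=n_states)                      # per-state rewards

w, s = np.zeros(d), 0
for t in range(20000):
    s_next = rng.choice(n_states, p=P[s])
    td_error = r[s] + gamma * phi[s_next] @ w - phi[s] @ w
    w += 0.01 * td_error * phi[s]                  # TD(0) update
    s = s_next
```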
no code implementations • ICLR 2022 • FNU Hairi, Jia Liu, Songtao Lu
In this paper, we establish the first finite-time convergence result of the actor-critic algorithm for fully decentralized multi-agent reinforcement learning (MARL) problems with average reward.
Multi-agent Reinforcement Learning • Reinforcement Learning +2
1 code implementation • ICLR 2022 • Qi Lyu, Xiao Fu, Weiran Wang, Songtao Lu
Under this model, latent correlation maximization is shown to guarantee the extraction of the shared components across views (up to certain ambiguities).
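In the linear case, latent correlation maximization reduces to classical CCA, which gives a quick sanity check of the shared-component claim; the toy two-view data and SVD-based computation below are illustrative stand-ins for the paper's nonlinear setting.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=(500, 2))                     # shared latent components
X = z @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(500, 6))   # view 1
Y = z @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(500, 5))   # view 2

def whiten(A):
    U, _, _ = np.linalg.svd(A - A.mean(0), full_matrices=False)
    return U                                      # orthonormal basis of the view

# Canonical correlations = singular values of the whitened cross-product.
corr = np.linalg.svd(whiten(X).T @ whiten(Y), compute_uv=False)
print(corr[:2])                                   # two near-1 shared directions
```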
no code implementations • 5 Jun 2021 • Yihong Dong, Ying Peng, Muqiao Yang, Songtao Lu, Qingjiang Shi
Deep neural networks have been shown to be a useful class of tools for addressing signal recognition issues in recent years, especially for identifying the nonlinear feature structures of signals.
1 code implementation • 12 May 2021 • Lunchen Xie, Jiaqi Liu, Songtao Lu, Tsung-Hui Chang, Qingjiang Shi
XGBoost is one of the most widely used machine learning models in the industry due to its superior learning accuracy and efficiency.
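For readers unfamiliar with the model under discussion, a minimal usage sketch on synthetic data follows; the hyperparameters are illustrative, not tuned.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)      # synthetic binary labels

model = xgb.XGBClassifier(n_estimators=100, max_depth=4, learning_rate=0.1)
model.fit(X, y)
print(model.score(X, y))                          # training accuracy
```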
no code implementations • NeurIPS 2020 • Chia-Yu Chen, Jiamin Ni, Songtao Lu, Xiaodong Cui, Pin-Yu Chen, Xiao Sun, Naigang Wang, Swagath Venkataramani, Vijayalakshmi Srinivasan, Wei Zhang, Kailash Gopalakrishnan
Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art platforms is expected to be severely communication constrained.
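One standard remedy for this bottleneck is gradient compression before communication; below is a hedged sketch of generic top-k sparsification with error feedback, shown for illustration and not as the paper's specific compressor.

```python
import numpy as np

def topk_compress(grad, residual, k):
    """Keep the k largest-magnitude entries; carry the rest as error feedback."""
    g = grad + residual
    idx = np.argpartition(np.abs(g), -k)[-k:]
    sparse = np.zeros_like(g)
    sparse[idx] = g[idx]
    return sparse, g - sparse        # (transmitted gradient, new residual)

rng = np.random.default_rng(0)
residual = np.zeros(1000)
sparse, residual = topk_compress(rng.normal(size=1000), residual, k=10)
```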
1 code implementation • 2 Mar 2021 • Chia-Yi Hsu, Pin-Yu Chen, Songtao Lu, Sijia Liu, Chia-Mu Yu
In this paper, we propose a framework of generating adversarial examples for unsupervised models and demonstrate novel applications to data augmentation.
no code implementations • 8 Feb 2021 • Xiaodong Cui, Songtao Lu, Brian Kingsbury
In this paper, we investigate federated acoustic modeling using data from multiple clients.
Federated Learning • Speech Recognition • Sound • Distributed, Parallel, and Cluster Computing • Audio and Speech Processing
no code implementations • NeurIPS 2020 • Gang Wang, Songtao Lu, Georgios Giannakis, Gerald Tesauro, Jian Sun
The present contribution deals with decentralized policy evaluation in multi-agent Markov decision processes using temporal-difference (TD) methods with linear function approximation for scalability.
no code implementations • NeurIPS 2020 • Songtao Lu, Meisam Razaviyayn, Bo Yang, Kejun Huang, Mingyi Hong
To the best of our knowledge, this is the first time that first-order algorithms with polynomial per-iteration complexity and global sublinear rate are designed to find SOSPs of the important class of non-convex problems with linear constraints (almost surely).
1 code implementation • 25 Nov 2020 • Yunfei Teng, Anna Choromanska, Murray Campbell, Songtao Lu, Parikshit Ram, Lior Horesh
We study the principal directions of the trajectory of the optimizer after convergence and show that traveling along a few top principal directions can quickly bring the parameters outside the cone but this is not the case for the remaining directions.
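The analysis can be reproduced schematically: stack parameter snapshots collected along the trajectory and take an SVD. The random-walk "trajectory" below is a toy stand-in for real optimizer iterates.

```python
import numpy as np

rng = np.random.default_rng(0)
snapshots = np.cumsum(rng.normal(size=(200, 50)), axis=0)  # 200 steps, 50 params

centered = snapshots - snapshots.mean(0)
_, S, Vt = np.linalg.svd(centered, full_matrices=False)
top_dirs = Vt[:3]                      # top-3 principal directions in weight space
print((S[:3] ** 2) / (S ** 2).sum())   # trajectory variance they capture
```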
no code implementations • 29 Sep 2020 • Pu Zhao, Parikshit Ram, Songtao Lu, Yuguang Yao, Djallel Bouneffouf, Xue Lin, Sijia Liu
The resulting scheme for meta-learning a UAP generator (i) has better performance (50% higher ASR) than baselines such as Projected Gradient Descent, (ii) has better performance (37% faster) than the vanilla L2O and MAML frameworks (when applicable), and (iii) is able to simultaneously handle UAP generation for different victim models and image data sources.
no code implementations • 15 Jun 2020 • Meisam Razaviyayn, Tianjian Huang, Songtao Lu, Maher Nouiehed, Maziar Sanjabi, Mingyi Hong
The min-max optimization problem, also known as the saddle point problem, is a classical optimization problem which is also studied in the context of zero-sum games.
no code implementations • 15 Jan 2020 • Tianxiang Gao, Songtao Lu, Jia Liu, Chris Chu
Further, we show that the iteration complexity of the proposed method is $\mathcal{O}(n\epsilon^{-2})$ to achieve an $\epsilon$-stationary point, where $n$ is the number of blocks of coordinates.
no code implementations • 14 Jan 2020 • Tsung-Hui Chang, Mingyi Hong, Hoi-To Wai, Xinwei Zhang, Songtao Lu
In particular, we provide a selective review of the recent techniques developed for optimizing non-convex models (i.e., problem classes), processing batch and streaming data (i.e., data types), over networks in a distributed manner (i.e., communication and computation paradigms).
no code implementations • 16 Dec 2019 • Tianxiang Gao, Songtao Lu, Jia Liu, Chris Chu
In the applications of signal processing and data analytics, there is a wide class of non-convex problems whose objective functions do not satisfy the common global Lipschitz continuous gradient assumption (e.g., the nonnegative matrix factorization (NMF) problem).
no code implementations • 4 Dec 2019 • Songtao Lu, Yawen Zhang, Yunlong Wang, Christina Mack
Federated learning opens a number of research opportunities due to its high communication efficiency in distributed training problems within a star network.
no code implementations • 22 Oct 2019 • Zhenxun Zhuang, Yunlong Wang, Kezi Yu, Songtao Lu
The online meta-learning framework is designed for the continual lifelong learning setting.
no code implementations • 13 Oct 2019 • Haoran Sun, Songtao Lu, Mingyi Hong
Similarly, for online problems, the proposed method achieves an $\mathcal{O}(m \epsilon^{-3/2})$ sample complexity and an $\mathcal{O}(\epsilon^{-1})$ communication complexity, while the best existing bounds are $\mathcal{O}(m\epsilon^{-2})$ and $\mathcal{O}(\epsilon^{-2})$, respectively.
1 code implementation • 30 Sep 2019 • Sijia Liu, Songtao Lu, Xiangyi Chen, Yao Feng, Kaidi Xu, Abdullah Al-Dujaili, Mingyi Hong, Una-May O'Reilly
In this paper, we study the problem of constrained robust (min-max) optimization in a black-box setting, where the desired optimizer cannot access the gradients of the objective function but may query its values.
no code implementations • 9 Jul 2019 • Songtao Lu, Meisam Razaviyayn, Bo Yang, Kejun Huang, Mingyi Hong
This paper proposes low-complexity algorithms for finding approximate second-order stationary points (SOSPs) of problems with smooth non-convex objective and linear constraints.
no code implementations • ICLR 2019 • Songtao Lu, Rahul Singh, Xiangyi Chen, Yongxin Chen, Mingyi Hong
By developing new primal-dual optimization tools, we show that, with a proper stepsize choice, the widely used first-order iterative algorithm in training GANs would in fact converge to a stationary solution with a sublinear rate.
no code implementations • 13 Mar 2019 • Shuai Ma, Jiahui Dai, Songtao Lu, Hang Li, Han Zhang, Chun Du, Shiyin Li
The dataset, which contains eight types of modulated signals, is available online.
no code implementations • 8 Mar 2019 • Hongmei Wang, Zhenzhen Wu, Shuai Ma, Songtao Lu, Han Zhang, Guoru Ding, Shiyin Li
In this paper, we investigate deep learning (DL)-enabled signal demodulation methods and establish the first open dataset of real modulated signals for wireless communication systems.
no code implementations • 21 Feb 2019 • Songtao Lu, Ioannis Tsaknakis, Mingyi Hong, Yongxin Chen
In this work, we consider a block-wise one-sided non-convex min-max problem, in which the minimization problem consists of multiple blocks and is non-convex, while the maximization problem is (strongly) concave.
no code implementations • 18 Sep 2018 • Yongli Zhu, Songtao Lu, Renchang Dai, Guangyi Liu, Zhiwei Wang
Then the raw input and output data are preprocessed by unit scaling, and the trained network is tested on real price data under different input lengths, forecasting horizons, and data sizes.
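As an illustration of that preprocessing, here is a small sketch of unit (min-max) scaling followed by slicing a series into input-length/horizon windows; the synthetic series and window sizes are assumptions.

```python
import numpy as np

prices = np.cumsum(np.random.default_rng(0).normal(size=500)) + 100.0
scaled = (prices - prices.min()) / (prices.max() - prices.min())  # unit scaling

L_in, H = 24, 6                        # input length and forecasting horizon
n_windows = len(scaled) - L_in - H
X = np.stack([scaled[i:i + L_in] for i in range(n_windows)])
Y = np.stack([scaled[i + L_in:i + L_in + H] for i in range(n_windows)])
print(X.shape, Y.shape)                # (470, 24) (470, 6)
```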
no code implementations • 28 Feb 2018 • Songtao Lu, Mingyi Hong, Zhengdao Wang
Alternating gradient descent (AGD) is a simple but popular algorithm that has been applied to problems in optimization, machine learning, data mining, and signal processing, among other areas.
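A compact sketch of AGD on the classical low-rank factorization objective $\min_{X, Y} \|M - XY^\top\|_F^2$ follows; the sizes and step size are toy assumptions, separate from the paper's analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(30, 4)) @ rng.normal(size=(4, 20))  # rank-4 target
X, Y = rng.normal(size=(30, 4)), rng.normal(size=(20, 4))

lr = 1e-3
for _ in range(5000):
    R = M - X @ Y.T
    X += lr * R @ Y            # gradient step on X with Y fixed
    R = M - X @ Y.T
    Y += lr * R.T @ X          # gradient step on Y with X fixed
print(np.linalg.norm(M - X @ Y.T))     # residual shrinks
```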
no code implementations • 24 Mar 2017 • Songtao Lu, Mingyi Hong, Zhengdao Wang
The proposed algorithm is guaranteed to converge to the set of Karush-Kuhn-Tucker (KKT) points of the nonconvex SymNMF problem.
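For reference, the SymNMF problem is $\min_{H \ge 0} \|A - HH^\top\|_F^2$. Below is a hedged projected-gradient baseline on a toy instance; this simple baseline is not the paper's algorithm, and the step size is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
H_true = np.abs(rng.normal(size=(20, 3)))
A = H_true @ H_true.T                        # symmetric nonnegative target

H = np.abs(rng.normal(size=(20, 3)))
for _ in range(3000):
    G = -4 * (A - H @ H.T) @ H               # gradient of ||A - H H^T||_F^2
    H = np.maximum(H - 1e-3 * G, 0.0)        # gradient step, project to H >= 0
print(np.linalg.norm(A - H @ H.T))
```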