Search Results for author: Songtao Lu

Found 53 papers, 9 papers with code

Improving the Sample and Communication Complexity for Decentralized Non-Convex Optimization: Joint Gradient Estimation and Tracking

no code implementations ICML 2020 Haoran Sun, Songtao Lu, Mingyi Hong

Similarly, for online problems, the proposed method achieves an $\mathcal{O}(m \epsilon^{-3/2})$ sample complexity and an $\mathcal{O}(\epsilon^{-1})$ communication complexity, while the best existing bounds are $\mathcal{O}(m\epsilon^{-2})$ and $\mathcal{O}(\epsilon^{-2})$, respectively.

Stochastic Optimization

Min-Max Optimization without Gradients: Convergence and Applications to Black-Box Evasion and Poisoning Attacks

no code implementations ICML 2020 Sijia Liu, Songtao Lu, Xiangyi Chen, Yao Feng, Kaidi Xu, Abdullah Al-Dujaili, Mingyi Hong, Una-May O'Reilly

In this paper, we study the problem of constrained min-max optimization in a black-box setting, where the desired optimizer cannot access the gradients of the objective function but may query its values.
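
In black-box (zeroth-order) methods of this kind, true gradients are typically replaced by random-direction finite-difference estimates built from function values alone. The sketch below is a generic such estimator, not necessarily the paper's exact construction; the smoothing radius `mu` and query count `q` are illustrative choices.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-3, q=10, rng=None):
    """Generic random-direction gradient estimate of f at x from values only.

    Averages q finite differences along random unit directions u_i:
        g = (d / (q * mu)) * sum_i (f(x + mu * u_i) - f(x)) * u_i
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    fx = f(x)
    g = np.zeros(d)
    for _ in range(q):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        g += (f(x + mu * u) - fx) * u
    return (d / (q * mu)) * g
```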

SPARKLE: A Unified Single-Loop Primal-Dual Framework for Decentralized Bilevel Optimization

no code implementations21 Nov 2024 Shuchen Zhu, Boao Kong, Songtao Lu, Xinmeng Huang, Kun Yuan

To address these limitations, this paper proposes SPARKLE, a unified Single-loop Primal-dual AlgoRithm frameworK for decentraLized bilEvel optimization.

Bilevel Optimization

Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis

no code implementations3 Oct 2024 Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, Pin-Yu Chen

To the best of our knowledge, this work provides the first theoretical study of training Transformers with nonlinear attention to obtain the CoT generalization capability, so that the resulting model can perform inference on unseen tasks when the input is augmented by examples of the new task.

In-Context Learning

FADAS: Towards Federated Adaptive Asynchronous Optimization

1 code implementation25 Jul 2024 Yujia Wang, Shiqiang Wang, Songtao Lu, Jinghui Chen

Federated learning (FL) has emerged as a widely adopted training paradigm for privacy-preserving machine learning.

Federated Learning Privacy Preserving

Byzantine-Robust Decentralized Federated Learning

no code implementations14 Jun 2024 Minghong Fang, Zifan Zhang, Hairi, Prashant Khanduri, Jia Liu, Songtao Lu, Yuchen Liu, Neil Gong

However, due to its fully decentralized nature, DFL is highly vulnerable to poisoning attacks, where malicious clients could manipulate the system by sending carefully crafted local models to their neighboring clients.

Federated Learning
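
A common defense in this setting replaces plain averaging of neighbors' models with a robust aggregation rule. The sketch below shows one standard choice, the coordinate-wise trimmed mean; it is illustrative background, not the specific aggregation proposed in this paper.

```python
import numpy as np

def trimmed_mean(models, beta):
    """Coordinate-wise trimmed mean of received model vectors.

    models: (n, d) array holding n neighbors' parameter vectors.
    beta:   number of extreme values dropped at each end of every
            coordinate before averaging (requires n > 2 * beta).
    """
    s = np.sort(models, axis=0)  # sort each coordinate independently
    return s[beta:models.shape[0] - beta].mean(axis=0)
```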

SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning

no code implementations24 May 2024 Shuai Zhang, Heshan Devaka Fernando, Miao Liu, Keerthiram Murugesan, Songtao Lu, Pin-Yu Chen, Tianyi Chen, Meng Wang

This paper studies the transfer reinforcement learning (RL) problem where multiple RL problems have different reward functions but share the same underlying transition dynamics.

Deep Reinforcement Learning Q-Learning +2

How Do Nonlinear Transformers Learn and Generalize in In-Context Learning?

no code implementations23 Feb 2024 Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, Pin-Yu Chen

Despite the empirical success, the mechanics of how to train a Transformer to achieve ICL and the corresponding ICL capacity remain mostly elusive, due to the technical challenges of analyzing the nonconvex training problems that result from the nonlinear self-attention and nonlinear activation in Transformers.

Binary Classification In-Context Learning

Decentralized Bilevel Optimization over Graphs: Loopless Algorithmic Update and Transient Iteration Complexity

no code implementations5 Feb 2024 Boao Kong, Shuchen Zhu, Songtao Lu, Xinmeng Huang, Kun Yuan

In this paper, we introduce a single-loop decentralized SBO (D-SOBA) algorithm and establish its transient iteration complexity, which, for the first time, clarifies the joint influence of network topology and data heterogeneity on decentralized bilevel algorithms.

Bilevel Optimization

Joint Unsupervised and Supervised Training for Automatic Speech Recognition via Bilevel Optimization

1 code implementation13 Jan 2024 A F M Saif, Xiaodong Cui, Han Shen, Songtao Lu, Brian Kingsbury, Tianyi Chen

In this paper, we present a novel bilevel optimization-based approach to training acoustic models for automatic speech recognition (ASR), which we term bi-level joint unsupervised and supervised training (BL-JUST).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Soft Random Sampling: A Theoretical and Empirical Analysis

no code implementations21 Nov 2023 Xiaodong Cui, Ashish Mittal, Songtao Lu, Wei zhang, George Saon, Brian Kingsbury

Soft random sampling (SRS) is a simple yet effective approach for efficient training of large-scale deep neural networks when dealing with massive data.

Automatic Speech Recognition speech-recognition +1
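
SRS is commonly described as drawing a random fraction of the training set, uniformly and with replacement, at each epoch. The sketch below assumes that reading; the ratio `p` and the with-replacement choice are assumptions rather than details stated in this abstract.

```python
import numpy as np

def srs_epoch_indices(n, p, rng=None):
    """Example indices for one epoch of soft random sampling (sketch).

    Draws round(p * n) indices uniformly with replacement, so each epoch
    trains on a random p-fraction of the n available examples.
    """
    rng = np.random.default_rng() if rng is None else rng
    return rng.integers(0, n, size=int(round(p * n)))
```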

Ontology Revision based on Pre-trained Language Models

no code implementations27 Oct 2023 Qiu Ji, Guilin Qi, Yuxin Ye, Jiaye Li, Site Li, Jianjie Ren, Songtao Lu

We conduct experiments over 19 ontology pairs and compare our algorithms and scoring functions with existing ones.

Federated Neuro-Symbolic Learning

no code implementations29 Aug 2023 Pengwei Xing, Songtao Lu, Han Yu

Neuro-symbolic learning (NSL) uses neural networks to model complex symbolic rule patterns as latent variable distributions, which reduces the rule search space and generates unseen rules to improve downstream task performance.

Bilevel Optimization Federated Learning

How Can Context Help? Exploring Joint Retrieval of Passage and Personalized Context

no code implementations26 Aug 2023 Hui Wan, Hongkang Li, Songtao Lu, Xiaodong Cui, Marina Danilevsky

The integration of external personalized context information into document-grounded conversational systems has significant potential business value, but has not been well-studied.

Passage Retrieval Retrieval

A Generalized Alternating Method for Bilevel Learning under the Polyak-Łojasiewicz Condition

no code implementations4 Jun 2023 Quan Xiao, Songtao Lu, Tianyi Chen

Bilevel optimization has recently regained interest owing to its applications in emerging machine learning fields such as hyperparameter optimization, meta-learning, and reinforcement learning.

Bilevel Optimization Hyperparameter Optimization +1
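
For orientation, when the lower-level problem is strongly convex, the hypergradient of the bilevel objective $F(x) = f(x, y^*(x))$ with $y^*(x) = \arg\min_y g(x, y)$ has the standard implicit form below; under the weaker Polyak-Łojasiewicz condition studied here the lower-level solution need not be unique, which is exactly what the paper's alternating method must handle.

```latex
\nabla F(x) \;=\; \nabla_x f(x, y^{*}) \;-\; \nabla_{xy}^{2} g(x, y^{*})\, \big[\nabla_{yy}^{2} g(x, y^{*})\big]^{-1} \nabla_y f(x, y^{*})
```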

PRECISION: Decentralized Constrained Min-Max Learning with Low Communication and Sample Complexities

no code implementations5 Mar 2023 Zhuqing Liu, Xin Zhang, Songtao Lu, Jia Liu

Decentralized min-max optimization problems with domain constraints underpin many important ML applications, including fairness assurance in multi-agent ML and policy evaluation in multi-agent reinforcement learning.

Fairness Multi-agent Reinforcement Learning

Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks

no code implementations6 Feb 2023 Shuai Zhang, Meng Wang, Pin-Yu Chen, Sijia Liu, Songtao Lu, Miao Liu

Due to the significant computational challenge of training large-scale graph neural networks (GNNs), various sparse learning techniques have been exploited to reduce memory and storage costs.

Sparse Learning

Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation Constrained Optimization

no code implementations19 Dec 2022 Zichong Li, Pin-Yu Chen, Sijia Liu, Songtao Lu, Yangyang Xu

In this paper, we design and analyze stochastic inexact augmented Lagrangian methods (Stoc-iALM) to solve problems involving a nonconvex composite (i.e., smooth + nonsmooth) objective and nonconvex smooth functional constraints.

Fairness
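
As background, for equality constraints $c(x) = 0$ an augmented Lagrangian method inexactly minimizes the function below in $x$ between multiplier updates; in this paper the constraints are expectations, so both terms are additionally accessed through stochastic samples.

```latex
\mathcal{L}_{\rho}(x, \lambda) \;=\; f(x) \;+\; \lambda^{\top} c(x) \;+\; \frac{\rho}{2}\, \lVert c(x) \rVert^{2}
```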

ASGNN: Graph Neural Networks with Adaptive Structure

no code implementations3 Oct 2022 Zepeng Zhang, Songtao Lu, Zengfeng Huang, Ziping Zhao

In this work, we propose a novel interpretable message passing scheme with adaptive structure (ASMP) to defend against adversarial attacks on graph structure.

Graph Neural Network Node Classification
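
For context, a plain message-passing layer aggregates neighbor features and applies a learned transform, as in the generic sketch below; ASMP's contribution is to adapt the graph structure that drives this aggregation, which the sketch deliberately omits.

```python
import numpy as np

def mp_layer(A, H, W):
    """One generic mean-aggregation message-passing layer (illustrative).

    A: (n, n) adjacency matrix (ASMP would adaptively reweight this),
    H: (n, d) node features, W: (d, k) learned weight matrix.
    """
    deg = A.sum(axis=1, keepdims=True).clip(min=1.0)  # avoid divide-by-zero
    return np.maximum(((A @ H) / deg) @ W, 0.0)       # aggregate, transform, ReLU
```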

INTERACT: Achieving Low Sample and Communication Complexities in Decentralized Bilevel Learning over Networks

no code implementations27 Jul 2022 Zhuqing Liu, Xin Zhang, Prashant Khanduri, Songtao Lu, Jia Liu

Our main contributions in this paper are two-fold: i) We first propose a deterministic algorithm called INTERACT (inner-gradient-descent-outer-tracked-gradient) that requires a sample complexity of $\mathcal{O}(n \epsilon^{-1})$ and a communication complexity of $\mathcal{O}(\epsilon^{-1})$ to solve the bilevel optimization problem, where $n$ and $\epsilon > 0$ are the number of samples at each agent and the desired stationarity gap, respectively.

Bilevel Optimization Meta-Learning +1

A Single-Loop Gradient Descent and Perturbed Ascent Algorithm for Nonconvex Functional Constrained Optimization

no code implementations12 Jul 2022 Songtao Lu

The GDPA is a primal-dual algorithm, which only exploits the first-order information of both the objective and constraint functions to update the primal and dual variables in an alternating way.
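
The single-constraint sketch below illustrates the alternating descent/ascent pattern for $\min_x f(x)$ s.t. $g(x) \le 0$; the damping term `gamma` is one generic way to realize a "perturbed" dual update and is not taken from the paper, whose stepsize and perturbation choices are what yield its guarantees.

```python
def gdpa_like_step(x, lam, grad_f, g, grad_g, eta=1e-2, rho=1e-2, gamma=1e-3):
    """One primal-dual step for min_x f(x) s.t. g(x) <= 0 (illustrative).

    Primal: gradient descent on the Lagrangian L(x, lam) = f(x) + lam * g(x).
    Dual:   projected ascent on lam with a small damping term -gamma * lam.
    """
    x = x - eta * (grad_f(x) + lam * grad_g(x))       # descend in the primal
    lam = max(0.0, lam + rho * (g(x) - gamma * lam))  # perturbed ascent, project to >= 0
    return x, lam
```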

Understanding Benign Overfitting in Gradient-Based Meta Learning

no code implementations27 Jun 2022 Lisha Chen, Songtao Lu, Tianyi Chen

While the conventional statistical learning theory suggests that overparameterized models tend to overfit, empirical evidence reveals that overparameterized meta learning methods still work well -- a phenomenon often called "benign overfitting."

Few-Shot Learning Learning Theory

Distributed Adversarial Training to Robustify Deep Neural Networks at Scale

2 code implementations13 Jun 2022 Gaoyuan Zhang, Songtao Lu, Yihua Zhang, Xiangyi Chen, Pin-Yu Chen, Quanfu Fan, Lee Martie, Lior Horesh, Mingyi Hong, Sijia Liu

Spurred by that, we propose distributed adversarial training (DAT), a large-batch adversarial training framework implemented over multiple machines.

Distributed Optimization

Min-Max Bilevel Multi-objective Optimization with Applications in Machine Learning

1 code implementation3 Mar 2022 Alex Gu, Songtao Lu, Parikshit Ram, Lily Weng

We consider a generic min-max multi-objective bilevel optimization problem with applications in robust machine learning such as representation learning and hyperparameter optimization.

BIG-bench Machine Learning Bilevel Optimization +4

Taming Communication and Sample Complexities in Decentralized Policy Evaluation for Cooperative Multi-Agent Reinforcement Learning

no code implementations NeurIPS 2021 Xin Zhang, Zhuqing Liu, Jia Liu, Zhengyuan Zhu, Songtao Lu

To our knowledge, this paper is the first work that achieves both $\mathcal{O}(\epsilon^{-2})$ sample complexity and $\mathcal{O}(\epsilon^{-2})$ communication complexity in decentralized policy evaluation for cooperative MARL.

Multi-agent Reinforcement Learning Reinforcement Learning (RL) +1

Finite-Time Convergence and Sample Complexity of Multi-Agent Actor-Critic Reinforcement Learning with Average Reward

no code implementations ICLR 2022 FNU Hairi, Jia Liu, Songtao Lu

In this paper, we establish the first finite-time convergence result of the actor-critic algorithm for fully decentralized multi-agent reinforcement learning (MARL) problems with average reward.

Multi-agent Reinforcement Learning reinforcement-learning +2

Understanding Latent Correlation-Based Multiview Learning and Self-Supervision: An Identifiability Perspective

1 code implementation ICLR 2022 Qi Lyu, Xiao Fu, Weiran Wang, Songtao Lu

Under this model, latent correlation maximization is shown to guarantee the extraction of the shared components across views (up to certain ambiguities).

Clustering Disentanglement +2
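
A generic form of the latent correlation maximization at issue, in the spirit of (deep) CCA, is shown below, where $f$ and $g$ map the two views $x_1, x_2$ into a shared latent space; the paper's precise objective and constraints may differ from this sketch.

```latex
\max_{f,\,g}\ \operatorname{tr}\, \mathbb{E}\!\left[ f(x_1)\, g(x_2)^{\top} \right]
\quad \text{s.t.} \quad
\mathbb{E}\!\left[ f(x_1) f(x_1)^{\top} \right] \;=\; \mathbb{E}\!\left[ g(x_2) g(x_2)^{\top} \right] \;=\; I
```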

Signal Transformer: Complex-valued Attention and Meta-Learning for Signal Recognition

no code implementations5 Jun 2021 Yihong Dong, Ying Peng, Muqiao Yang, Songtao Lu, Qingjiang Shi

Deep neural networks have been shown to be a useful class of tools for addressing signal recognition problems in recent years, especially for identifying the nonlinear feature structures of signals.

Meta-Learning Time Series +1

An Efficient Learning Framework For Federated XGBoost Using Secret Sharing And Distributed Optimization

1 code implementation12 May 2021 Lunchen Xie, Jiaqi Liu, Songtao Lu, Tsung-Hui Chang, Qingjiang Shi

XGBoost is one of the most widely used machine learning models in the industry due to its superior learning accuracy and efficiency.

Distributed Optimization

Adversarial Examples can be Effective Data Augmentation for Unsupervised Machine Learning

1 code implementation2 Mar 2021 Chia-Yi Hsu, Pin-Yu Chen, Songtao Lu, Sijia Liu, Chia-Mu Yu

In this paper, we propose a framework of generating adversarial examples for unsupervised models and demonstrate novel applications to data augmentation.

BIG-bench Machine Learning Contrastive Learning +2

Federated Acoustic Modeling For Automatic Speech Recognition

no code implementations8 Feb 2021 Xiaodong Cui, Songtao Lu, Brian Kingsbury

In this paper, we investigate federated acoustic modeling using data from multiple clients.

Federated Learning Speech Recognition Sound Distributed, Parallel, and Cluster Computing Audio and Speech Processing

Decentralized TD Tracking with Linear Function Approximation and its Finite-Time Analysis

no code implementations NeurIPS 2020 Gang Wang, Songtao Lu, Georgios Giannakis, Gerald Tesauro, Jian Sun

The present contribution deals with decentralized policy evaluation in multi-agent Markov decision processes using temporal-difference (TD) methods with linear function approximation for scalability.
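
The single-agent building block is the linear TD(0) update sketched below; the decentralized scheme analyzed in the paper additionally has each agent average information with its network neighbors, which this sketch omits.

```python
def td0_update(theta, phi_s, phi_s_next, reward, alpha=0.1, gamma=0.99):
    """One TD(0) step with a linear value function V(s) = theta @ phi(s).

    delta is the temporal-difference error; theta moves along the
    semi-gradient direction delta * phi(s). Inputs are NumPy arrays.
    """
    delta = reward + gamma * theta @ phi_s_next - theta @ phi_s
    return theta + alpha * delta * phi_s
```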

Finding Second-Order Stationary Points Efficiently in Smooth Nonconvex Linearly Constrained Optimization Problems

no code implementations NeurIPS 2020 Songtao Lu, Meisam Razaviyayn, Bo Yang, Kejun Huang, Mingyi Hong

To the best of our knowledge, this is the first time that first-order algorithms with polynomial per-iteration complexity and a global sublinear rate have been designed to find SOSPs (almost surely) for this important class of non-convex problems with linear constraints.

Overcoming Catastrophic Forgetting via Direction-Constrained Optimization

1 code implementation25 Nov 2020 Yunfei Teng, Anna Choromanska, Murray Campbell, Songtao Lu, Parikshit Ram, Lior Horesh

We study the principal directions of the trajectory of the optimizer after convergence and show that traveling along a few top principal directions can quickly bring the parameters outside the cone but this is not the case for the remaining directions.

Continual Learning

Learning to Generate Image Source-Agnostic Universal Adversarial Perturbations

no code implementations29 Sep 2020 Pu Zhao, Parikshit Ram, Songtao Lu, Yuguang Yao, Djallel Bouneffouf, Xue Lin, Sijia Liu

The resulting scheme for meta-learning a UAP generator (i) has better performance (50% higher ASR) than baselines such as Projected Gradient Descent, (ii) has better performance (37% faster) than the vanilla L2O and MAML frameworks (when applicable), and (iii) is able to simultaneously handle UAP generation for different victim models and image data sources.

Adversarial Attack Bilevel Optimization +1

Non-convex Min-Max Optimization: Applications, Challenges, and Recent Theoretical Advances

no code implementations15 Jun 2020 Meisam Razaviyayn, Tianjian Huang, Songtao Lu, Maher Nouiehed, Maziar Sanjabi, Mingyi Hong

The min-max optimization problem, also known as the saddle point problem, is a classical optimization problem which is also studied in the context of zero-sum games.

Randomized Bregman Coordinate Descent Methods for Non-Lipschitz Optimization

no code implementations15 Jan 2020 Tianxiang Gao, Songtao Lu, Jia Liu, Chris Chu

Further, we show that the iteration complexity of the proposed method is $\mathcal{O}(n\epsilon^{-2})$ to achieve an $\epsilon$-stationary point, where $n$ is the number of blocks of coordinates.

Translation
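
For reference, Bregman methods drop the global Lipschitz-gradient assumption in favor of smoothness relative to a reference function $h$, measured by the Bregman distance $D_h$ below; the second line is the relative-smoothness inequality that replaces the usual descent lemma.

```latex
D_h(x, y) \;=\; h(x) - h(y) - \langle \nabla h(y),\, x - y \rangle,
\qquad
f(x) \;\le\; f(y) + \langle \nabla f(y),\, x - y \rangle + L\, D_h(x, y)
```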

Distributed Learning in the Non-Convex World: From Batch to Streaming Data, and Beyond

no code implementations14 Jan 2020 Tsung-Hui Chang, Mingyi Hong, Hoi-To Wai, Xinwei Zhang, Songtao Lu

In particular, we provide a selective review of the recent techniques developed for optimizing non-convex models (i.e., problem classes), processing batch and streaming data (i.e., data types), over networks in a distributed manner (i.e., communication and computation paradigm).

Leveraging Two Reference Functions in Block Bregman Proximal Gradient Descent for Non-convex and Non-Lipschitz Problems

no code implementations16 Dec 2019 Tianxiang Gao, Songtao Lu, Jia Liu, Chris Chu

In the applications of signal processing and data analytics, there is a wide class of non-convex problems whose objective function is freed from the common global Lipschitz continuous gradient assumption (e.g., the nonnegative matrix factorization (NMF) problem).

Learn Electronic Health Records by Fully Decentralized Federated Learning

no code implementations4 Dec 2019 Songtao Lu, Yawen Zhang, Yunlong Wang, Christina Mack

Federated learning opens a number of research opportunities due to its high communication efficiency in distributed training problems within a star network.

Federated Learning

No-regret Non-convex Online Meta-Learning

no code implementations22 Oct 2019 Zhenxun Zhuang, Yunlong Wang, Kezi Yu, Songtao Lu

The online meta-learning framework is designed for the continual lifelong learning setting.

Meta-Learning

Improving the Sample and Communication Complexity for Decentralized Non-Convex Optimization: A Joint Gradient Estimation and Tracking Approach

no code implementations13 Oct 2019 Haoran Sun, Songtao Lu, Mingyi Hong

Similarly, for online problems, the proposed method achieves an $\mathcal{O}(m \epsilon^{-3/2})$ sample complexity and an $\mathcal{O}(\epsilon^{-1})$ communication complexity, while the best existing bounds are $\mathcal{O}(m\epsilon^{-2})$ and $\mathcal{O}(\epsilon^{-2})$, respectively.

Stochastic Optimization

Min-Max Optimization without Gradients: Convergence and Applications to Adversarial ML

1 code implementation30 Sep 2019 Sijia Liu, Songtao Lu, Xiangyi Chen, Yao Feng, Kaidi Xu, Abdullah Al-Dujaili, Mingyi Hong, Una-May O'Reilly

In this paper, we study the problem of constrained robust (min-max) optimization in a black-box setting, where the desired optimizer cannot access the gradients of the objective function but may query its values.

SNAP: Finding Approximate Second-Order Stationary Solutions Efficiently for Non-convex Linearly Constrained Problems

no code implementations9 Jul 2019 Songtao Lu, Meisam Razaviyayn, Bo Yang, Kejun Huang, Mingyi Hong

This paper proposes low-complexity algorithms for finding approximate second-order stationary points (SOSPs) of problems with smooth non-convex objective and linear constraints.

Understand the dynamics of GANs via Primal-Dual Optimization

no code implementations ICLR 2019 Songtao Lu, Rahul Singh, Xiangyi Chen, Yongxin Chen, Mingyi Hong

By developing new primal-dual optimization tools, we show that, with a proper stepsize choice, the widely used first-order iterative algorithm in training GANs would in fact converge to a stationary solution with a sublinear rate.

Generative Adversarial Network Multi-Task Learning

Deep Learning for Signal Demodulation in Physical Layer Wireless Communications: Prototype Platform, Open Dataset, and Analytics

no code implementations8 Mar 2019 Hongmei Wang, Zhenzhen Wu, Shuai Ma, Songtao Lu, Han Zhang, Guoru Ding, Shiyin Li

In this paper, we investigate deep learning (DL)-enabled signal demodulation methods and establish the first open dataset of real modulated signals for wireless communication systems.

Hybrid Block Successive Approximation for One-Sided Non-Convex Min-Max Problems: Algorithms and Applications

no code implementations21 Feb 2019 Songtao Lu, Ioannis Tsaknakis, Mingyi Hong, Yongxin Chen

In this work, we consider a block-wise one-sided non-convex min-max problem, in which the minimization problem consists of multiple blocks and is non-convex, while the maximization problem is (strongly) concave.

Power Market Price Forecasting via Deep Learning

no code implementations18 Sep 2018 Yongli Zhu, Songtao Lu, Renchang Dai, Guangyi Liu, Zhiwei Wang

Then the raw input and output data are preprocessed by unit scaling, and the trained network is tested on the real price data under different input lengths, forecasting horizons and data sizes.

Deep Learning
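
The abstract's "unit scaling" is ambiguous on its own; the sketch below assumes min-max scaling into $[0, 1]$, which is one common reading, and should not be taken as the paper's exact preprocessing.

```python
import numpy as np

def unit_scale(x):
    """Min-max scale a price series into [0, 1] (assumed preprocessing)."""
    lo, hi = np.min(x), np.max(x)
    return (x - lo) / (hi - lo)
```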

On the Sublinear Convergence of Randomly Perturbed Alternating Gradient Descent to Second Order Stationary Solutions

no code implementations28 Feb 2018 Songtao Lu, Mingyi Hong, Zhengdao Wang

The alternating gradient descent (AGD) is a simple but popular algorithm which has been applied to problems in optimization, machine learning, data mining, signal processing, and beyond.

A Nonconvex Splitting Method for Symmetric Nonnegative Matrix Factorization: Convergence Analysis and Optimality

no code implementations24 Mar 2017 Songtao Lu, Mingyi Hong, Zhengdao Wang

The proposed algorithm is guaranteed to converge to the set of Karush-Kuhn-Tucker (KKT) points of the nonconvex SymNMF problem.

Clustering Community Detection +2
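
The underlying problem is symmetric NMF, and the title's "nonconvex splitting" suggests a variable-splitting reformulation of the standard objective, sketched below for a similarity matrix $A$; the exact splitting and penalty used in the paper may differ.

```latex
\min_{X \ge 0}\ \lVert A - X X^{\top} \rVert_F^{2}
\quad \Longrightarrow \quad
\min_{X,\, Z}\ \lVert A - X Z^{\top} \rVert_F^{2}
\ \ \text{s.t.}\ \ X = Z,\ \ Z \ge 0
```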
