Search Results for author: Songtao Lu

Found 48 papers, 8 papers with code

Improving the Sample and Communication Complexity for Decentralized Non-Convex Optimization: Joint Gradient Estimation and Tracking

no code implementations ICML 2020 Haoran Sun, Songtao Lu, Mingyi Hong

Similarly, for online problems, the proposed method achieves an $\mathcal{O}(m \epsilon^{-3/2})$ sample complexity and an $\mathcal{O}(\epsilon^{-1})$ communication complexity, while the best existing bounds are $\mathcal{O}(m\epsilon^{-2})$ and $\mathcal{O}(\epsilon^{-2})$, respectively.

Stochastic Optimization
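As background, the gradient-tracking mechanism that this line of work builds on can be sketched in a few lines. In the sketch below, the ring network, quadratic local losses, and step size are illustrative assumptions; the paper's method additionally couples tracking with stochastic gradient estimation, which this sketch omits.

```python
import numpy as np

# Minimal decentralized gradient-tracking sketch (illustrative only): m agents
# minimize the sum of local quadratics f_i(x) = 0.5*(x - b_i)^2 over a ring.
m, T, alpha = 5, 200, 0.1
b = np.linspace(-2.0, 2.0, m)              # local optima; global optimum is b.mean()
W = np.zeros((m, m))                       # doubly stochastic mixing matrix (ring)
for i in range(m):
    W[i, i] = 0.5
    W[i, (i - 1) % m] = W[i, (i + 1) % m] = 0.25

grad = lambda x: x - b                     # local gradients, evaluated agent-wise
x = np.zeros(m)                            # one scalar variable per agent
y = grad(x)                                # tracker initialized with local gradients
for _ in range(T):
    x_new = W @ x - alpha * y              # consensus step + move along tracked gradient
    y = W @ y + grad(x_new) - grad(x)      # track the network-average gradient
    x = x_new

print(x, b.mean())                         # all agents approach the global optimum
```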

Min-Max Optimization without Gradients: Convergence and Applications to Black-Box Evasion and Poisoning Attacks

no code implementations ICML 2020 Sijia Liu, Songtao Lu, Xiangyi Chen, Yao Feng, Kaidi Xu, Abdullah Al-Dujaili, Mingyi Hong, Una-May O'Reilly

In this paper, we study the problem of constrained min-max optimization in a black-box setting, where the desired optimizer cannot access the gradients of the objective function but may query its values.
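Since only function values can be queried in this setting, gradients are typically replaced by zeroth-order estimates built from queries. Below is a minimal sketch of one standard two-point random-direction estimator; the smoothing radius and query budget are illustrative assumptions, not necessarily the paper's exact construction.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-3, q=20, rng=np.random.default_rng(0)):
    """Two-point zeroth-order gradient estimate of f at x using q random
    directions; only function values are queried, never gradients."""
    d, g = x.size, np.zeros_like(x)
    for _ in range(q):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)             # uniform direction on the unit sphere
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return d / q * g                       # dimension scaling for this estimator

# Sanity check on f(x) = ||x||^2, whose true gradient is 2x.
x = np.array([1.0, -2.0, 0.5])
print(zo_gradient(lambda z: z @ z, x), 2 * x)
```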

Training Nonlinear Transformers for Efficient In-Context Learning: A Theoretical Learning and Generalization Analysis

no code implementations23 Feb 2024 Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, Pin-Yu Chen

Despite the empirical success, the mechanics of how to train a Transformer to achieve ICL and the corresponding ICL capacity remain mostly elusive due to the technical challenges of analyzing the nonconvex training problems that result from the nonlinear self-attention and nonlinear activation in Transformers.

Binary Classification In-Context Learning

Decentralized Bilevel Optimization over Graphs: Loopless Algorithmic Update and Transient Iteration Complexity

no code implementations5 Feb 2024 Boao Kong, Shuchen Zhu, Songtao Lu, Xinmeng Huang, Kun Yuan

In this paper, we introduce a single-loop decentralized stochastic bilevel optimization algorithm (D-SOBA) and establish its transient iteration complexity, which, for the first time, clarifies the joint influence of network topology and data heterogeneity on decentralized bilevel algorithms.

Bilevel Optimization

Joint Unsupervised and Supervised Training for Automatic Speech Recognition via Bilevel Optimization

1 code implementation13 Jan 2024 A F M Saif, Xiaodong Cui, Han Shen, Songtao Lu, Brian Kingsbury, Tianyi Chen

In this paper, we present a novel bilevel optimization-based training approach for acoustic models in automatic speech recognition (ASR), which we term bi-level joint unsupervised and supervised training (BL-JUST).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
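For intuition only, a heavily simplified sketch of the general pattern, an alternating lower-level unsupervised update and upper-level supervised update, is given below; the model, losses, and update schedule are placeholder assumptions, not the authors' actual BL-JUST recipe.

```python
import torch

# Alternating bilevel-style training sketch (illustrative placeholders only).
model = torch.nn.Linear(16, 4)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

def unsup_loss(x):                          # hypothetical lower-level objective
    return model(x).pow(2).mean()           # stand-in for an unsupervised criterion

def sup_loss(x, y):                         # upper-level supervised objective
    return torch.nn.functional.cross_entropy(model(x), y)

for step in range(100):
    x_u = torch.randn(32, 16)                                    # unlabeled batch
    x_s, y_s = torch.randn(32, 16), torch.randint(0, 4, (32,))   # labeled batch
    opt.zero_grad()
    unsup_loss(x_u).backward()              # lower-level (unsupervised) update
    opt.step()
    opt.zero_grad()
    sup_loss(x_s, y_s).backward()           # upper-level (supervised) update
    opt.step()
```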

Soft Random Sampling: A Theoretical and Empirical Analysis

no code implementations21 Nov 2023 Xiaodong Cui, Ashish Mittal, Songtao Lu, Wei Zhang, George Saon, Brian Kingsbury

Soft random sampling (SRS) is a simple yet effective approach for efficient training of large-scale deep neural networks when dealing with massive data.

Automatic Speech Recognition speech-recognition +1
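For intuition, per-epoch random subsampling at a fixed ratio can be sketched as below; drawing indices uniformly with replacement is an assumption in this sketch, and the paper defines the exact SRS scheme.

```python
import numpy as np

def soft_random_sample(n, p, rng):
    """Per-epoch subsampling sketch: draw round(p*n) indices uniformly with
    replacement (an assumption; see the paper for the exact SRS scheme)."""
    return rng.integers(0, n, size=int(round(p * n)))

rng = np.random.default_rng(0)
dataset = np.arange(100_000)                 # stand-in for a massive training set
for epoch in range(3):
    idx = soft_random_sample(len(dataset), p=0.3, rng=rng)
    # ... train for one epoch on dataset[idx] only ...
    print(epoch, len(idx), len(np.unique(idx)))
```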

Ontology Revision based on Pre-trained Language Models

no code implementations27 Oct 2023 Qiu Ji, Guilin Qi, Yuxin Ye, Jiaye Li, Site Li, Jianjie Ren, Songtao Lu

We conduct experiments over 19 ontology pairs and compare our algorithms and scoring functions with existing ones.

On the Convergence and Sample Complexity Analysis of Deep Q-Networks with $\epsilon$-Greedy Exploration

no code implementations24 Oct 2023 Shuai Zhang, Hongkang Li, Meng Wang, Miao Liu, Pin-Yu Chen, Songtao Lu, Sijia Liu, Keerthiram Murugesan, Subhajit Chaudhury

This paper provides the first theoretical convergence and sample complexity analysis of the practical setting of DQNs with $\epsilon$-greedy policy.

Q-Learning
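The $\epsilon$-greedy exploration rule analyzed here is the standard one and can be stated in a few lines; the toy Q-values below are illustrative.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Epsilon-greedy policy: explore uniformly with probability epsilon,
    otherwise act greedily w.r.t. the current Q-estimates."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))   # random exploratory action
    return int(np.argmax(q_values))               # greedy action

rng = np.random.default_rng(0)
q = np.array([0.1, 0.7, 0.3])
actions = [epsilon_greedy(q, epsilon=0.1, rng=rng) for _ in range(1000)]
print(np.bincount(actions) / 1000)                # mostly action 1, some exploration
```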

FedLogic: Interpretable Federated Multi-Domain Chain-of-Thought Prompt Selection for Large Language Models

no code implementations29 Aug 2023 Pengwei Xing, Songtao Lu, Han Yu

To improve interpretability and explore the balance principle between generality and personalization under a multi-domain CoT prompt selection scenario, we propose the Federated Logic rule learning approach (FedLogic).

How Can Context Help? Exploring Joint Retrieval of Passage and Personalized Context

no code implementations26 Aug 2023 Hui Wan, Hongkang Li, Songtao Lu, Xiaodong Cui, Marina Danilevsky

The integration of external personalized context information into document-grounded conversational systems has significant potential business value, but has not been well-studied.

Passage Retrieval Retrieval

A Generalized Alternating Method for Bilevel Learning under the Polyak-Łojasiewicz Condition

no code implementations4 Jun 2023 Quan Xiao, Songtao Lu, Tianyi Chen

Bilevel optimization has recently regained interest owing to its applications in emerging machine learning fields such as hyperparameter optimization, meta-learning, and reinforcement learning.

Bilevel Optimization Hyperparameter Optimization +1

PRECISION: Decentralized Constrained Min-Max Learning with Low Communication and Sample Complexities

no code implementations5 Mar 2023 Zhuqing Liu, Xin Zhang, Songtao Lu, Jia Liu

Decentralized min-max optimization problems with domain constraints underpin many important ML applications, including fairness assurance in multi-agent ML and policy evaluation in multi-agent reinforcement learning.

Fairness Multi-agent Reinforcement Learning

Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks

no code implementations6 Feb 2023 Shuai Zhang, Meng Wang, Pin-Yu Chen, Sijia Liu, Songtao Lu, Miao Liu

Due to the significant computational challenge of training large-scale graph neural networks (GNNs), various sparse learning techniques have been exploited to reduce memory and storage costs.

Sparse Learning

Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation Constrained Optimization

no code implementations19 Dec 2022 Zichong Li, Pin-Yu Chen, Sijia Liu, Songtao Lu, Yangyang Xu

In this paper, we design and analyze stochastic inexact augmented Lagrangian methods (Stoc-iALM) to solve problems involving a nonconvex composite (i.e., smooth + nonsmooth) objective and nonconvex smooth functional constraints.

Fairness
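For reference, with equality constraints $c(x) = 0$ the textbook augmented Lagrangian that such methods minimize inexactly takes the form $\mathcal{L}_\rho(x, \lambda) = f(x) + \lambda^{\top} c(x) + \frac{\rho}{2}\|c(x)\|^2$, where $\lambda$ is the dual variable and $\rho > 0$ the penalty parameter; the paper's expectation-constrained formulation generalizes this standard construction.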

ASGNN: Graph Neural Networks with Adaptive Structure

no code implementations3 Oct 2022 Zepeng Zhang, Songtao Lu, Zengfeng Huang, Ziping Zhao

In this work, we propose a novel interpretable message passing scheme with adaptive structure (ASMP) to defend against adversarial attacks on graph structure.

Node Classification

INTERACT: Achieving Low Sample and Communication Complexities in Decentralized Bilevel Learning over Networks

no code implementations27 Jul 2022 Zhuqing Liu, Xin Zhang, Prashant Khanduri, Songtao Lu, Jia Liu

Our main contributions in this paper are two-fold: i) we first propose a deterministic algorithm called INTERACT (inner-gradient-descent-outer-tracked-gradient) that requires a sample complexity of $\mathcal{O}(n \epsilon^{-1})$ and a communication complexity of $\mathcal{O}(\epsilon^{-1})$ to solve the bilevel optimization problem, where $n$ and $\epsilon > 0$ are the number of samples at each agent and the desired stationarity gap, respectively.

Bilevel Optimization Meta-Learning +1

A Single-Loop Gradient Descent and Perturbed Ascent Algorithm for Nonconvex Functional Constrained Optimization

no code implementations12 Jul 2022 Songtao Lu

GDPA (gradient descent and perturbed ascent) is a primal-dual algorithm that exploits only the first-order information of both the objective and constraint functions to update the primal and dual variables in an alternating way.
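A minimal sketch of such an alternating first-order primal-dual update on a toy constrained problem is given below; the step sizes and the quadratic example are illustrative assumptions, and the perturbation of the ascent step that gives GDPA its name is omitted.

```python
# Alternating primal-dual gradient sketch on:  min x^2  s.t.  1 - x <= 0,
# with Lagrangian L(x, lam) = x^2 + lam * (1 - x). Illustrative only.
x, lam = 0.0, 0.0
eta_x, eta_lam = 0.1, 0.1
for _ in range(500):
    x -= eta_x * (2 * x - lam)                  # primal descent on L(x, lam)
    lam = max(0.0, lam + eta_lam * (1 - x))     # dual ascent, projected to lam >= 0

print(x, lam)   # approaches the KKT point x* = 1, lam* = 2
```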

Understanding Benign Overfitting in Gradient-Based Meta Learning

no code implementations27 Jun 2022 Lisha Chen, Songtao Lu, Tianyi Chen

While conventional statistical learning theory suggests that overparameterized models tend to overfit, empirical evidence reveals that overparameterized meta learning methods still work well -- a phenomenon often called "benign overfitting."

Few-Shot Learning Learning Theory

Distributed Adversarial Training to Robustify Deep Neural Networks at Scale

2 code implementations13 Jun 2022 Gaoyuan Zhang, Songtao Lu, Yihua Zhang, Xiangyi Chen, Pin-Yu Chen, Quanfu Fan, Lee Martie, Lior Horesh, Mingyi Hong, Sijia Liu

Spurred by that, we propose distributed adversarial training (DAT), a large-batch adversarial training framework implemented over multiple machines.

Distributed Optimization
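The basic per-step pattern, each worker crafting adversarial examples locally and then averaging gradients across machines, can be sketched as follows; the linear model, PGD-style inner loop, and single-process stand-in for the all-reduce are illustrative assumptions, not DAT's exact design.

```python
import torch

# One adversarial-training step, sketched (illustrative; not DAT's exact design).
model = torch.nn.Linear(10, 2)
loss_fn = torch.nn.CrossEntropyLoss()

def adversarial_batch(x, y, eps=0.1, steps=3):
    """Perturb a batch with a small PGD-style inner maximization loop."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss_fn(model(x + delta), y).backward()
        delta.data = (delta + eps / steps * delta.grad.sign()).clamp(-eps, eps)
        delta.grad.zero_()
    return (x + delta).detach()

x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
adv = adversarial_batch(x, y)
model.zero_grad()                         # clear grads accumulated by the attack
loss_fn(model(adv), y).backward()         # gradients on the adversarial batch
# In the multi-machine setting, the per-worker gradients would be averaged here
# (e.g., an all-reduce over each parameter's .grad) before the optimizer step.
```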

Min-Max Bilevel Multi-objective Optimization with Applications in Machine Learning

1 code implementation3 Mar 2022 Alex Gu, Songtao Lu, Parikshit Ram, Lily Weng

We consider a generic min-max multi-objective bilevel optimization problem with applications in robust machine learning such as representation learning and hyperparameter optimization.

BIG-bench Machine Learning Bilevel Optimization +4

Taming Communication and Sample Complexities in Decentralized Policy Evaluation for Cooperative Multi-Agent Reinforcement Learning

no code implementations NeurIPS 2021 Xin Zhang, Zhuqing Liu, Jia Liu, Zhengyuan Zhu, Songtao Lu

To our knowledge, this paper is the first work that achieves both $\mathcal{O}(\epsilon^{-2})$ sample complexity and $\mathcal{O}(\epsilon^{-2})$ communication complexity in decentralized policy evaluation for cooperative MARL.

Multi-agent Reinforcement Learning Reinforcement Learning (RL) +1

Finite-Time Convergence and Sample Complexity of Multi-Agent Actor-Critic Reinforcement Learning with Average Reward

no code implementations ICLR 2022 FNU Hairi, Jia Liu, Songtao Lu

In this paper, we establish the first finite-time convergence result of the actor-critic algorithm for fully decentralized multi-agent reinforcement learning (MARL) problems with average reward.

Multi-agent Reinforcement Learning reinforcement-learning +1

Understanding Latent Correlation-Based Multiview Learning and Self-Supervision: An Identifiability Perspective

1 code implementation ICLR 2022 Qi Lyu, Xiao Fu, Weiran Wang, Songtao Lu

Under this model, latent correlation maximization is shown to guarantee the extraction of the shared components across views (up to certain ambiguities).

Clustering Disentanglement +2

Signal Transformer: Complex-valued Attention and Meta-Learning for Signal Recognition

no code implementations5 Jun 2021 Yihong Dong, Ying Peng, Muqiao Yang, Songtao Lu, Qingjiang Shi

Deep neural networks have been shown to be a useful class of tools for addressing signal recognition problems in recent years, especially for identifying the nonlinear feature structure of signals.

Meta-Learning Time Series +1

An Efficient Learning Framework For Federated XGBoost Using Secret Sharing And Distributed Optimization

1 code implementation12 May 2021 Lunchen Xie, Jiaqi Liu, Songtao Lu, Tsung-Hui Chang, Qingjiang Shi

XGBoost is one of the most widely used machine learning models in the industry due to its superior learning accuracy and efficiency.

Distributed Optimization

Adversarial Examples can be Effective Data Augmentation for Unsupervised Machine Learning

1 code implementation2 Mar 2021 Chia-Yi Hsu, Pin-Yu Chen, Songtao Lu, Sijia Liu, Chia-Mu Yu

In this paper, we propose a framework of generating adversarial examples for unsupervised models and demonstrate novel applications to data augmentation.

BIG-bench Machine Learning Contrastive Learning +2

Federated Acoustic Modeling For Automatic Speech Recognition

no code implementations8 Feb 2021 Xiaodong Cui, Songtao Lu, Brian Kingsbury

In this paper, we investigate federated acoustic modeling using data from multiple clients.

Federated Learning Speech Recognition Sound Distributed, Parallel, and Cluster Computing Audio and Speech Processing

Finding Second-Order Stationary Points Efficiently in Smooth Nonconvex Linearly Constrained Optimization Problems

no code implementations NeurIPS 2020 Songtao Lu, Meisam Razaviyayn, Bo Yang, Kejun Huang, Mingyi Hong

To the best of our knowledge, this is the first time that first-order algorithms with polynomial per-iteration complexity and global sublinear rate are designed to find SOSPs of the important class of non-convex problems with linear constraints (almost surely).

Decentralized TD Tracking with Linear Function Approximation and its Finite-Time Analysis

no code implementations NeurIPS 2020 Gang Wang, Songtao Lu, Georgios Giannakis, Gerald Tesauro, Jian Sun

The present contribution deals with decentralized policy evaluation in multi-agent Markov decision processes using temporal-difference (TD) methods with linear function approximation for scalability.
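The single-agent building block here is TD(0) with linear function approximation, sketched below on a toy random-walk chain; the one-hot features, policy, and step size are illustrative assumptions, and the paper studies the decentralized, multi-agent form of this update.

```python
import numpy as np

# TD(0) with linear function approximation on a toy 5-state random walk.
# V(s) is approximated by theta @ phi(s); features here are one-hot (tabular).
rng = np.random.default_rng(0)
n_states, gamma, alpha = 5, 0.9, 0.1
phi = np.eye(n_states)                    # one-hot feature map (an assumption)
theta = np.zeros(n_states)

s = 2
for _ in range(20_000):
    s_next = (s + rng.choice([-1, 1])) % n_states    # random-walk policy
    r = 1.0 if s_next == 0 else 0.0                  # reward on reaching state 0
    delta = r + gamma * theta @ phi[s_next] - theta @ phi[s]   # TD error
    theta += alpha * delta * phi[s]                  # stochastic TD(0) update
    s = s_next

print(theta)    # estimated state values under this policy
```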

Overcoming Catastrophic Forgetting via Direction-Constrained Optimization

1 code implementation25 Nov 2020 Yunfei Teng, Anna Choromanska, Murray Campbell, Songtao Lu, Parikshit Ram, Lior Horesh

We study the principal directions of the trajectory of the optimizer after convergence and show that traveling along a few top principal directions can quickly bring the parameters outside the cone, but this is not the case for the remaining directions.

Continual Learning

Learning to Generate Image Source-Agnostic Universal Adversarial Perturbations

no code implementations29 Sep 2020 Pu Zhao, Parikshit Ram, Songtao Lu, Yuguang Yao, Djallel Bouneffouf, Xue Lin, Sijia Liu

The resulting scheme for meta-learning a UAP generator (i) has better performance (50% higher ASR) than baselines such as Projected Gradient Descent, (ii) has better performance (37% faster) than the vanilla L2O and MAML frameworks (when applicable), and (iii) is able to simultaneously handle UAP generation for different victim models and image data sources.

Adversarial Attack Bilevel Optimization +1

Non-convex Min-Max Optimization: Applications, Challenges, and Recent Theoretical Advances

no code implementations15 Jun 2020 Meisam Razaviyayn, Tianjian Huang, Songtao Lu, Maher Nouiehed, Maziar Sanjabi, Mingyi Hong

The min-max optimization problem, also known as the saddle point problem, is a classical optimization problem which is also studied in the context of zero-sum games.
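Concretely, the problem class under review is $\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} f(x, y)$, where $f$ may be non-convex in $x$ and/or non-concave in $y$.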

Randomized Bregman Coordinate Descent Methods for Non-Lipschitz Optimization

no code implementations15 Jan 2020 Tianxiang Gao, Songtao Lu, Jia Liu, Chris Chu

Further, we show that the iteration complexity of the proposed method is $\mathcal{O}(n\epsilon^{-2})$ to achieve an $\epsilon$-stationary point, where $n$ is the number of blocks of coordinates.

Translation

Distributed Learning in the Non-Convex World: From Batch to Streaming Data, and Beyond

no code implementations14 Jan 2020 Tsung-Hui Chang, Mingyi Hong, Hoi-To Wai, Xinwei Zhang, Songtao Lu

In particular, we provide a selective review of the recent techniques developed for optimizing non-convex models (i.e., problem classes), processing batch and streaming data (i.e., data types), over networks in a distributed manner (i.e., communication and computation paradigms).

Leveraging Two Reference Functions in Block Bregman Proximal Gradient Descent for Non-convex and Non-Lipschitz Problems

no code implementations16 Dec 2019 Tianxiang Gao, Songtao Lu, Jia Liu, Chris Chu

In the applications of signal processing and data analytics, there is a wide class of non-convex problems whose objective functions are free of the common global Lipschitz continuous gradient assumption (e.g., the nonnegative matrix factorization (NMF) problem).

Learn Electronic Health Records by Fully Decentralized Federated Learning

no code implementations4 Dec 2019 Songtao Lu, Yawen Zhang, Yunlong Wang, Christina Mack

Federated learning opens a number of research opportunities due to its high communication efficiency in distributed training problems within a star network.

Federated Learning

No-regret Non-convex Online Meta-Learning

no code implementations22 Oct 2019 Zhenxun Zhuang, Yunlong Wang, Kezi Yu, Songtao Lu

The online meta-learning framework is designed for the continual lifelong learning setting.

Meta-Learning

Improving the Sample and Communication Complexity for Decentralized Non-Convex Optimization: A Joint Gradient Estimation and Tracking Approach

no code implementations13 Oct 2019 Haoran Sun, Songtao Lu, Mingyi Hong

Similarly, for online problems, the proposed method achieves an $\mathcal{O}(m \epsilon^{-3/2})$ sample complexity and an $\mathcal{O}(\epsilon^{-1})$ communication complexity, while the best existing bounds are $\mathcal{O}(m\epsilon^{-2})$ and $\mathcal{O}(\epsilon^{-2})$, respectively.

Stochastic Optimization

Min-Max Optimization without Gradients: Convergence and Applications to Adversarial ML

1 code implementation30 Sep 2019 Sijia Liu, Songtao Lu, Xiangyi Chen, Yao Feng, Kaidi Xu, Abdullah Al-Dujaili, Mingyi Hong, Una-May O'Reilly

In this paper, we study the problem of constrained robust (min-max) optimization in a black-box setting, where the desired optimizer cannot access the gradients of the objective function but may query its values.

SNAP: Finding Approximate Second-Order Stationary Solutions Efficiently for Non-convex Linearly Constrained Problems

no code implementations9 Jul 2019 Songtao Lu, Meisam Razaviyayn, Bo Yang, Kejun Huang, Mingyi Hong

This paper proposes low-complexity algorithms for finding approximate second-order stationary points (SOSPs) of problems with smooth non-convex objective and linear constraints.

Understand the dynamics of GANs via Primal-Dual Optimization

no code implementations ICLR 2019 Songtao Lu, Rahul Singh, Xiangyi Chen, Yongxin Chen, Mingyi Hong

By developing new primal-dual optimization tools, we show that, with a proper stepsize choice, the widely used first-order iterative algorithm in training GANs would in fact converge to a stationary solution with a sublinear rate.

Generative Adversarial Network Multi-Task Learning

Deep Learning for Signal Demodulation in Physical Layer Wireless Communications: Prototype Platform, Open Dataset, and Analytics

no code implementations8 Mar 2019 Hongmei Wang, Zhenzhen Wu, Shuai Ma, Songtao Lu, Han Zhang, Guoru Ding, Shiyin Li

In this paper, we investigate deep learning (DL)-enabled signal demodulation methods and establish the first open dataset of real modulated signals for wireless communication systems.

Hybrid Block Successive Approximation for One-Sided Non-Convex Min-Max Problems: Algorithms and Applications

no code implementations21 Feb 2019 Songtao Lu, Ioannis Tsaknakis, Mingyi Hong, Yongxin Chen

In this work, we consider a block-wise one-sided non-convex min-max problem, in which the minimization problem consists of multiple blocks and is non-convex, while the maximization problem is (strongly) concave.

Power Market Price Forecasting via Deep Learning

no code implementations18 Sep 2018 Yongli Zhu, Songtao Lu, Renchang Dai, Guangyi Liu, Zhiwei Wang

Then the raw input and output data are preprocessed by unit scaling, and the trained network is tested on the real price data under different input lengths, forecasting horizons and data sizes.

On the Sublinear Convergence of Randomly Perturbed Alternating Gradient Descent to Second Order Stationary Solutions

no code implementations28 Feb 2018 Songtao Lu, Mingyi Hong, Zhengdao Wang

Alternating gradient descent (AGD) is a simple but popular algorithm that has been applied to problems in optimization, machine learning, data mining, signal processing, and beyond.

A Nonconvex Splitting Method for Symmetric Nonnegative Matrix Factorization: Convergence Analysis and Optimality

no code implementations24 Mar 2017 Songtao Lu, Mingyi Hong, Zhengdao Wang

The proposed algorithm is guaranteed to converge to the set of Karush-Kuhn-Tucker (KKT) points of the nonconvex SymNMF problem.

Clustering Community Detection +2
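For context, SymNMF is conventionally posed as $\min_{X \ge 0} \|A - XX^{\top}\|_F^2$ for a given symmetric matrix $A$; the paper develops a splitting method for this nonconvex problem with guaranteed convergence to its KKT points.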
