Search Results for author: Tan M. Nguyen

Found 24 papers, 13 papers with code

MoLEx: Mixture of Layer Experts for Finetuning with Sparse Upcycling

1 code implementation • 14 Mar 2025 • Rachel S. Y. Teo, Tan M. Nguyen

We then propose the Mixture of Layer Experts (MoLEx), a novel sparse mixture of experts (SMoE) whose experts are layers in the pre-trained model.

Mixture-of-Experts parameter-efficient fine-tuning +1
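
As a rough illustration of the layer-as-expert idea in the snippet above (hypothetical class and argument names, not the actual MoLEx implementation), a gate can mix the outputs of existing pre-trained layers per token; the sketch below uses a dense soft mixture rather than the sparse routing used in the paper.

    import torch
    import torch.nn as nn

    class LayerExpertMixture(nn.Module):
        """Toy mixture whose 'experts' are existing pre-trained layers (dense soft routing)."""
        def __init__(self, pretrained_layers, d_model):
            super().__init__()
            self.experts = nn.ModuleList(pretrained_layers)    # e.g. frozen transformer blocks
            self.gate = nn.Linear(d_model, len(pretrained_layers))

        def forward(self, x):                                  # x: (batch, seq, d_model)
            w = self.gate(x).softmax(dim=-1)                   # per-token weights over layers
            outs = torch.stack([layer(x) for layer in self.experts], dim=-1)
            return (outs * w.unsqueeze(-2)).sum(dim=-1)        # weighted mix of layer outputs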

Distance-Based Tree-Sliced Wasserstein Distance

1 code implementation • 14 Mar 2025 • Hoang V. Tran, Khoi N. M. Nguyen, Trang Pham, Thanh T. Chu, Tam Le, Tan M. Nguyen

However, projecting measures onto low-dimensional spaces can lead to a loss of topological information.

Computational Efficiency

Spherical Tree-Sliced Wasserstein Distance

1 code implementation • 14 Mar 2025 • Hoang V. Tran, Thanh T. Chu, Khoi N. M. Nguyen, Trang Pham, Tam Le, Tan M. Nguyen

Inspired by this approach, in this paper, we present an adaptation of tree systems to OT problems for measures supported on a sphere.

Self-Supervised Learning

CAMEx: Curvature-aware Merging of Experts

1 code implementation • 26 Feb 2025 • Dung V. Nguyen, Minh H. Nguyen, Luc Q. Nguyen, Rachel S. Y. Teo, Tan M. Nguyen, Linh Duy Tran

In this paper, we introduce CAMEx (Curvature-Aware Merging of Experts), a novel expert merging protocol that incorporates natural gradients to account for the non-Euclidean curvature of the parameter manifold.
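
As a loose illustration of curvature-aware merging in general (not the CAMEx protocol itself), a diagonal Fisher estimate can weight each parameter coordinate when averaging experts; the helper below is hypothetical and works on NumPy arrays or PyTorch tensors of matching shape.

    def curvature_weighted_merge(expert_params, fisher_diags, eps=1e-8):
        """Merge expert weight tensors, weighting each coordinate by its curvature estimate."""
        num = sum(f * p for f, p in zip(fisher_diags, expert_params))   # curvature-weighted sum
        den = sum(fisher_diags) + eps                                   # total curvature per coordinate
        return num / den                                                # reduces to a plain mean if all f are equal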

Tight Clusters Make Specialized Experts

1 code implementation • 21 Feb 2025 • Stefan K. Nielsen, Rachel S. Y. Teo, Laziz U. Abdullaev, Tan M. Nguyen

Our AC router enables the MoE model to obtain three connected benefits: 1) faster convergence, 2) better robustness to data corruption, and 3) overall performance improvement, as experts are specialized in semantically distinct regions of the input space.

Clustering Language Modeling +2

An Attention-based Framework for Fair Contrastive Learning

no code implementations • 22 Nov 2024 • Stefan K. Nielsen, Tan M. Nguyen

Contrastive learning has proven instrumental in learning unbiased representations of data, especially in complex environments characterized by high-cardinality and high-dimensional sensitive information.

Contrastive Learning

MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts

1 code implementation • 18 Oct 2024 • Rachel S. Y. Teo, Tan M. Nguyen

Sparse Mixture of Experts (SMoE) has become the key to unlocking unparalleled scalability in deep learning.

Language Modeling +2
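
Reading the residual SMoE update x <- x + SMoE(x) as a gradient step suggests the heavy-ball variant sketched below; this is an assumption-laden simplification of the idea in the title, not the exact MomentumSMoE formulation.

    def momentum_smoe_block(x, m, smoe_layer, beta=0.9, step=1.0):
        """One momentum-accelerated residual update; m is the momentum state carried across layers."""
        m = beta * m + smoe_layer(x)   # accumulate the SMoE output as a descent direction
        x = x + step * m               # heavy-ball update instead of a plain residual step
        return x, m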

Monomial Matrix Group Equivariant Neural Functional Networks

1 code implementation • 18 Sep 2024 • Hoang V. Tran, Thieu N. Vo, Tho H. Tran, An T. Nguyen, Tan M. Nguyen

We name our new family of NFNs the Monomial Matrix Group Equivariant Neural Functional Networks (Monomial-NFN).

Tree-Sliced Wasserstein Distance on a System of Lines

no code implementations • 19 Jun 2024 • Viet-Hoang Tran, Trang Pham, Tho Tran, Tam Le, Tan M. Nguyen

While SW is prone to loss of topological information of input measures due to relying on one-dimensional projection, TSW is more flexible and has a higher degree of freedom by choosing a tree rather than a line to alleviate the curse of dimensionality in SW.

Computational Efficiency Style Transfer
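
For context, the ordinary sliced Wasserstein (SW) distance mentioned above averages one-dimensional Wasserstein distances over random line projections; TSW replaces the single line with a tree system. The NumPy sketch below computes only the standard line-based SW between two equally sized point clouds.

    import numpy as np

    def sliced_wasserstein(X, Y, n_projections=100, seed=None):
        """Monte Carlo sliced 1-Wasserstein distance between point clouds X, Y of shape (n, d)."""
        rng = np.random.default_rng(seed)
        total = 0.0
        for _ in range(n_projections):
            theta = rng.normal(size=X.shape[1])
            theta /= np.linalg.norm(theta)            # random direction on the unit sphere
            px, py = np.sort(X @ theta), np.sort(Y @ theta)
            total += np.mean(np.abs(px - py))         # 1D Wasserstein-1 via sorted projections
        return total / n_projections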

A Primal-Dual Framework for Transformers and Neural Networks

no code implementations • 19 Jun 2024 • Tan M. Nguyen, Tam Nguyen, Nhat Ho, Andrea L. Bertozzi, Richard G. Baraniuk, Stanley J. Osher

Self-attention is key to the remarkable success of transformers in sequence modeling tasks including many applications in natural language processing and computer vision.

Time Series Time Series Classification

Elliptical Attention

1 code implementation • 19 Jun 2024 • Stefan K. Nielsen, Laziz U. Abdullaev, Rachel S. Y. Teo, Tan M. Nguyen

Pairwise dot-product self-attention is key to the success of transformers that achieve state-of-the-art performance across a variety of applications in language and vision.

Image Segmentation Language Modeling +2

Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis

1 code implementation • 19 Jun 2024 • Rachel S. Y. Teo, Tan M. Nguyen

In our work, we derive self-attention from kernel principal component analysis (kernel PCA) and show that self-attention projects its query vectors onto the principal component axes of its key matrix in a feature space.

Image Segmentation Language Modeling +2
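
To make the statement above concrete, the block below is just vanilla scaled dot-product self-attention; the paper's result is that this familiar computation can be derived from kernel PCA, with query vectors projected onto principal component axes of the key matrix in feature space (that derivation is not reproduced here).

    import torch

    def self_attention(Q, K, V):
        """Vanilla softmax attention: each query is matched against all keys."""
        d = Q.shape[-1]
        scores = Q @ K.transpose(-2, -1) / d ** 0.5   # pairwise query-key similarities
        A = scores.softmax(dim=-1)                    # attention weights
        return A @ V                                  # outputs are weighted mixes of values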

PIDformer: Transformer Meets Control Theory

no code implementations • 25 Feb 2024 • Tam Nguyen, César A. Uribe, Tan M. Nguyen, Richard G. Baraniuk

Motivated by this control framework, we derive a novel class of transformers, PID-controlled Transformer (PIDformer), aimed at improving robustness and mitigating the rank-collapse issue inherent in softmax transformers.

Image Segmentation Language Modeling +2
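
For readers unfamiliar with the control framework, a discrete PID controller combines proportional, integral, and derivative terms of an error signal; PIDformer injects this style of feedback into the transformer's state updates. The sketch below is a generic PID step, not the paper's layer.

    def pid_step(error, integral, prev_error, kp=1.0, ki=0.1, kd=0.01, dt=1.0):
        """One step of a textbook discrete PID controller."""
        integral = integral + error * dt                           # I: accumulated error
        derivative = (error - prev_error) / dt                     # D: rate of change of the error
        control = kp * error + ki * integral + kd * derivative     # P + I + D feedback signal
        return control, integral, error                            # carry state to the next step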

Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals

no code implementations • 1 Dec 2023 • Tam Nguyen, Tan M. Nguyen, Richard G. Baraniuk

Transformers have achieved remarkable success in a wide range of natural language processing and computer vision applications.

Image Segmentation Language Modeling +2

p-Laplacian Transformer

no code implementations • 6 Nov 2023 • Tuan Nguyen, Tam Nguyen, Vinh Nguyen, Tan M. Nguyen

$p$-Laplacian regularization, rooted in graph and image signal processing, introduces a parameter $p$ to control the regularization effect on these data.
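
Concretely, the graph p-Laplacian regularizer referenced above penalizes differences between signals on neighbouring nodes with exponent p; p = 2 recovers ordinary Laplacian smoothing, while other values of p change how strongly large differences are punished. A minimal NumPy version of that energy (up to a constant factor):

    import numpy as np

    def p_dirichlet_energy(W, F, p=2.0):
        """Graph p-Dirichlet energy: 0.5 * sum_ij W_ij * ||f_i - f_j||^p.

        W: (n, n) symmetric edge weights, F: (n, d) node signal.
        """
        diff = F[:, None, :] - F[None, :, :]                         # pairwise feature differences
        return 0.5 * np.sum(W * np.linalg.norm(diff, axis=-1) ** p)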

From Coupled Oscillators to Graph Neural Networks: Reducing Over-smoothing via a Kuramoto Model-based Approach

no code implementations • 6 Nov 2023 • Tuan Nguyen, Hirotada Honda, Takashi Sano, Vinh Nguyen, Shugo Nakamura, Tan M. Nguyen

We propose the Kuramoto Graph Neural Network (KuramotoGNN), a novel class of continuous-depth graph neural networks (GNNs) that employs the Kuramoto model to mitigate the over-smoothing phenomenon, in which node features in GNNs become indistinguishable as the number of layers increases.

Graph Neural Network
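
The Kuramoto model mentioned above couples oscillator phases through pairwise sine interactions, and full synchronization of the phases is the analogue of over-smoothing that the paper aims to control. A minimal Euler step of the classical phase dynamics on a weighted graph (the GNN generalizes this to node features):

    import numpy as np

    def kuramoto_step(theta, omega, A, K=1.0, dt=0.01):
        """d theta_i/dt = omega_i + K * sum_j A_ij * sin(theta_j - theta_i), one Euler step."""
        coupling = (A * np.sin(theta[None, :] - theta[:, None])).sum(axis=1)
        return theta + dt * (omega + K * coupling)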

ARIST: An Effective API Argument Recommendation Approach

no code implementations • 11 Jun 2023 • Son Nguyen, Cuong Tran Manh, Kien T. Tran, Tan M. Nguyen, Thu-Trang Nguyen, Kien-Tuan Ngo, Hieu Dinh Vo

To implement this idea in the recommendation process, ARIST combines program analysis (PA), language models (LMs), and several features specialized for the recommendation task, which consider the functionality of formal parameters and the positional information of code elements (e.g., variables or method calls) in the given context.

Improving Transformers with Probabilistic Attention Keys

1 code implementation • 16 Oct 2021 • Tam Nguyen, Tan M. Nguyen, Dung D. Le, Duy Khuong Nguyen, Viet-Anh Tran, Richard G. Baraniuk, Nhat Ho, Stanley J. Osher

Inspired by this observation, we propose Transformer with a Mixture of Gaussian Keys (Transformer-MGK), a novel transformer architecture that replaces redundant heads in transformers with a mixture of keys at each head.

Language Modeling
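
As a rough sketch of the mixture-of-keys idea (illustrative only; the exact Transformer-MGK parameterization and training are in the paper), each key position can be modeled by a small Gaussian mixture that the query is scored against:

    import torch

    def mixture_key_attention_weights(q, key_means, mix_logits, sigma=1.0):
        """Attention weights for one query against N positions, each with R Gaussian key components.

        q: (d,), key_means: (N, R, d), mix_logits: (N, R) unnormalized mixture weights.
        """
        sq_dist = ((key_means - q) ** 2).sum(dim=-1)                        # (N, R) squared distances
        comp = mix_logits.log_softmax(dim=-1) - sq_dist / (2 * sigma ** 2)  # log pi_r + log-Gaussian (up to const.)
        scores = torch.logsumexp(comp, dim=-1)                              # log mixture density per position
        return scores.softmax(dim=-1)                                       # attention weights over the N positions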

Heavy Ball Neural Ordinary Differential Equations

1 code implementation • NeurIPS 2021 • Hedi Xia, Vai Suliafu, Hangjie Ji, Tan M. Nguyen, Andrea L. Bertozzi, Stanley J. Osher, Bao Wang

We propose heavy ball neural ordinary differential equations (HBNODEs), leveraging the continuous limit of the classical momentum accelerated gradient descent, to improve neural ODEs (NODEs) training and inference.

Image Classification
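
The heavy-ball ODE behind HBNODEs augments the usual neural ODE dx/dt = f(x, t) with a momentum state, i.e. it integrates x'' + gamma * x' = f(x, t) as a first-order system. A minimal explicit-Euler sketch (the paper uses adaptive solvers and adjoint-based training):

    import numpy as np

    def hbnode_trajectory(f, x0, gamma=0.5, t1=1.0, steps=100):
        """Integrate x' = m, m' = -gamma * m + f(x, t) from t = 0 to t = t1 with explicit Euler."""
        x = np.asarray(x0, dtype=float)
        m = np.zeros_like(x)
        dt = t1 / steps
        for k in range(steps):
            t = k * dt
            x, m = x + dt * m, m + dt * (-gamma * m + f(x, t))   # simultaneous update of state and momentum
        return x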

FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention

no code implementations • NeurIPS 2021 • Tan M. Nguyen, Vai Suliafu, Stanley J. Osher, Long Chen, Bao Wang

For instance, FMMformers achieve an average classification accuracy of 60.74% over the five Long Range Arena tasks, which is significantly better than the standard transformer's average accuracy of 58.70%.

Language Modeling
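
The fast multipole analogy splits attention into a near-field part (exact attention restricted to a local band around each position) and a cheap low-rank far-field part for everything else. The sketch below is one hedged way to realize that decomposition, not the exact FMMformer kernels:

    import torch

    def near_far_attention(Q, K, V, band=16):
        """Banded (near-field) softmax attention plus a low-rank linear-attention far-field term."""
        n, d = Q.shape[-2], Q.shape[-1]
        scores = Q @ K.transpose(-2, -1) / d ** 0.5
        idx = torch.arange(n)
        near_mask = (idx[None, :] - idx[:, None]).abs() <= band        # keep only local interactions
        near = scores.masked_fill(~near_mask, float("-inf")).softmax(dim=-1) @ V
        phi_q, phi_k = Q.softmax(dim=-1), K.softmax(dim=-2)            # simple nonnegative feature maps
        far = phi_q @ (phi_k.transpose(-2, -1) @ V) / n                # global low-rank summary
        return near + far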

MomentumRNN: Integrating Momentum into Recurrent Neural Networks

2 code implementations • NeurIPS 2020 • Tan M. Nguyen, Richard G. Baraniuk, Andrea L. Bertozzi, Stanley J. Osher, Bao Wang

Designing deep neural networks is an art that often involves an expensive search over candidate architectures.

Scheduled Restart Momentum for Accelerated Stochastic Gradient Descent

1 code implementation • 24 Feb 2020 • Bao Wang, Tan M. Nguyen, Andrea L. Bertozzi, Richard G. Baraniuk, Stanley J. Osher

Nesterov accelerated gradient (NAG) improves the convergence rate of gradient descent (GD) for convex optimization using a specially designed momentum; however, it accumulates error when an inexact gradient is used (such as in SGD), slowing convergence at best and diverging at worst.

General Classification Image Classification
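
Scheduled restarts address the error accumulation described above by running Nesterov momentum with its iteration-dependent weight t/(t + 3) and periodically resetting the counter, which drops the momentum back toward zero. The loop below is a simplified sketch with a fixed restart period and hypothetical names, not the paper's exact schedule:

    import numpy as np

    def scheduled_restart_nag(grad, x0, lr=0.1, restart_every=40, iters=200):
        """Nesterov-style updates whose momentum weight t/(t+3) is reset on a fixed schedule."""
        x = np.asarray(x0, dtype=float)
        v_prev, t = x.copy(), 0
        for _ in range(iters):
            t += 1
            v = x - lr * grad(x)                       # (possibly stochastic) gradient step
            x = v + (t / (t + 3.0)) * (v - v_prev)     # Nesterov extrapolation, momentum grows with t
            v_prev = v
            if t % restart_every == 0:
                t = 0                                  # restart: momentum weight falls back toward 0
        return x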

InfoCNF: An Efficient Conditional Continuous Normalizing Flow with Adaptive Solvers

no code implementations • 9 Dec 2019 • Tan M. Nguyen, Animesh Garg, Richard G. Baraniuk, Anima Anandkumar

Continuous Normalizing Flows (CNFs) have emerged as promising deep generative models for a wide range of tasks thanks to their invertibility and exact likelihood estimation.

Conditional Image Generation Time Series +1

InfoCNF: Efficient Conditional Continuous Normalizing Flow Using Adaptive Solvers

no code implementations • 25 Sep 2019 • Tan M. Nguyen, Animesh Garg, Richard G. Baraniuk, Anima Anandkumar

Continuous Normalizing Flows (CNFs) have emerged as promising deep generative models for a wide range of tasks thanks to their invertibility and exact likelihood estimation.

Conditional Image Generation Time Series +1
