1 code implementation • 14 Mar 2025 • Rachel S. Y. Teo, Tan M. Nguyen
We then propose the Mixture of Layer Experts (MoLEx), a novel sparse mixture of experts (SMoE) whose experts are layers in the pre-trained model.
1 code implementation • 14 Mar 2025 • Hoang V. Tran, Khoi N. M. Nguyen, Trang Pham, Thanh T. Chu, Tam Le, Tan M. Nguyen
However, projecting measures onto low-dimensional spaces can lead to a loss of topological information.
1 code implementation • 14 Mar 2025 • Hoang V. Tran, Thanh T. Chu, Khoi N. M. Nguyen, Trang Pham, Tam Le, Tan M. Nguyen
Inspired by this approach, in this paper, we present an adaptation of tree systems to OT problems for measures supported on a sphere.
1 code implementation • 26 Feb 2025 • Dung V. Nguyen, Minh H. Nguyen, Luc Q. Nguyen, Rachel S. Y. Teo, Tan M. Nguyen, Linh Duy Tran
In this paper, we introduce CAMEx (Curvature-Aware Merging of Experts), a novel expert merging protocol that incorporates natural gradients to account for the non-Euclidean curvature of the parameter manifold.
1 code implementation • 21 Feb 2025 • Stefan K. Nielsen, Rachel S. Y. Teo, Laziz U. Abdullaev, Tan M. Nguyen
Our AC router enables the MoE model to achieve three connected benefits: 1) faster convergence, 2) better robustness to data corruption, and 3) overall performance improvement, as experts specialize in semantically distinct regions of the input space.
no code implementations • 22 Nov 2024 • Stefan K. Nielsen, Tan M. Nguyen
Contrastive learning has proven instrumental in learning unbiased representations of data, especially in complex environments characterized by high-cardinality and high-dimensional sensitive information.
1 code implementation • 18 Oct 2024 • Rachel S. Y. Teo, Tan M. Nguyen
Sparse Mixture of Experts (SMoE) has become the key to unlocking unparalleled scalability in deep learning.
1 code implementation • 18 Sep 2024 • Hoang V. Tran, Thieu N. Vo, Tho H. Tran, An T. Nguyen, Tan M. Nguyen
We name our new family of NFNs the Monomial Matrix Group Equivariant Neural Functional Networks (Monomial-NFN).
no code implementations • 19 Jun 2024 • Viet-Hoang Tran, Trang Pham, Tho Tran, Tam Le, Tan M. Nguyen
While SW is prone to losing topological information of the input measures because it relies on one-dimensional projections, TSW is more flexible and has a higher degree of freedom: it chooses a tree rather than a line to alleviate the curse of dimensionality in SW.
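For intuition, here is a minimal sketch of the one-dimensional projection step that vanilla SW relies on; the Monte Carlo sampling scheme, the equal-sample-size assumption, and the function name are illustrative choices, not the paper's implementation.

```python
import numpy as np

def sliced_wasserstein_1d(X, Y, n_projections=50, seed=0):
    """Monte Carlo sliced Wasserstein-1 between two empirical measures.

    Each random direction projects both point clouds onto a line, where the
    1D Wasserstein distance reduces to a sorted-difference sum; TSW replaces
    the line with a tree metric. Assumes X and Y have equal sample sizes.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_projections):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)            # random unit direction
        px, py = np.sort(X @ theta), np.sort(Y @ theta)
        total += np.abs(px - py).mean()
    return total / n_projections
```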
no code implementations • 19 Jun 2024 • Tan M. Nguyen, Tam Nguyen, Nhat Ho, Andrea L. Bertozzi, Richard G. Baraniuk, Stanley J. Osher
Self-attention is key to the remarkable success of transformers in sequence modeling tasks, including many applications in natural language processing and computer vision.
1 code implementation • 19 Jun 2024 • Stefan K. Nielsen, Laziz U. Abdullaev, Rachel S. Y. Teo, Tan M. Nguyen
Pairwise dot-product self-attention is key to the success of transformers that achieve state-of-the-art performance across a variety of applications in language and vision.
1 code implementation • 19 Jun 2024 • Rachel S. Y. Teo, Tan M. Nguyen
In our work, we derive self-attention from kernel principal component analysis (kernel PCA) and show that self-attention projects its query vectors onto the principal component axes of its key matrix in a feature space.
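As a point of reference, below is a small sketch of standard scaled dot-product self-attention, the operation the paper reinterprets through kernel PCA (each output row read as a query projected onto principal component axes of the key matrix in feature space). Tensor names and shapes are illustrative and not taken from the paper's code.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Standard scaled dot-product self-attention (illustrative sketch)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v              # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # pairwise dot-product scores
    A = np.exp(scores - scores.max(axis=-1, keepdims=True))
    A /= A.sum(axis=-1, keepdims=True)               # row-wise softmax
    return A @ V                                     # attention output

# toy usage: 5 tokens, model dimension 8
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W = [rng.normal(size=(8, 8)) * 0.1 for _ in range(3)]
out = self_attention(X, *W)
```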
no code implementations • 25 Feb 2024 • Tam Nguyen, César A. Uribe, Tan M. Nguyen, Richard G. Baraniuk
Motivated by this control framework, we derive a novel class of transformers, PID-controlled Transformer (PIDformer), aimed at improving robustness and mitigating the rank-collapse issue inherent in softmax transformers.
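For readers unfamiliar with the control primitive, a minimal discrete PID controller looks roughly as follows; the gains, state handling, and class name are placeholders, and PIDformer integrates this kind of proportional-integral-derivative feedback into the transformer's state dynamics rather than using this exact class.

```python
class PID:
    """Discrete PID controller: u_t = Kp*e_t + Ki*sum(e) + Kd*(e_t - e_{t-1})."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error, dt=1.0):
        self.integral += error * dt                     # accumulate error (I term)
        derivative = (error - self.prev_error) / dt     # rate of change (D term)
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```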
no code implementations • 1 Dec 2023 • Tam Nguyen, Tan M. Nguyen, Richard G. Baraniuk
Transformers have achieved remarkable success in a wide range of natural language processing and computer vision applications.
no code implementations • 6 Nov 2023 • Tuan Nguyen, Tam Nguyen, Vinh Nguyen, Tan M. Nguyen
$p$-Laplacian regularization, rooted in graph and image signal processing, introduces a parameter $p$ that controls the strength of the regularization effect on such data.
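As a hedged illustration of the quantity involved, a graph $p$-Laplacian regularizer can be sketched as below; the dense double loop, the weight matrix, and the function name are toy choices for exposition, not the paper's formulation.

```python
import numpy as np

def p_laplacian_regularizer(X, W, p=2.0):
    """Graph p-Laplacian regularizer: sum_{i,j} w_ij * ||x_i - x_j||^p.

    p = 2 recovers the usual Dirichlet energy; varying p changes how strongly
    large differences between neighboring signals are penalized.
    """
    n = X.shape[0]
    reg = 0.0
    for i in range(n):
        for j in range(n):
            if W[i, j] > 0:                      # only connected pairs contribute
                reg += W[i, j] * np.linalg.norm(X[i] - X[j]) ** p
    return reg
```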
no code implementations • 6 Nov 2023 • Tuan Nguyen, Hirotada Honda, Takashi Sano, Vinh Nguyen, Shugo Nakamura, Tan M. Nguyen
We propose the Kuramoto Graph Neural Network (KuramotoGNN), a novel class of continuous-depth graph neural networks (GNNs) that employs the Kuramoto model to mitigate the over-smoothing phenomenon, in which node features in GNNs become indistinguishable as the number of layers increases.
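A minimal simulation step of the classical Kuramoto model of coupled oscillators, the dynamical system the paper adapts, is sketched below; the coupling strength, step size, and function name are illustrative assumptions, and KuramotoGNN couples node features along graph edges rather than all-to-all phases.

```python
import numpy as np

def kuramoto_step(theta, omega, K, dt=0.01):
    """One Euler step of the Kuramoto model:
        d(theta_i)/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i)
    """
    n = theta.shape[0]
    coupling = (K / n) * np.sin(theta[None, :] - theta[:, None]).sum(axis=1)
    return theta + dt * (omega + coupling)
```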
no code implementations • 11 Jun 2023 • Son Nguyen, Cuong Tran Manh, Kien T. Tran, Tan M. Nguyen, Thu-Trang Nguyen, Kien-Tuan Ngo, Hieu Dinh Vo
To implement this idea in the recommendation process, ARIST combines program analysis (PA), language models (LMs), and several features specialized for the recommendation task, which consider the functionality of formal parameters and the positional information of code elements (e.g., variables or method calls) in the given context.
1 code implementation • 16 Oct 2021 • Tam Nguyen, Tan M. Nguyen, Dung D. Le, Duy Khuong Nguyen, Viet-Anh Tran, Richard G. Baraniuk, Nhat Ho, Stanley J. Osher
Inspired by this observation, we propose Transformer with a Mixture of Gaussian Keys (Transformer-MGK), a novel transformer architecture that replaces redundant heads in transformers with a mixture of keys at each head.
1 code implementation • NeurIPS 2021 • Hedi Xia, Vai Suliafu, Hangjie Ji, Tan M. Nguyen, Andrea L. Bertozzi, Stanley J. Osher, Bao Wang
We propose heavy ball neural ordinary differential equations (HBNODEs), leveraging the continuous limit of classical momentum-accelerated gradient descent, to improve the training and inference of neural ODEs (NODEs).
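The continuous-time heavy ball dynamics referenced here can be written as a first-order system; the sketch below is illustrative only, with the vector field `f`, the damping coefficient `gamma`, and the state layout as placeholders rather than the paper's API.

```python
import numpy as np

def heavy_ball_rhs(state, t, f, gamma):
    """Right-hand side of a heavy-ball ODE as a first-order system:
        dh/dt = m,    dm/dt = -gamma * m + f(h, t)
    """
    h, m = state                       # position-like and momentum-like parts
    dh = m
    dm = -gamma * m + f(h, t)
    return np.stack([dh, dm])
```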
no code implementations • NeurIPS 2021 • Tan M. Nguyen, Vai Suliafu, Stanley J. Osher, Long Chen, Bao Wang
For instance, FMMformers achieve an average classification accuracy of $60.74\%$ over the five Long Range Arena tasks, which is significantly better than the standard transformer's average accuracy of $58.70\%$.
2 code implementations • NeurIPS 2020 • Tan M. Nguyen, Richard G. Baraniuk, Andrea L. Bertozzi, Stanley J. Osher, Bao Wang
Designing deep neural networks is an art that often involves an expensive search over candidate architectures.
1 code implementation • 24 Feb 2020 • Bao Wang, Tan M. Nguyen, Andrea L. Bertozzi, Richard G. Baraniuk, Stanley J. Osher
Nesterov accelerated gradient (NAG) improves the convergence rate of gradient descent (GD) for convex optimization using a specially designed momentum; however, it accumulates error when an inexact gradient is used (such as in SGD), slowing convergence at best and diverging at worst.
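For reference, one Nesterov accelerated gradient step (the look-ahead gradient evaluation that distinguishes NAG from plain momentum) can be sketched as follows; the function and parameter names are illustrative, not the paper's code.

```python
def nag_step(x, v, grad_fn, lr=0.1, momentum=0.9):
    """One Nesterov accelerated gradient step:
        v_{t+1} = momentum * v_t - lr * grad_fn(x_t + momentum * v_t)
        x_{t+1} = x_t + v_{t+1}

    With noisy (e.g., stochastic) gradients, this momentum accumulates error,
    which is the slowdown described above.
    """
    lookahead = x + momentum * v           # evaluate the gradient ahead of x
    v_new = momentum * v - lr * grad_fn(lookahead)
    return x + v_new, v_new
```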
no code implementations • 9 Dec 2019 • Tan M. Nguyen, Animesh Garg, Richard G. Baraniuk, Anima Anandkumar
Continuous Normalizing Flows (CNFs) have emerged as promising deep generative models for a wide range of tasks thanks to their invertibility and exact likelihood estimation.
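A hedged sketch of the exact-likelihood mechanism behind CNFs, the instantaneous change of variables, is shown below; the fixed-step Euler solver and the user-supplied callables are illustrative assumptions, not any particular library's API.

```python
def cnf_euler_integrate(z0, f, trace_df_dz, t0=0.0, t1=1.0, steps=100):
    """Jointly integrate dz/dt = f(z, t) and dlogp/dt = -Tr(df/dz) with Euler.

    f(z, t) is the learned vector field and trace_df_dz(z, t) its Jacobian
    trace (exact or estimated). On return, log p(z1) = log p(z0) + delta_logp.
    """
    z, delta_logp = z0, 0.0
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        delta_logp += -trace_df_dz(z, t) * dt   # accumulate the log-density change
        z = z + f(z, t) * dt                    # advance the state
        t += dt
    return z, delta_logp
```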
no code implementations • 25 Sep 2019 • Tan M. Nguyen, Animesh Garg, Richard G. Baraniuk, Anima Anandkumar
Continuous Normalizing Flows (CNFs) have emerged as promising deep generative models for a wide range of tasks thanks to their invertibility and exact likelihood estimation.