1 code implementation • ACL 2022 • Yikang Shen, Shawn Tan, Alessandro Sordoni, Peng Li, Jie Zhou, Aaron Courville
We introduce a new model, the Unsupervised Dependency Graph Network (UDGN), that can induce dependency structures from raw corpora via the masked language modeling task.
2 code implementations • 23 Oct 2024 • Shawn Tan, Yikang Shen, Songlin Yang, Aaron Courville, Rameswar Panda
We propose an alternative attention mechanism based on the stick-breaking process: For each token before the current one, we determine a break point $\beta_{i, j}$, which represents the proportion of the remaining stick to allocate to the current token.
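To make the resulting weights concrete, here is a minimal NumPy sketch for a single query position; the sigmoid parameterisation of $\beta_{i, j}$ and the function name are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def stick_breaking_attention(q, K, V):
    """Sketch: stick-breaking attention output for one query position i.

    q: (d,) query; K, V: (n, d) keys/values for the n tokens before i,
    in left-to-right order. Assumes beta[j] = sigmoid(q . k_j).
    """
    beta = 1.0 / (1.0 + np.exp(-(K @ q)))          # break point per token
    # suffix[j] = prod_{k >= j} (1 - beta[k]); shift it so token j's weight
    # uses only the stick left over by the tokens between j and i.
    suffix = np.cumprod((1.0 - beta)[::-1])[::-1]
    remaining = np.append(suffix[1:], 1.0)         # prod_{k > j} (1 - beta[k])
    weights = beta * remaining                     # sums to at most 1
    return weights @ V
```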
1 code implementation • 23 Aug 2024 • Yikang Shen, Matthew Stallone, Mayank Mishra, Gaoyuan Zhang, Shawn Tan, Aditya Prasad, Adriana Meza Soria, David D. Cox, Rameswar Panda
This is not only because learning rate, batch size, number of training tokens, model size, and other hyperparameters interact in complicated ways, but also because a hyperparameter search is prohibitively expensive for large language models with billions or trillions of parameters.
3 code implementations • 13 Mar 2024 • Shawn Tan, Yikang Shen, Rameswar Panda, Aaron Courville
We present ScatterMoE, an implementation of Sparse Mixture-of-Experts (SMoE) on GPUs.
1 code implementation • 14 Dec 2023 • Kian Eng Ong, Sivaji Retta, Ramarajulu Srinivasan, Shawn Tan, Jun Liu
Cattle farming is an important and profitable agricultural industry.
2 code implementations • 11 Oct 2023 • Shawn Tan, Yikang Shen, Zhenfang Chen, Aaron Courville, Chuang Gan
The Universal Transformer (UT) is a variant of the Transformer that shares parameters across its layers.
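A minimal PyTorch sketch of that sharing scheme (class name and hyperparameters are placeholders): one block is applied repeatedly, so extra depth costs computation but adds no parameters.

```python
import torch.nn as nn

class SharedLayerTransformer(nn.Module):
    """Sketch: a UT-style encoder that reuses one layer at every depth."""
    def __init__(self, d_model=512, nhead=8, depth=6):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.depth = depth  # number of recurrence steps, not distinct layers

    def forward(self, x):
        for _ in range(self.depth):  # same weights applied at every step
            x = self.layer(x)
        return x
```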
1 code implementation • 7 Jun 2023 • Yikang Shen, Zheyu Zhang, Tianyou Cao, Shawn Tan, Zhenfang Chen, Chuang Gan
In our experiments, we found that the modular architecture enables three important abilities for large pre-trained language models. 1) Efficiency: since ModuleFormer activates only a subset of its modules for each input token, it can match the performance of dense LLMs with more than twice the throughput. 2) Extendability: ModuleFormer is more resistant to catastrophic forgetting than dense LLMs and can easily be extended with new modules to learn knowledge not covered by its training data. 3) Specialisation: finetuning ModuleFormer can specialise a subset of modules for the finetuning task, and task-unrelated modules can then be pruned for lightweight deployment.
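A minimal sketch of the sparse activation behind point 1, in the spirit of top-$k$ mixture-of-experts routing; the function, router parameterisation, and $k$ are illustrative, not ModuleFormer's actual implementation.

```python
import torch
import torch.nn.functional as F

def route_token(x, router_w, modules, k=2):
    """Sketch: activate only the top-k scoring modules for one token.

    x: (d,) token representation; router_w: (num_modules, d);
    modules: list of callables mapping (d,) -> (d,).
    """
    logits = router_w @ x                    # score every module
    top_vals, top_idx = logits.topk(k)       # select k modules
    gates = F.softmax(top_vals, dim=-1)      # renormalise over the selected k
    # Only k modules run, so per-token compute is independent of num_modules.
    return sum(g * modules[int(i)](x) for g, i in zip(gates, top_idx))
```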
no code implementations • ICLR 2022 • Shawn Tan, Chin-wei Huang, Alessandro Sordoni, Aaron Courville
Additionally, since the support of the marginal $q(z)$ is bounded and the support of the prior $p(z)$ is not, we propose renormalising the prior distribution over the support of $q(z)$.
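In symbols, the renormalised prior restricts $p(z)$ to where $q(z)$ has mass (a sketch of the idea, not necessarily the paper's notation):

```latex
\tilde{p}(z) = \frac{p(z)\, \mathbb{1}\!\left[z \in \operatorname{supp}(q)\right]}
                    {\int_{\operatorname{supp}(q)} p(z')\, \mathrm{d}z'}
```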
no code implementations • NAACL 2021 • Yikang Shen, Shawn Tan, Alessandro Sordoni, Siva Reddy, Aaron Courville
In the present work, we propose a new syntax-aware language model: Syntactic Ordered Memory (SOM).
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Shawn Tan, Yikang Shen, Timothy J. O'Donnell, Alessandro Sordoni, Aaron Courville
We model the recursive production property of context-free grammars for natural and synthetic languages.
1 code implementation • NeurIPS 2019 • Yikang Shen, Shawn Tan, Arian Hosseini, Zhouhan Lin, Alessandro Sordoni, Aaron Courville
Inspired by Ordered Neurons (Shen et al., 2018), we introduce a new attention-based mechanism and use its cumulative probability to control the writing and erasing operations of the memory.
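A minimal sketch of that gating idea (names are illustrative): the cumulative sum of a softmax yields monotone gates over memory slots, so selecting one slot also determines what happens to every slot above or below it.

```python
import torch
import torch.nn.functional as F

def cumulative_gates(slot_logits):
    """Sketch: turn slot-selection logits into monotone write/erase gates.

    p = softmax(logits); cumsum(p)[i] = P(selected slot <= i), so the gate
    increases monotonically from 0 to 1 across the memory.
    """
    p = F.softmax(slot_logits, dim=-1)
    erase = torch.cumsum(p, dim=-1)   # slots up to the selected one are erased
    keep = 1.0 - erase                # slots above it are protected
    return erase, keep
```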
1 code implementation • 21 Oct 2019 • Shawn Tan, Guillaume Androz, Ahmad Chamseddine, Pierre Fecteau, Aaron Courville, Yoshua Bengio, Joseph Paul Cohen
We release the largest public ECG dataset of continuous raw signals for representation learning, containing data from 11 thousand patients and 2 billion labelled beats.
no code implementations • 23 Jun 2019 • Shawn Tan, Yikang Shen, Chin-wei Huang, Aaron Courville
The ability to recognise logical relationships between sentences is important for language understanding.
7 code implementations • ICLR 2019 • Yikang Shen, Shawn Tan, Alessandro Sordoni, Aaron Courville
When a larger constituent ends, all of the smaller constituents that are nested within it must also be closed.
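That nesting constraint is what the paper's cumax activation enforces; a one-line sketch:

```python
import torch
import torch.nn.functional as F

def cumax(logits, dim=-1):
    """cumsum of a softmax: a monotone gate in [0, 1].

    Once the gate opens at neuron i, it stays open for all later neurons,
    mirroring how closing a large constituent closes everything nested in it.
    """
    return torch.cumsum(F.softmax(logits, dim=dim), dim=dim)
```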
1 code implementation • NeurIPS 2018 • Chin-wei Huang, Shawn Tan, Alexandre Lacoste, Aaron Courville
Despite the advances in the representational capacity of approximate distributions for variational inference, the optimization process can still limit the density that is ultimately learned.
no code implementations • 7 Mar 2018 • Yikang Shen, Shawn Tan, Chin-wei Huang, Aaron Courville
Learning distributed sentence representations remains an interesting problem in the field of Natural Language Processing (NLP).
no code implementations • 26 Jul 2017 • Yikang Shen, Shawn Tan, Christopher Pal, Aaron Courville
We propose a new self-organizing hierarchical softmax formulation for neural-network-based language models over large vocabularies.
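For context, a sketch of the standard two-level factorisation such models build on; the self-organizing part (learning the word-to-cluster assignment) is the paper's contribution and is not shown here.

```python
import torch.nn as nn
import torch.nn.functional as F

class TwoLevelSoftmax(nn.Module):
    """Sketch: class-based softmax, P(w|h) = P(c(w)|h) * P(w|c(w), h)."""
    def __init__(self, d_model, n_clusters, words_per_cluster):
        super().__init__()
        self.cluster_head = nn.Linear(d_model, n_clusters)
        self.word_heads = nn.ModuleList(
            nn.Linear(d_model, words_per_cluster) for _ in range(n_clusters)
        )

    def log_prob(self, h, cluster_id, word_id):
        # Normalise over clusters, then over words within the chosen cluster,
        # instead of over the full vocabulary.
        lp_cluster = F.log_softmax(self.cluster_head(h), dim=-1)[cluster_id]
        lp_word = F.log_softmax(self.word_heads[cluster_id](h), dim=-1)[word_id]
        return lp_cluster + lp_word
```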