1 code implementation • ACL 2022 • Yikang Shen, Shawn Tan, Alessandro Sordoni, Peng Li, Jie Zhou, Aaron Courville
We introduce a new model, the Unsupervised Dependency Graph Network (UDGN), that can induce dependency structures using only raw corpora and the masked language modeling task.
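As a rough sketch of how induced structure can be read out of such a model (all names here are hypothetical, and UDGN's own decoding may differ, e.g. by using a maximum-spanning-tree decoder), suppose a trained model exposes a matrix `A` where `A[i, j]` is the probability that token `j` is the head of token `i`; a greedy decode then reads one dependency arc per token:

```python
import numpy as np

# Hypothetical sketch: A[i, j] = probability that token j is the head of
# token i, as produced by some trained model. Greedy decoding picks the
# most probable head for each token independently.
def greedy_heads(A: np.ndarray) -> list[int]:
    A = A.copy()
    np.fill_diagonal(A, -np.inf)  # a token cannot be its own head
    return A.argmax(axis=1).tolist()

tokens = ["the", "cat", "sat"]
A = np.array([[0.0, 0.9, 0.1],   # "the" attaches to "cat"
              [0.2, 0.0, 0.8],   # "cat" attaches to "sat"
              [0.3, 0.7, 0.0]])  # toy numbers, not model output
print(list(zip(tokens, greedy_heads(A))))  # [('the', 1), ('cat', 2), ('sat', 1)]
```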
1 code implementation • 7 Jun 2023 • Yikang Shen, Zheyu Zhang, Tianyou Cao, Shawn Tan, Zhenfang Chen, Chuang Gan
In our experiments, we found that the modular architecture enables three important abilities for large pre-trained language models: 1) Efficiency: since ModuleFormer only activates a subset of its modules for each input token, it can match the performance of dense LLMs with more than twice the throughput; 2) Extendability: ModuleFormer is less susceptible to catastrophic forgetting than dense LLMs and can easily be extended with new modules to learn knowledge that is not included in the training data; 3) Specialisation: finetuning ModuleFormer can specialize a subset of modules to the finetuning task, and the task-unrelated modules can be pruned for lightweight deployment.
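To make the "activates a subset of its modules" point concrete, here is a minimal top-k mixture-of-experts layer (a generic sketch; ModuleFormer's actual router, module design, and load-balancing objectives differ, and every name below is illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Generic sparse MoE sketch: a router picks the top-k experts per token, so
# only a fraction of the parameters run for each input.
class SparseMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                           # x: (n_tokens, d_model)
        logits = self.router(x)                     # (n_tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)  # keep only k experts per token
        weights = F.softmax(weights, dim=-1)        # renormalise over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e            # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

moe = SparseMoE()
print(moe(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```

In this framing, extendability corresponds to appending new experts to `self.experts`, and the pruning described in 3) corresponds to dropping experts the router never selects for a given task.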
no code implementations • ICLR 2022 • Shawn Tan, Chin-wei Huang, Alessandro Sordoni, Aaron Courville
Additionally, since the support of the marginal $q(z)$ is bounded and the support of the prior $p(z)$ is not, we propose renormalising the prior distribution over the support of $q(z)$.
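Concretely, the renormalisation described above amounts to truncating the prior to the support of $q(z)$ (a sketch of the construction, with $\operatorname{supp}(q)$ denoting that support):

$$\tilde{p}(z) = \frac{p(z)\,\mathbf{1}\!\left[z \in \operatorname{supp}(q)\right]}{\int_{\operatorname{supp}(q)} p(z')\,\mathrm{d}z'}$$

so that $\tilde{p}$ integrates to one over exactly the region where $q$ places mass.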
no code implementations • NAACL 2021 • Yikang Shen, Shawn Tan, Alessandro Sordoni, Siva Reddy, Aaron Courville
In the present work, we propose a new syntax-aware language model: Syntactic Ordered Memory (SOM).
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Shawn Tan, Yikang Shen, Timothy J. O'Donnell, Alessandro Sordoni, Aaron Courville
We model the recursive production property of context-free grammars for natural and synthetic languages.
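As a standard illustration of recursive production (not an example from the paper), the grammar

$$S \rightarrow a\,S\,b \mid \varepsilon$$

generates the language $a^n b^n$ by applying the same rule to its own output, $S \Rightarrow aSb \Rightarrow aaSbb \Rightarrow \cdots$, which is the kind of self-embedding that recursive production gives rise to.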
1 code implementation • NeurIPS 2019 • Yikang Shen, Shawn Tan, Arian Hosseini, Zhouhan Lin, Alessandro Sordoni, Aaron Courville
Inspired by Ordered Neurons (Shen et al., 2018), we introduce a new attention-based mechanism and use its cumulative probability to control the writing and erasing operations of the memory.
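As a toy illustration of cumulative-probability gating (numbers made up for this sketch), an attention distribution over three memory slots induces a soft monotone mask:

$$p = \operatorname{softmax}(s) = (0.1,\; 0.2,\; 0.7) \quad\Rightarrow\quad g_i = \sum_{j \le i} p_j = (0.1,\; 0.3,\; 1.0)$$

Slots where $g_i \approx 1$ are exposed to writing and erasing while slots where $g_i \approx 0$ are preserved, so the boundary between updated and protected memory stays differentiable.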
1 code implementation • 21 Oct 2019 • Shawn Tan, Guillaume Androz, Ahmad Chamseddine, Pierre Fecteau, Aaron Courville, Yoshua Bengio, Joseph Paul Cohen
We release the largest public ECG dataset of continuous raw signals for representation learning, containing 11,000 patients and 2 billion labelled beats.
no code implementations • 23 Jun 2019 • Shawn Tan, Yikang Shen, Chin-wei Huang, Aaron Courville
Understanding logical relationships between sentences is an important task in language understanding.
7 code implementations • ICLR 2019 • Yikang Shen, Shawn Tan, Alessandro Sordoni, Aaron Courville
When a larger constituent ends, all of the smaller constituents that are nested within it must also be closed.
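This nesting constraint is what the paper's cumax activation (a cumulative sum over a softmax) encodes: the resulting gate is monotonically non-decreasing, a soft version of a $(0, \ldots, 0, 1, \ldots, 1)$ vector, so whenever a high-level neuron closes, every neuron below it in the ordering closes too. A minimal sketch:

```python
import torch

def cumax(x, dim=-1):
    # Cumulative softmax: a differentiable relaxation of a binary gate of
    # the form (0, ..., 0, 1, ..., 1); monotonically non-decreasing along
    # `dim` by construction.
    return torch.cumsum(torch.softmax(x, dim=dim), dim=dim)

logits = torch.tensor([[-1.0, 2.0, 0.5, 0.1]])
print(cumax(logits))  # values rise monotonically from near 0 toward 1
```

In the full model, master forget and input gates built from cumax decide which contiguous segment of the cell state is erased or written at each step; the toy values above only show the monotone shape of the gate.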
1 code implementation • NeurIPS 2018 • Chin-wei Huang, Shawn Tan, Alexandre Lacoste, Aaron Courville
Despite the advances in the representational capacity of approximate distributions for variational inference, the optimization process can still limit the density that is ultimately learned.
no code implementations • 7 Mar 2018 • Yikang Shen, Shawn Tan, Chin-wei Huang, Aaron Courville
Learning distributed sentence representations remains an interesting problem in the field of Natural Language Processing (NLP).
no code implementations • 26 Jul 2017 • Yikang Shen, Shawn Tan, Christopher Pal, Aaron Courville
We propose a new self-organizing hierarchical softmax formulation for neural-network-based language models over large vocabularies.
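For context, the standard two-level hierarchical softmax factorises the word probability as $p(w \mid h) = p(c(w) \mid h)\, p(w \mid c(w), h)$, reducing the per-word cost from $O(|V|)$ to roughly $O(\sqrt{|V|})$ with balanced clusters. The sketch below shows only that standard factorisation with a fixed word-to-cluster map; the self-organizing part, where cluster assignments are learned during training, is the paper's contribution and is not shown, and all names are illustrative:

```python
import torch
import torch.nn as nn

# Two-level hierarchical softmax sketch:
#   p(w | h) = p(c(w) | h) * p(w | c(w), h)
# The caller supplies (cluster, word_in_cluster) from a fixed word-to-cluster
# assignment; a self-organizing variant would re-estimate that assignment
# during training.
class HierarchicalSoftmax(nn.Module):
    def __init__(self, d_model, cluster_sizes):
        super().__init__()
        self.cluster_head = nn.Linear(d_model, len(cluster_sizes))
        self.word_heads = nn.ModuleList(
            nn.Linear(d_model, size) for size in cluster_sizes
        )

    def log_prob(self, h, cluster, word_in_cluster):
        # log p(c | h): softmax over clusters only.
        log_pc = torch.log_softmax(self.cluster_head(h), dim=-1)[cluster]
        # log p(w | c, h): softmax over the words of one cluster only.
        log_pw = torch.log_softmax(self.word_heads[cluster](h), dim=-1)[word_in_cluster]
        return log_pc + log_pw

hsm = HierarchicalSoftmax(d_model=8, cluster_sizes=[10, 10, 10, 10])
h = torch.randn(8)  # hidden state for one position
print(hsm.log_prob(h, cluster=1, word_in_cluster=3))
```

Each lookup touches one softmax over clusters and one over a single cluster's words, which is what makes the approach attractive for large vocabularies.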