Search Results for author: Shawn Tan

Found 16 papers, 9 papers with code

Unsupervised Dependency Graph Network

1 code implementation ACL 2022 Yikang Shen, Shawn Tan, Alessandro Sordoni, Peng Li, Jie Zhou, Aaron Courville

We introduce a new model, the Unsupervised Dependency Graph Network (UDGN), that can induce dependency structures from raw corpora and the masked language modeling task.

Language Modelling, Masked Language Modeling, +3

Scattered Mixture-of-Experts Implementation

1 code implementation 13 Mar 2024 Shawn Tan, Yikang Shen, Rameswar Panda, Aaron Courville

We present ScatterMoE, an implementation of Sparse Mixture-of-Experts (SMoE) on GPUs.

Sparse Universal Transformer

no code implementations 11 Oct 2023 Shawn Tan, Yikang Shen, Zhenfang Chen, Aaron Courville, Chuang Gan

The Universal Transformer (UT) is a variant of the Transformer that shares parameters across its layers.

ModuleFormer: Modularity Emerges from Mixture-of-Experts

1 code implementation 7 Jun 2023 Yikang Shen, Zheyu Zhang, Tianyou Cao, Shawn Tan, Zhenfang Chen, Chuang Gan

In our experiment, we found that the modular architecture enables three important abilities for large pre-trained language models: 1) Efficiency: since ModuleFormer activates only a subset of its modules for each input token, it can achieve the same performance as dense LLMs with more than twice the throughput; 2) Extendability: ModuleFormer is more immune to catastrophic forgetting than dense LLMs and can be easily extended with new modules to learn new knowledge that is not included in the training data; 3) Specialisation: finetuning ModuleFormer can specialise a subset of modules to the finetuning task, and the task-unrelated modules can be easily pruned for lightweight deployment.

Language Modelling

Learning to Dequantise with Truncated Flows

no code implementations ICLR 2022 Shawn Tan, Chin-wei Huang, Alessandro Sordoni, Aaron Courville

Additionally, since the support of the marginal $q(z)$ is bounded and the support of the prior $p(z)$ is not, we propose renormalising the prior distribution over the support of $q(z)$.

Variational Inference

Ordered Memory

1 code implementation NeurIPS 2019 Yikang Shen, Shawn Tan, Arian Hosseini, Zhouhan Lin, Alessandro Sordoni, Aaron Courville

Inspired by Ordered Neurons (Shen et al., 2018), we introduce a new attention-based mechanism and use its cumulative probability to control the writing and erasing operation of the memory.

ListOps

Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery

1 code implementation 21 Oct 2019 Shawn Tan, Guillaume Androz, Ahmad Chamseddine, Pierre Fecteau, Aaron Courville, Yoshua Bengio, Joseph Paul Cohen

We release the largest public ECG dataset of continuous raw signals for representation learning containing 11 thousand patients and 2 billion labelled beats.

Clustering, Representation Learning

{COMPANYNAME}11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery

no code implementations 25 Sep 2019 Shawn Tan, Guillaume Androz, Ahmad Chamseddine, Pierre Fecteau, Aaron Courville, Yoshua Bengio, Joseph Paul Cohen

We release the largest public ECG dataset of continuous raw signals for representation learning containing over 11k patients and 2 billion labelled beats.

Clustering, Representation Learning

Investigating Biases in Textual Entailment Datasets

no code implementations 23 Jun 2019 Shawn Tan, Yikang Shen, Chin-wei Huang, Aaron Courville

The ability to understand logical relationships between sentences is an important task in language understanding.

BIG-bench Machine Learning, Natural Language Inference, +2

Improving Explorability in Variational Inference with Annealed Variational Objectives

1 code implementation NeurIPS 2018 Chin-wei Huang, Shawn Tan, Alexandre Lacoste, Aaron Courville

Despite the advances in the representational capacity of approximate distributions for variational inference, the optimization process can still limit the density that is ultimately learned.

Variational Inference

Generating Contradictory, Neutral, and Entailing Sentences

no code implementations 7 Mar 2018 Yikang Shen, Shawn Tan, Chin-wei Huang, Aaron Courville

Learning distributed sentence representations remains an interesting problem in the field of Natural Language Processing (NLP).

Natural Language Inference, RTE, +1

Self-organized Hierarchical Softmax

no code implementations 26 Jul 2017 Yikang Shen, Shawn Tan, Christopher Pal, Aaron Courville

We propose a new self-organizing hierarchical softmax formulation for neural-network-based language models over large vocabularies.

Language Modelling, Sentence, +1
