Search Results for author: Shawn Tan

Found 16 papers, 9 papers with code

Unsupervised Dependency Graph Network

1 code implementation • ACL 2022 • Yikang Shen, Shawn Tan, Alessandro Sordoni, Peng Li, Jie Zhou, Aaron Courville

We introduce a new model, the Unsupervised Dependency Graph Network (UDGN), that can induce dependency structures from raw corpora and the masked language modeling task.

Language Modelling Masked Language Modeling +3

Scattered Mixture-of-Experts Implementation

1 code implementation • 13 Mar 2024 • Shawn Tan, Yikang Shen, Rameswar Panda, Aaron Courville

We present ScatterMoE, an implementation of Sparse Mixture-of-Experts (SMoE) on GPUs.
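ScatterMoE itself is about efficient GPU kernels, which this listing does not detail. As a hedged illustration of what a single SMoE layer computes, the plain-Python sketch below routes one token to its top-k experts. The names (`top_k_moe`, `gate_weights`) and the choice to renormalise the gate with a softmax over only the selected logits are assumptions for illustration, not ScatterMoE's API or kernel design.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_moe(x, gate_weights, experts, k=2):
    """Route a single token vector x through its k highest-scoring experts.

    x: token representation (list of floats)
    gate_weights: one router weight vector per expert (logit = dot product with x)
    experts: callables mapping a vector to a vector of the same length
    Returns the gated mixture of the selected experts' outputs and their indices.
    """
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    # Keep only the k experts with the largest router logits.
    top = sorted(range(len(experts)), key=lambda e: logits[e], reverse=True)[:k]
    # One common choice (assumed here): softmax over the selected logits only.
    gates = softmax([logits[e] for e in top])
    out = [0.0] * len(x)
    for g, e in zip(gates, top):
        y = experts[e](x)  # unselected experts are never evaluated
        out = [o + g * y_i for o, y_i in zip(out, y)]
    return out, top


# Usage: four toy "experts" that just scale their input.
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
out, chosen = top_k_moe([1.0, 0.5], gate_weights, experts, k=2)
print(chosen, out)
```

Only the k selected experts run per token, which is where the sparsity saving comes from; an efficient GPU implementation additionally has to group tokens by expert, which this sketch ignores.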

CattleEyeView: A Multi-task Top-down View Cattle Dataset for Smarter Precision Livestock Farming

1 code implementation • 14 Dec 2023 • Kian Eng Ong, Sivaji Retta, Ramarajulu Srinivasan, Shawn Tan, Jun Liu

Cattle farming is an important and profitable agricultural industry.

Instance Segmentation Pose Estimation +1

Sparse Universal Transformer

no code implementations • 11 Oct 2023 • Shawn Tan, Yikang Shen, Zhenfang Chen, Aaron Courville, Chuang Gan

The Universal Transformer (UT) is a variant of the Transformer that shares parameters across its layers.
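A minimal sketch of the parameter sharing described above, under stated assumptions: a toy tanh layer (`make_layer` and `apply_layer`, both hypothetical) stands in for a full Transformer block. Both stacks apply six layers, but the Universal Transformer-style stack reuses one set of weights, so its parameter count does not grow with depth. The sparse, mixture-of-experts aspect of SUT is not modelled here.

```python
import math
import random


def make_layer(dim, rng):
    """Hypothetical stand-in for a Transformer block: a dim x dim weight matrix."""
    scale = 1.0 / math.sqrt(dim)
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]


def apply_layer(weights, x):
    """Matrix-vector product followed by tanh (toy replacement for attention + FFN)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def vanilla_stack(dim, depth, rng):
    # Standard stack: every layer gets its own parameters.
    return [make_layer(dim, rng) for _ in range(depth)]


def universal_stack(dim, depth, rng):
    # Universal Transformer-style stack: one parameter set reused at every depth.
    shared = make_layer(dim, rng)
    return [shared] * depth


def count_params(stack):
    # Count each distinct weight matrix once, so shared layers are not double-counted.
    unique = {id(w): w for w in stack}
    return sum(len(w) * len(w[0]) for w in unique.values())


def run(stack, x):
    for layer in stack:
        x = apply_layer(layer, x)
    return x


rng = random.Random(0)
vanilla = vanilla_stack(dim=8, depth=6, rng=rng)
universal = universal_stack(dim=8, depth=6, rng=rng)
print(count_params(vanilla), count_params(universal))  # 384 64
print(len(run(universal, [0.1] * 8)))
```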

ModuleFormer: Modularity Emerges from Mixture-of-Experts

1 code implementation • 7 Jun 2023 • Yikang Shen, Zheyu Zhang, Tianyou Cao, Shawn Tan, Zhenfang Chen, Chuang Gan

In our experiments, we found that the modular architecture enables three important abilities for large pre-trained language models: 1) Efficiency: because ModuleFormer activates only a subset of its modules for each input token, it can match the performance of dense LLMs with more than twice their throughput; 2) Extendability: ModuleFormer is more resistant to catastrophic forgetting than dense LLMs and can easily be extended with new modules to learn knowledge not included in the training data; 3) Specialisation: finetuning ModuleFormer can specialise a subset of modules to the finetuning task, and the task-unrelated modules can be pruned for lightweight deployment.

Language Modelling

Learning to Dequantise with Truncated Flows

no code implementations • ICLR 2022 • Shawn Tan, Chin-wei Huang, Alessandro Sordoni, Aaron Courville

Additionally, since the support of the marginal $q(z)$ is bounded and the support of the prior $p(z)$ is not, we propose renormalising the prior distribution over the support of $q(z)$.

Variational Inference
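As a worked illustration of the renormalisation in the snippet above, write $\mathcal{S}$ for the support of $q(z)$; the renormalised prior is $\tilde{p}(z) = p(z)\,\mathbb{1}[z \in \mathcal{S}] / \int_{\mathcal{S}} p(z')\,dz'$. The sketch below assumes, purely for illustration, a one-dimensional standard normal prior and an interval support $[\text{low}, \text{high}]$; the paper's actual prior and support are not specified in this snippet, and the function names are hypothetical.

```python
import math


def std_normal_logpdf(z):
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def renormalised_log_prior(z, low, high):
    """log p~(z): a standard normal prior renormalised over the interval [low, high].

    Inside the interval, p~(z) = p(z) / (Phi(high) - Phi(low)); outside it is 0.
    """
    if not (low <= z <= high):
        return float("-inf")
    mass = std_normal_cdf(high) - std_normal_cdf(low)
    return std_normal_logpdf(z) - math.log(mass)


# Check: the renormalised density integrates to ~1 over [low, high] (midpoint rule).
low, high, n = -1.0, 2.0, 10_000
h = (high - low) / n
total = sum(
    math.exp(renormalised_log_prior(low + (i + 0.5) * h, low, high)) * h
    for i in range(n)
)
print(round(total, 6))
```

Without the division by $\Phi(\text{high}) - \Phi(\text{low})$, the prior would assign only part of its mass to the region where $q(z)$ can place samples.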

Explicitly Modeling Syntax in Language Models with Incremental Parsing and a Dynamic Oracle

no code implementations • NAACL 2021 • Yikang Shen, Shawn Tan, Alessandro Sordoni, Siva Reddy, Aaron Courville

In the present work, we propose a new syntax-aware language model: Syntactic Ordered Memory (SOM).

Language Modelling

Recursive Top-Down Production for Sentence Generation with Latent Trees

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Shawn Tan, Yikang Shen, Timothy J. O'Donnell, Alessandro Sordoni, Aaron Courville

We model the recursive production property of context-free grammars for natural and synthetic languages.

Sentence Translation
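The recursive production property can be illustrated with a toy symbolic generator: each nonterminal is expanded top-down, and every child is produced by the same procedure. This is not the paper's neural model; the grammar, `produce`, and the depth cutoff are hypothetical choices made so the example always terminates.

```python
import random

# Hypothetical toy grammar. Past max_depth each nonterminal uses its first rule;
# those first rules contain no cycle, so expansion always terminates.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"], ["the", "N", "PP"]],
    "PP": [["near", "NP"]],
    "VP": [["V"], ["V", "NP"]],
    "N": [["cat"], ["dog"], ["mat"]],
    "V": [["sleeps"], ["sees"]],
}


def produce(symbol, rng, depth=0, max_depth=6):
    """Recursively expand `symbol` top-down; return (tree, tokens)."""
    if symbol not in GRAMMAR:  # terminal symbol: emit it as a token
        return symbol, [symbol]
    rules = GRAMMAR[symbol]
    rule = rules[0] if depth >= max_depth else rng.choice(rules)
    children, tokens = [], []
    for child in rule:
        # Each child is produced by the same procedure: the recursive production property.
        subtree, subtokens = produce(child, rng, depth + 1, max_depth)
        children.append(subtree)
        tokens.extend(subtokens)
    return (symbol, children), tokens


tree, tokens = produce("S", random.Random(0))
print(" ".join(tokens))
```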

Ordered Memory

1 code implementation • NeurIPS 2019 • Yikang Shen, Shawn Tan, Arian Hosseini, Zhouhan Lin, Alessandro Sordoni, Aaron Courville

Inspired by Ordered Neurons (Shen et al., 2018), we introduce a new attention-based mechanism and use its cumulative probability to control the writing and erasing operation of the memory.

ListOps

Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery

1 code implementation • 21 Oct 2019 • Shawn Tan, Guillaume Androz, Ahmad Chamseddine, Pierre Fecteau, Aaron Courville, Yoshua Bengio, Joseph Paul Cohen

We release the largest public ECG dataset of continuous raw signals for representation learning, containing 11 thousand patients and 2 billion labelled beats.

Clustering Representation Learning

{COMPANYNAME}11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery

no code implementations • 25 Sep 2019 • Shawn Tan, Guillaume Androz, Ahmad Chamseddine, Pierre Fecteau, Aaron Courville, Yoshua Bengio, Joseph Paul Cohen

We release the largest public ECG dataset of continuous raw signals for representation learning, containing over 11k patients and 2 billion labelled beats.

Clustering Representation Learning

Investigating Biases in Textual Entailment Datasets

no code implementations • 23 Jun 2019 • Shawn Tan, Yikang Shen, Chin-wei Huang, Aaron Courville

Understanding logical relationships between sentences is an important part of language understanding.

BIG-bench Machine Learning Natural Language Inference +2

Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks

7 code implementations • ICLR 2019 • Yikang Shen, Shawn Tan, Alessandro Sordoni, Aaron Courville

When a larger constituent ends, all of the smaller constituents that are nested within it must also be closed.

Ranked #13 on Constituency Grammar Induction on Penn Treebank (PTB)

Constituency Grammar Induction Inductive Bias +1
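The nesting constraint in the snippet above is enforced in ON-LSTM by the cumax activation, cumsum(softmax(x)), which yields a gate that rises monotonically from about 0 to 1. If a neuron at index k is erased (gate near 0), every neuron with a lower index is erased as well, mirroring how closing a large constituent closes all constituents nested inside it. The sketch below shows only the activation in plain Python; in ON-LSTM it parameterises the master forget gate (and, as 1 - cumax, the master input gate), which is not reproduced here.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cumax(logits):
    """cumax(x) = cumsum(softmax(x)): values rise monotonically from ~0 to 1."""
    out, running = [], 0.0
    for p in softmax(logits):
        running += p
        out.append(running)
    return out


# A sharply peaked logit vector approximates a hard split point at index 2:
gate = cumax([0.0, 0.0, 20.0, 0.0, 0.0])
print([round(g, 3) for g in gate])  # [0.0, 0.0, 1.0, 1.0, 1.0]
```

Because the gate is a cumulative distribution, the model never has to learn the "close everything nested inside" rule; it holds by construction.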

Improving Explorability in Variational Inference with Annealed Variational Objectives

1 code implementation • NeurIPS 2018 • Chin-wei Huang, Shawn Tan, Alexandre Lacoste, Aaron Courville

Despite the advances in the representational capacity of approximate distributions for variational inference, the optimization process can still limit the density that is ultimately learned.

Variational Inference

Generating Contradictory, Neutral, and Entailing Sentences

no code implementations • 7 Mar 2018 • Yikang Shen, Shawn Tan, Chin-wei Huang, Aaron Courville

Learning distributed sentence representations remains an interesting problem in the field of Natural Language Processing (NLP).

Natural Language Inference RTE +1

Self-organized Hierarchical Softmax

no code implementations • 26 Jul 2017 • Yikang Shen, Shawn Tan, Christopher Pal, Aaron Courville

We propose a new self-organizing hierarchical softmax formulation for neural-network-based language models over large vocabularies.

Language Modelling Sentence +1
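To illustrate why a hierarchical softmax helps with large vocabularies, the sketch below factorises $p(w) = p(c)\,p(w \mid c)$ over a fixed two-level clustering. Scoring one word then needs only the top-level softmax and one cluster's softmax; with $V$ words split into about $\sqrt{V}$ clusters of about $\sqrt{V}$ words, that is roughly $O(\sqrt{V})$ work instead of $O(V)$. The clustering, logits, and function names are hypothetical, and the paper's self-organizing procedure for assigning words to clusters is not reproduced.

```python
import math


def log_softmax(xs):
    m = max(xs)
    log_z = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_z for x in xs]


def two_level_log_prob(word, word_to_cluster, cluster_logits, within_logits):
    """log p(word) = log p(cluster) + log p(word | cluster).

    word_to_cluster: dict word -> (cluster id, position within that cluster)
    cluster_logits: scores over clusters (in a real model, a function of the context)
    within_logits: dict cluster id -> scores over that cluster's words
    Only the top-level softmax and one cluster's softmax are evaluated.
    """
    c, pos = word_to_cluster[word]
    return log_softmax(cluster_logits)[c] + log_softmax(within_logits[c])[pos]


# Hypothetical fixed clustering of a five-word vocabulary into two clusters.
clusters = {0: ["the", "a"], 1: ["cat", "dog", "fish"]}
word_to_cluster = {w: (c, i) for c, ws in clusters.items() for i, w in enumerate(ws)}
cluster_logits = [0.3, -0.2]
within_logits = {0: [1.0, 0.0], 1: [0.5, 0.1, -1.0]}

probs = {
    w: math.exp(two_level_log_prob(w, word_to_cluster, cluster_logits, within_logits))
    for w in word_to_cluster
}
print(round(sum(probs.values()), 9))  # the factorised distribution still sums to 1
```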
