Weight Tying

Introduced by Press and Wolf in Using the Output Embedding to Improve Language Models

Weight Tying improves the performance of language models by tying (sharing) the weights of the embedding and softmax layers. Because these two matrices are among the largest in a language model (each of size vocabulary × hidden dimension), sharing them also substantially reduces the model's total parameter count.

Language models typically consist of an embedding layer, followed by a number of Transformer or LSTM layers, and finally a softmax layer. Embedding layers learn word representations such that words with similar meanings are represented by vectors that are near each other (in cosine distance). Press & Wolf (2016) showed that the softmax matrix, in which every word also has a vector representation, exhibits this property as well. This led them to propose sharing the softmax and embedding matrices, which is done today in nearly all language models.
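As a concrete illustration, here is a minimal PyTorch sketch of weight tying in a toy LSTM language model; the module and every name in it are illustrative assumptions, not code from the paper. Tying requires the two matrices to have the same shape, i.e., the embedding dimension must match the dimension of the hidden state fed to the softmax layer.

```python
import torch
import torch.nn as nn

class TiedLM(nn.Module):
    """Toy language model whose input embedding and pre-softmax
    projection share one (vocab_size x d_model) weight matrix."""

    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)
        self.lstm = nn.LSTM(d_model, d_model, batch_first=True)
        self.decoder = nn.Linear(d_model, vocab_size, bias=False)
        # Weight tying: the decoder reuses the embedding matrix, so the
        # model learns a single set of word vectors that serves as both
        # the input representation and the output (softmax) representation.
        self.decoder.weight = self.embedding.weight

    def forward(self, tokens):
        h, _ = self.lstm(self.embedding(tokens))
        return self.decoder(h)  # logits over the vocabulary

model = TiedLM(vocab_size=10_000, d_model=256)
tokens = torch.randint(0, 10_000, (2, 16))  # (batch, sequence length)
print(model(tokens).shape)                  # torch.Size([2, 16, 10000])
```

With this vocabulary and width, tying saves one 10,000 × 256 matrix: 2.56M of the roughly 5.12M embedding-related parameters an untied model would carry.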

The method was introduced independently by Press & Wolf (2016) and Inan et al. (2016).

Additionally, the Press & Wolf paper proposes Three-way Weight Tying, a method for NMT models in which the embedding matrix of the source language, the embedding matrix of the target language, and the softmax matrix of the target language are all tied; this requires a vocabulary shared between the source and target languages (e.g., joint subword units). The method was adopted by the Transformer of Attention Is All You Need and by many other neural machine translation models.
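A minimal sketch of Three-way Weight Tying, again in PyTorch with illustrative names, under the assumption of a joint source/target vocabulary:

```python
import torch.nn as nn

class ThreeWayTiedEmbeddings(nn.Module):
    """Source embedding, target embedding, and target-side output
    projection all share one (joint_vocab_size x d_model) matrix."""

    def __init__(self, joint_vocab_size: int, d_model: int):
        super().__init__()
        self.src_embed = nn.Embedding(joint_vocab_size, d_model)
        self.tgt_embed = nn.Embedding(joint_vocab_size, d_model)
        self.generator = nn.Linear(d_model, joint_vocab_size, bias=False)
        # Three-way tying: all three modules point at the same parameter.
        # This only makes sense because source and target draw from one
        # joint vocabulary, so a token has the same row in every matrix.
        self.tgt_embed.weight = self.src_embed.weight
        self.generator.weight = self.src_embed.weight
```

In the Transformer of Attention Is All You Need, the shared matrix is additionally multiplied by √d_model inside the embedding layers.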

Source: Using the Output Embedding to Improve Language Models

Latest Papers

Palomino-Ochoa at SemEval-2020 Task 9: Robust System based on Transformer for Code-Mixed Sentiment Classification
Daniel Palomino, Jose Ochoa-Luna
2020-11-18
Pagsusuri ng RNN-based Transfer Learning Technique sa Low-Resource Language [Analysis of an RNN-based Transfer Learning Technique on a Low-Resource Language]
Dan John Velasco
2020-10-13
Gauravarora@HASOC-Dravidian-CodeMix-FIRE2020: Pre-training ULMFiT on Synthetically Generated Code-Mixed Data for Hate Speech Detection
Gaurav Arora
2020-10-05
Fine-tuning Pre-trained Contextual Embeddings for Citation Content Analysis in Scholarly Publication
Haihua Chen, Huyen Nguyen
2020-09-12
HinglishNLP: Fine-tuned Language Models for Hinglish Sentiment Detection
Meghana Bhange, Nirant Kasliwal
2020-08-22
Composer Style Classification of Piano Sheet Music Images Using Language Model Pretraining
TJ Tsai, Kevin Ji
2020-07-29
Probing for Referential Information in Language Models
Ionut-Teodor Sorodoc, Kristina Gulordava, Gemma Boleda
2020-07-01
Text Categorization for Conflict Event Annotation
Fredrik Olsson, Magnus Sahlgren, Fehmi ben Abdesslem, Ariel Ekgren, Kristine Eck
2020-05-01
Offensive language detection in Arabic using ULMFiT
Mohamed Abdellatif, Ahmed Elgammal
2020-05-01
Evaluation Metrics for Headline Generation Using Deep Pre-Trained Embeddings
Abdul Moeed, Yang An, Gerhard Hagerer, Georg Groh
2020-05-01
Inferring the source of official texts: can SVM beat ULMFiT?
Pedro Henrique Luz de Araujo, Teófilo Emidio de Campos, Marcelo Magalhães Silva de Sousa
2020-03-02
MaxUp: A Simple Way to Improve Generalization of Neural Network Training
Chengyue Gong, Tongzheng Ren, Mao Ye, Qiang Liu
2020-02-20
Localized Flood Detection With Minimal Labeled Social Media Data Using Transfer Learning
Neha Singh, Nirmalya Roy, Aryya Gangopadhyay
2020-02-10
Natural language processing of MIMIC-III clinical notes for identifying diagnosis and procedures with neural networks
Siddhartha Nuthakki, Sunil Neela, Judy W. Gichoya, Saptarshi Purkayastha
2019-12-28
A Comparative Study of Pretrained Language Models on Thai Social Text Categorization
Thanapapas Horsuwan, Kasidis Kanwatchara, Peerapon Vateekul, Boonserm Kijsirikul
2019-12-03
DeFINE: DEep Factorized INput Token Embeddings for Neural Sequence Modeling
Sachin Mehta, Rik Koncel-Kedziorski, Mohammad Rastegari, Hannaneh Hajishirzi
2019-11-27
A Subword Level Language Model for Bangla Language
Aisha Khatun, Anisur Rahman, Hemayet Ahmed Chowdhury, Md. Saiful Islam, Ayesha Tasnim
2019-11-15
Evolution of transfer learning in natural language processing
Aditya Malte, Pratik Ratadiya
2019-10-16
The merits of Universal Language Model Fine-tuning for Small Datasets -- a case with Dutch book reviews
Benjamin van der Burgh, Suzan Verberne
2019-10-02
Analyzing Customer Feedback for Product Fit Prediction
Stephan Baier
2019-08-28
Efficient Language Modeling with Automatic Relevance Determination in Recurrent Neural Networks
Maxim Kodryan, Artem Grachev, Dmitry Ignatov, Dmitry Vetrov
2019-08-01
Representation Degeneration Problem in Training Natural Language Generation Models
Jun Gao, Di He, Xu Tan, Tao Qin, Liwei Wang, Tie-Yan Liu
2019-07-28
Low-Shot Classification: A Comparison of Classical and Deep Transfer Machine Learning Approaches
Peter Usherwood, Steven Smit
2019-07-17
Evaluating Language Model Finetuning Techniques for Low-resource Languages
Jan Christian Blaise Cruz, Charibeth Cheng
2019-06-30
Exploiting Unsupervised Pre-training and Automated Feature Engineering for Low-resource Hate Speech Detection in Polish
Renard Korzeniowski, Rafał Rolczyński, Przemysław Sadownik, Tomasz Korbak, Marcin Możejko
2019-06-17
Speak up, Fight Back! Detection of Social Media Disclosures of Sexual Harassment
Arijit Ghosh Chowdhury, Ramit Sawhney, Puneet Mathur, Debanjan Mahata, Rajiv Ratn Shah
2019-06-01
Figure Eight at SemEval-2019 Task 3: Ensemble of Transfer Learning Methods for Contextual Emotion Detection
Joan Xiao
2019-06-01
An Empirical Evaluation of Text Representation Schemes on Multilingual Social Web to Filter the Textual Aggression
Sandip Modha, Prasenjit Majumder
2019-04-16
Low Resource Text Classification with ULMFit and Backtranslation
Sam Shleifer
2019-03-21
Trellis Networks for Sequence Modeling
Shaojie Bai, J. Zico Kolter, Vladlen Koltun
2018-10-15
Language Informed Modeling of Code-Switched Text
Khyathi Chandu, Thomas Manzini, Sumeet Singh, Alan W. Black
2018-07-01
Universal Language Model Fine-tuning for Text Classification
Jeremy Howard, Sebastian Ruder
2018-01-18
Breaking the Softmax Bottleneck: A High-Rank RNN Language Model
Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, William W. Cohen
2017-11-10
Regularizing and Optimizing LSTM Language Models
Stephen Merity, Nitish Shirish Keskar, Richard Socher
2017-08-07
The University of Edinburgh's Neural MT Systems for WMT17
Rico Sennrich, Alexandra Birch, Anna Currey, Ulrich Germann, Barry Haddow, Kenneth Heafield, Antonio Valerio Miceli Barone, Philip Williams
2017-08-02
Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling
Hakan Inan, Khashayar Khosravi, Richard Socher
2016-11-04
Using the Output Embedding to Improve Language Models
Ofir Press, Lior Wolf
2016-08-20

Components

No components found.

Categories