How Well Do Embedding Models Capture Non-compositionality? A View from Multiword Expressions
In this paper, we apply a range of embedding methods to multiword expressions to study how well they capture the nuances of non-compositional data. Our results from a pool of word-, character-, and document-level embeddings suggest that word2vec performs best, followed by fastText and InferSent. Moreover, we find that recently proposed contextualised embedding models such as BERT and ELMo are not adept at handling non-compositionality in multiword expressions.
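The paper reports only empirical comparisons, but the kind of evaluation it describes — checking whether a model's representation of a multiword expression diverges from the composition of its parts — is commonly operationalised as the cosine similarity between the phrase vector and the average of its constituents' vectors. Below is a minimal sketch of that idea; the lookup table, the toy vectors, and the `kick_the_bucket` token are hypothetical stand-ins for real pre-trained embeddings, not the authors' actual protocol.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def compositionality_score(mwe_vec, constituent_vecs):
    """Similarity between the embedding of the whole multiword expression
    and the average of its constituents' embeddings. A low score suggests
    the model represents the expression non-compositionally."""
    composed = np.mean(constituent_vecs, axis=0)
    return cosine(mwe_vec, composed)

# Hypothetical toy embedding table; in practice these would come from
# pre-trained word2vec/fastText vectors, with the idiom trained as one token.
embeddings = {
    "kick":            np.array([0.8, 0.1, 0.0]),
    "the":             np.array([0.1, 0.1, 0.1]),
    "bucket":          np.array([0.2, 0.9, 0.0]),
    "kick_the_bucket": np.array([0.0, 0.1, 0.9]),
}

score = compositionality_score(
    embeddings["kick_the_bucket"],
    [embeddings[w] for w in ("kick", "the", "bucket")],
)
print(f"compositionality score: {score:.3f}")  # low -> non-compositional
```

With real vectors, a low score on idioms such as "kick the bucket" relative to literal phrases is what would indicate that a model has learned a non-compositional representation.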
Methods
• Adam
• Attention Dropout
• BERT
• BiLSTM
• Dense Connections
• Dropout
• ELMo
• fastText
• GELU
• Layer Normalization
• Linear Layer
• Linear Warmup With Linear Decay
• LSTM
• Multi-Head Attention
• Residual Connection
• Scaled Dot-Product Attention
• Sigmoid Activation
• Softmax
• Tanh Activation
• Weight Decay
• WordPiece