TEASEL: A Transformer-Based Speech-Prefixed Language Model

12 Sep 2021  ·  Mehdi Arjmand, Mohammad Javad Dousti, Hadi Moradi

Multimodal language analysis is a burgeoning field of NLP that aims to simultaneously model a speaker's words, acoustic annotations, and facial expressions. In this area, lexicon features usually outperform other modalities because they are pre-trained on large corpora via Transformer-based models. Despite their strong performance, training a new self-supervised learning (SSL) Transformer for any modality is usually infeasible due to insufficient data, which is the case in multimodal language learning. This work proposes a Transformer-Based Speech-Prefixed Language Model called TEASEL to address these constraints without training a complete Transformer model. Unlike a conventional language model, TEASEL includes the speech modality as a dynamic prefix alongside the textual modality. This method exploits a conventional pre-trained language model as a cross-modal Transformer model. We evaluated TEASEL on the multimodal sentiment analysis task defined by the CMU-MOSI dataset. Extensive experiments show that our model outperforms unimodal baseline language models by 4% and the current multimodal state-of-the-art (SoTA) model by 1% in F1 score. Additionally, our proposed method is 72% smaller than the SoTA model.
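As a rough illustration of the speech-prefix idea, the following is a minimal PyTorch sketch that projects acoustic features into the embedding space of a pre-trained RoBERTa backbone and prepends them to the word embeddings as a dynamic (input-dependent) prefix. The class name, the acoustic feature dimension (74, COVAREP-style), the prefix length, and the regression head are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
from transformers import RobertaModel


class SpeechPrefixedLM(nn.Module):
    """Hypothetical sketch: a pre-trained LM reused as a cross-modal
    Transformer, with speech features mapped to a dynamic prefix."""

    def __init__(self, speech_dim=74, prefix_len=8, hidden_dim=768):
        super().__init__()
        self.lm = RobertaModel.from_pretrained("roberta-base")
        self.prefix_len = prefix_len
        # Map raw acoustic frames into the LM's embedding space.
        self.speech_proj = nn.Sequential(
            nn.Linear(speech_dim, hidden_dim),
            nn.Tanh(),
        )
        # Pool a variable number of frames down to a fixed-length prefix.
        self.pool = nn.AdaptiveAvgPool1d(prefix_len)
        self.head = nn.Linear(hidden_dim, 1)  # sentiment score regression

    def forward(self, input_ids, attention_mask, speech_feats):
        # speech_feats: (batch, frames, speech_dim) acoustic features
        prefix = self.speech_proj(speech_feats)                     # (B, T, H)
        prefix = self.pool(prefix.transpose(1, 2)).transpose(1, 2)  # (B, P, H)
        # Word embeddings only; the LM adds positional embeddings itself
        # when it receives inputs_embeds.
        text_emb = self.lm.embeddings.word_embeddings(input_ids)   # (B, L, H)
        inputs = torch.cat([prefix, text_emb], dim=1)              # prefix first
        prefix_mask = torch.ones(prefix.shape[:2],
                                 dtype=attention_mask.dtype,
                                 device=attention_mask.device)
        mask = torch.cat([prefix_mask, attention_mask], dim=1)
        out = self.lm(inputs_embeds=inputs, attention_mask=mask)
        # Read off the representation of the first text token (<s>).
        cls = out.last_hidden_state[:, self.prefix_len]
        return self.head(cls)
```

In this setup only the projection, pooling, and head introduce new parameters; the pre-trained backbone is reused, which matches the paper's goal of avoiding training a complete Transformer from scratch.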

Datasets

CMU-MOSI

Results from the Paper


Ranked #3 on Multimodal Sentiment Analysis on CMU-MOSI (using extra training data)

Task: Multimodal Sentiment Analysis
Dataset: CMU-MOSI
Model: TEASEL (uses extra training data)

Metric   Value   Global Rank
F1       85      #3
MAE      0.64    #1
Corr     0.836   #1
Acc-7    47.52   #3
Acc-2    87.5    #1
