NAACL 2021 • Adithya V Ganesan, Matthew Matero, Aravind Reddy Ravula, Huy Vu, H. Andrew Schwartz
In human-level NLP tasks, such as predicting mental health, personality, or demographics, the number of observations is often smaller than the 768 or more hidden-state dimensions of each layer in modern transformer-based language models, which limits how effectively transformers can be leveraged.
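To make the sample-size concern concrete, here is a minimal sketch on synthetic data. It is not the authors' method: random Gaussian vectors stand in for 768-dimensional transformer embeddings, the labels are pure noise, and the helper name `train_r2_on_noise` is hypothetical. With fewer observations than dimensions, unregularized least squares can fit even meaningless labels almost perfectly on the training set, so the training fit says nothing about generalization.

```python
import numpy as np

def train_r2_on_noise(n_obs, n_dims=768, seed=0):
    """Fit minimum-norm least squares to noise labels and return training R^2.

    Assumption: Gaussian random features stand in for real embeddings.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_obs, n_dims))  # synthetic "embeddings"
    y = rng.standard_normal(n_obs)            # labels unrelated to X
    # lstsq returns the minimum-norm solution when the system is underdetermined
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ w
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

few = train_r2_on_noise(n_obs=200)    # fewer observations than dimensions
many = train_r2_on_noise(n_obs=5000)  # many more observations than dimensions

print(f"training R^2, 200 observations:  {few:.3f}")
print(f"training R^2, 5000 observations: {many:.3f}")
```

In the first case the model interpolates the noise, while in the second the training fit drops sharply. That gap is the kind of overfitting risk that arises when observations are scarcer than hidden-state dimensions.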