no code implementations • EMNLP (sustainlp) 2021 • Vin Sachidananda, Jason S. Kessler, Yi-An Lai
While adaptive tokenization incurs a 6% increase in model parameters in our experimentation, due to the introduction of 10k new domain-specific tokens, our approach, using 64 vCPUs, is 72x faster than further pretraining the language model on domain-specific corpora on 8 TPUs.
1 code implementation • 2 Mar 2017 • Jason S. Kessler
Scattertext is an open source tool for visualizing linguistic variation between document categories in a language-independent way.