Dialect Identification through Adversarial Learning and Knowledge Distillation on Romanian BERT

EACL (VarDial) 2021 · George-Eduard Zaharia, Andrei-Marius Avram, Dumitru-Clementin Cercel, Traian Rebedea ·

Dialect identification is a task with applicability in a vast array of domains, ranging from automatic speech recognition to opinion mining. This work presents our architectures used for the VarDial 2021 Romanian Dialect Identification subtask. We introduced a series of solutions based on Romanian or multilingual Transformers, as well as adversarial training techniques. At the same time, we experimented with a knowledge distillation tool in order to check whether a smaller model can maintain the performance of our best approach. Our best solution managed to obtain a weighted F1-score of 0.7324, allowing us to obtain the 2nd place on the leaderboard.

PDF Abstract