NAACL (SIGTYP) 2022 • Andrea De Varda, Roberto Zamparelli
We showed that while both models were favoured by a Zipfian distribution of the tokens and by the presence of head-dependency type structures, the multilingual transformer network exhibited a stronger reliance on hierarchical cues compared to its monolingual counterpart.