Representation and Bias in Multilingual NLP: Insights from Controlled Experiments on Conditional Language Modeling

1 Jan 2021  ·  Ada Wan

Motivated by the performance disparity between languages in machine translation, we investigate whether and to what extent languages are equally hard to "conditional-language-model". Our goal is to improve our understanding and expectations of the relationship between language, data representation, data size, and performance in one-to-one conditional language modeling, through a series of systematically controlled experiments with the Transformer on parallel data in the six diverse official languages of the United Nations: 30 translation directions, 5 data sizes, and 3 primary representation types (character, byte, and word), along with 5 alternate variants as a secondary set of controls. We observe indications of a script bias on the character level, a length bias on the byte level, and a word bias that gives rise to a hierarchy in performance across languages. We also identify two types of sample-wise non-monotonicity: word-based representations are prone to Double Descent, while length can induce unstable performance across the size range studied, a novel meta-phenomenon we term "erraticity". By eliminating statistically significant performance disparity on the character and byte levels, we show that, in the context of computing with the Transformer, there is no complexity intrinsic to a language beyond that related to its statistical attributes, and that performance disparity is not a necessary condition but a byproduct of word segmentation. Our application of statistical comparisons as a fairness measure also serves as a novel, rigorous method for the intrinsic evaluation of languages, resolving a decades-long debate on language complexity. We hope our work helps open up new directions in the area of language and computing that would be fairer and more flexible.
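To make the three primary representation types concrete, here is a minimal sketch (not the paper's code; the paper's exact segmentation pipeline may differ) of how the same text decomposes at the character, byte, and word levels. The Chinese example "联合国" ("United Nations") illustrates how byte-level sequences grow for non-Latin scripts, which relates to the length bias discussed in the abstract:

```python
# Illustrative sketch of the three primary representation types:
# character, byte, and word. Not the paper's implementation.

def segment(text: str) -> dict:
    """Return the three segmentations of `text`."""
    return {
        "char": list(text),                  # one token per Unicode character
        "byte": list(text.encode("utf-8")),  # one token per UTF-8 byte
        "word": text.split(),                # naive whitespace segmentation
    }

en = segment("the United Nations")
zh = segment("联合国")

print(len(en["char"]), len(en["byte"]))  # 18 18 -> ASCII: chars == bytes
print(len(zh["char"]), len(zh["byte"]))  # 3 9   -> CJK: 3 UTF-8 bytes per char
```

For ASCII text the character and byte sequences coincide, but each CJK character occupies three UTF-8 bytes, so byte-level sequence lengths diverge sharply across scripts.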

