2 Apr 2024 • Xingwu Chen, Difan Zou
Specifically, we designed a novel set of sequence learning tasks to systematically evaluate and understand how the depth of a transformer affects its ability to perform memorization, reasoning, generalization, and contextual generalization.