Here we study whether structural guidance leads to more human-like systematic linguistic generalization in Transformer language models without resorting to pre-training on very large amounts of data.
In this work, we propose a transition-based AMR parsing system that combines hard attention over sentences with a target-side action-pointer mechanism, decoupling source tokens from node representations and addressing alignments.
Ranked #4 on AMR Parsing on LDC2020T02 (using extra training data)
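For intuition, here is a minimal, hypothetical sketch of the target-side action-pointer idea: node-creating actions are appended to an action history, and edge actions point back into that history rather than at source tokens, so graph nodes stay decoupled from the source sentence while hard attention tracks a position over it. The action names (SHIFT/NODE/EDGE) and the toy oracle sequence are illustrative assumptions, not the system's actual transition inventory.

```python
def run_actions(tokens, actions):
    """Execute (action, argument) pairs and return the resulting nodes and edges."""
    cursor = 0          # hard-attention position over the source sentence
    history = []        # full action history; pointers index into this list
    nodes = {}          # history position -> (node label, aligned source token)
    edges = []          # (head_position, relation, dependent_position)
    last_node = None    # history position of the most recently created node

    for act, arg in actions:
        if act == "SHIFT":            # move hard attention to the next source token
            cursor += 1
        elif act == "NODE":           # create a node aligned (via hard attention) to tokens[cursor]
            last_node = len(history)
            nodes[last_node] = (arg, tokens[cursor])
        elif act == "EDGE":           # arg = (relation, pointer into the action history)
            rel, ptr = arg
            edges.append((last_node, rel, ptr))
        history.append((act, arg))
    return nodes, edges

# Toy derivation for "the boy sleeps" -> (s / sleep-01 :ARG0 (b / boy))
actions = [("SHIFT", None), ("NODE", "boy"), ("SHIFT", None),
           ("NODE", "sleep-01"), ("EDGE", (":ARG0", 1))]
print(run_actions(["the", "boy", "sleeps"], actions))
# -> ({1: ('boy', 'boy'), 3: ('sleep-01', 'sleeps')}, [(3, ':ARG0', 1)])
```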
We propose sparsemax, a new activation function similar to the traditional softmax, but able to output sparse probabilities.
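A minimal NumPy sketch of the sparsemax transformation, written as the Euclidean projection of a score vector onto the probability simplex; the function name and toy input are illustrative, but the sort-and-threshold construction is the standard closed-form solution.

```python
import numpy as np

def sparsemax(z):
    """Project scores z onto the probability simplex.

    Returns p minimizing ||p - z||^2 subject to p >= 0 and sum(p) == 1;
    unlike softmax, the output can contain exact zeros.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]                # scores in decreasing order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum        # entries kept in the support
    k_z = k[support][-1]                       # support size
    tau = (cumsum[k_z - 1] - 1) / k_z          # threshold so the output sums to 1
    return np.maximum(z - tau, 0.0)

print(sparsemax([2.0, 1.0, 0.1]))   # [1. 0. 0.] -- all mass on the top score
print(sparsemax([0.5, 0.4, 0.1]))   # [0.5 0.4 0.1] -- already on the simplex
```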
We introduce a model for constructing vector representations of words by composing characters using bidirectional LSTMs.
Ranked #3 on Part-Of-Speech Tagging on Penn Treebank
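A minimal PyTorch sketch of this character-to-word composition, assuming a single-layer bidirectional LSTM whose final forward and backward states are projected into a word vector; the class name, dimensions, and character vocabulary size are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class CharToWord(nn.Module):
    """Build a word vector by composing its characters with a bidirectional LSTM."""
    def __init__(self, n_chars, char_dim=50, hidden_dim=100, word_dim=128):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.bilstm = nn.LSTM(char_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        # combine the final forward and backward states into one word vector
        self.proj = nn.Linear(2 * hidden_dim, word_dim)

    def forward(self, char_ids):
        # char_ids: (batch, max_word_len) character indices, one word per row
        x = self.char_emb(char_ids)
        _, (h_n, _) = self.bilstm(x)     # h_n: (2, batch, hidden_dim)
        fwd, bwd = h_n[0], h_n[1]        # last forward state, last backward state
        return self.proj(torch.cat([fwd, bwd], dim=-1))

model = CharToWord(n_chars=70)
word_vec = model(torch.randint(0, 70, (1, 6)))   # a single 6-character word
print(word_vec.shape)                            # torch.Size([1, 128])
```

In practice, words in a batch have different lengths, so the character sequences would be padded and packed (e.g. with torch.nn.utils.rnn.pack_padded_sequence) before the LSTM.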