Incorporating lexical knowledge into deep learning models has proven highly effective for sequence labeling tasks.
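As a concrete illustration, one common way to inject such lexical knowledge is to concatenate a lexicon-match feature with each token's embedding before a BiLSTM tagger. This is a minimal sketch; the gazetteer, dimensions, and model are illustrative assumptions, not the method of any particular paper:

```python
import torch
import torch.nn as nn

LEXICON = {"new york", "san francisco"}  # toy gazetteer of location names

def lexicon_features(tokens):
    """1.0 if the token (or the bigram it starts) is in the lexicon, else 0.0.
    Returns shape (seq_len, 1); add a batch dimension before use."""
    feats = []
    for i, tok in enumerate(tokens):
        bigram = " ".join(tokens[i:i + 2]).lower()
        feats.append(1.0 if bigram in LEXICON or tok.lower() in LEXICON else 0.0)
    return torch.tensor(feats).unsqueeze(-1)

class LexiconTagger(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden=128, n_tags=9):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # +1 input dimension for the lexicon-match feature
        self.lstm = nn.LSTM(emb_dim + 1, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_tags)

    def forward(self, token_ids, lex_feats):
        # token_ids: (batch, seq); lex_feats: (batch, seq, 1)
        x = torch.cat([self.emb(token_ids), lex_feats], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h)  # per-token tag logits
```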
Multilingual pre-trained language models (e.g., mBERT, XLM, and XLM-R) have shown impressive performance on cross-lingual natural language understanding tasks.
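For context, a minimal sketch of how such a model is typically applied for zero-shot cross-lingual transfer, using the public xlm-roberta-base checkpoint from the Hugging Face transformers library (the label count and sentence pair are illustrative):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=3  # e.g. XNLI: entailment/neutral/contradiction
)

# The same subword vocabulary covers ~100 languages, so a model fine-tuned
# on English NLI pairs can be applied unchanged to other languages.
inputs = tokenizer("Paris is in France.", "Paris is in Europe.", return_tensors="pt")
logits = model(**inputs).logits
```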
Within a unified framework of control and detection, it is demonstrated that all kernel attacks can be structurally detected when not only observer-based residuals but also control-signal-based residuals are generated and used for detection.
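A heavily simplified sketch of the two residual signals mentioned, for a toy discrete-time linear system with a Luenberger observer; the matrices, the observer gain, and the availability of the received control signal are all assumptions for illustration:

```python
import numpy as np

A = np.array([[0.9, 0.1], [0.0, 0.8]])  # toy system dynamics
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
L = np.array([[0.5], [0.1]])            # observer gain (assumed)

def observer_step(x_hat, u, y):
    """One observer update; returns the new estimate and the output residual."""
    r_y = y - C @ x_hat                      # observer-based residual
    x_hat_next = A @ x_hat + B @ u + L @ r_y
    return x_hat_next, r_y

def control_residual(u_sent, u_received):
    """Control-signal-based residual: mismatch between the control signal the
    controller sent and the one the plant actually received (if measurable)."""
    return u_received - u_sent
```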
Comprehensive experiments on the evaluation benchmarks demonstrate that 1) the layer mapping strategy has a significant effect on task-agnostic BERT distillation, and different layer mappings can result in quite different performance; 2) the optimal layer mapping strategy found by the proposed search process consistently outperforms the heuristic alternatives; 3) with the optimal layer mapping, our student model achieves state-of-the-art performance on the GLUE tasks.
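For point 2), the heuristic baselines are mappings such as the two sketched below; the teacher/student depths are illustrative (12 and 4 layers), and the searched mapping from the paper is not reproduced here:

```python
def uniform_mapping(n_teacher=12, n_student=4):
    """Student layer i (0-indexed) learns from every (n_teacher // n_student)-th
    teacher layer (teacher layers numbered 1..n_teacher here)."""
    step = n_teacher // n_student
    return {i: (i + 1) * step for i in range(n_student)}   # {0: 3, 1: 6, 2: 9, 3: 12}

def last_k_mapping(n_teacher=12, n_student=4):
    """Student layers learn from the last n_student teacher layers."""
    return {i: n_teacher - n_student + i + 1 for i in range(n_student)}  # {0: 9, ..., 3: 12}
```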
To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel Transformer distillation method specially designed for knowledge distillation (KD) of Transformer-based models.
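A hedged sketch of what such a Transformer distillation objective can look like: MSE between mapped teacher and student hidden states and attention maps. The mapping, the projection, and the equal-head-count assumption are illustrative, not the exact published loss:

```python
import torch
import torch.nn.functional as F

def transformer_kd_loss(s_hidden, t_hidden, s_attn, t_attn, mapping, proj):
    """
    s_hidden/t_hidden: lists of (batch, seq, dim) layer outputs (0-indexed).
    s_attn/t_attn:     lists of (batch, heads, seq, seq) attention maps
                       (equal head counts assumed for simplicity).
    mapping:           dict {student_layer: teacher_layer}.
    proj:              linear layer lifting the student dim to the teacher dim.
    """
    loss = torch.tensor(0.0)
    for s_i, t_i in mapping.items():
        loss = loss + F.mse_loss(proj(s_hidden[s_i]), t_hidden[t_i])
        loss = loss + F.mse_loss(s_attn[s_i], t_attn[t_i])
    return loss
```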
Supervised approaches to named entity recognition (NER) are largely developed under the assumption that the training data is fully annotated with named entity information.
Semantic role labeling (SRL) aims to recognize the predicate-argument structure of a sentence.
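For example, the predicate-argument structure of "Mary sold the book to John" can be represented with PropBank-style roles; a toy illustration of the output:

```python
srl_frame = {
    "predicate": "sold",
    "arguments": {
        "ARG0": "Mary",       # seller (agent)
        "ARG1": "the book",   # thing sold
        "ARG2": "to John",    # buyer (recipient)
    },
}
```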
In the correction stage, candidates were generated by the three GEC models and then merged to produce the final corrections for M- and S-type errors.
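One possible shape for such a merge step, sketched under the assumption that edits are comparable (span, replacement) tuples and that majority voting is used; the actual merging rule in the system may differ:

```python
from collections import Counter

def merge_corrections(candidate_edit_lists, min_votes=2):
    """candidate_edit_lists: one list of (span, replacement) edits per model.
    Keep an edit only if at least min_votes of the models propose it."""
    votes = Counter(edit for edits in candidate_edit_lists for edit in set(edits))
    return [edit for edit, n in votes.items() if n >= min_votes]
```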
Hypernym discovery aims to identify the set of hypernyms for a given hyponym word from an appropriate corpus.
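A minimal sketch of one classical approach, a single Hearst-pattern matcher; real discovery systems combine many patterns or distributional signals, and the pattern here is just one example:

```python
import re

def hearst_candidates(corpus_sentences, hyponym):
    """Collect X from sentences matching 'X such as ... hyponym ...'."""
    pat = re.compile(r"(\w+)\s+such as\s+([^.]*)", re.IGNORECASE)
    cands = []
    for sent in corpus_sentences:
        for hyper, rest in pat.findall(sent):
            if hyponym in rest.lower():
                cands.append(hyper.lower())
    return cands

# hearst_candidates(["Fish such as salmon and trout spawn upstream."], "salmon")
# -> ["fish"]
```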
This paper describes our submissions for SemEval-2018 Task 8: Semantic Extraction from CybersecUrity REports using NLP.