The emergence of multilingual pre-trained language models makes it possible to adapt to target languages with only a few labeled examples. However, vanilla fine-tuning tends to produce degenerate and unstable results, owing to Language Interference among different languages and Parameter Overload under few-sample transfer learning scenarios. To address these two problems, we propose S^4-Tuning, a Simple Cross-lingual Sub-network Tuning method.
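A minimal sketch of the sub-network idea, assuming the sub-network is a per-parameter binary mask selected by gradient magnitude on a few target-language batches; the selection criterion and keep ratio here are illustrative assumptions, not necessarily S^4-Tuning's actual procedure:

```python
import torch

def build_masks(model, loss_fn, batches, keep_ratio=0.1):
    """Score parameters by accumulated gradient magnitude on a few
    labeled batches, then keep only the top `keep_ratio` fraction.
    (Gradient-magnitude scoring is an illustrative assumption.)"""
    scores = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in batches:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                scores[n] += p.grad.abs()
    flat = torch.cat([s.flatten() for s in scores.values()])
    threshold = flat.topk(int(keep_ratio * flat.numel())).values.min()
    return {n: (s >= threshold).float() for n, s in scores.items()}

def masked_step(model, masks, optimizer):
    """One optimizer step that only updates the selected sub-network."""
    for n, p in model.named_parameters():
        if p.grad is not None:
            p.grad.mul_(masks[n])  # zero out gradients outside the mask
    optimizer.step()
```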
In this paper, we focus on extracting event arguments from an entire document, which mainly faces two critical problems: a) the long-distance dependency between the trigger and arguments across sentences; b) the distracting context around an event in the document.
As Abstract Meaning Representation (AMR) implicitly involves compound semantic annotations, we hypothesize that auxiliary tasks which are semantically or formally related can better enhance AMR parsing.
Ranked #6 on AMR Parsing on LDC2017T10 (using extra training data)
Pre-trained language models (PLMs) like BERT have made significant progress in various downstream NLP tasks.
Structured pruning has been extensively studied on monolingual pre-trained language models and is yet to be fully evaluated on their multilingual counterparts.
Pre-trained Language Models (PLMs) have achieved remarkable performance on various language understanding tasks in IR systems, which requires fine-tuning on labeled training data.
Label smoothing and vocabulary sharing are two widely used techniques in neural machine translation models.
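Label smoothing replaces the one-hot target with a softened distribution that reserves a small probability mass for non-gold classes; a minimal sketch (the epsilon value is illustrative):

```python
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, target, epsilon=0.1):
    """Cross-entropy against smoothed targets: the gold label gets
    probability 1 - epsilon, the rest is spread uniformly."""
    n_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    smooth = torch.full_like(log_probs, epsilon / (n_classes - 1))
    smooth.scatter_(-1, target.unsqueeze(-1), 1.0 - epsilon)
    return -(smooth * log_probs).sum(dim=-1).mean()
```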
Unified under contrastive learning, CAP enables the pruned model to learn task-agnostic knowledge from the pre-trained model and task-specific knowledge from the fine-tuned model.
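A minimal sketch of this two-teacher objective, assuming an InfoNCE-style contrastive loss over in-batch sentence representations; the exact objective, layer choices, and weighting in CAP are assumptions here:

```python
import torch
import torch.nn.functional as F

def info_nce(queries, keys, temperature=0.05):
    """InfoNCE: each query should match its own key, with the other
    keys in the batch serving as negatives."""
    q = F.normalize(queries, dim=-1)
    k = F.normalize(keys, dim=-1)
    logits = q @ k.t() / temperature
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)

def cap_loss(pruned_repr, pretrained_repr, finetuned_repr, alpha=0.5):
    """Pull the pruned model's representations toward the pre-trained
    model (task-agnostic) and the fine-tuned model (task-specific).
    The weighting `alpha` is an illustrative assumption."""
    return (alpha * info_nce(pruned_repr, pretrained_repr)
            + (1 - alpha) * info_nce(pruned_repr, finetuned_repr))
```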
Few-Shot Sequence Labeling (FSSL) is a canonical paradigm for tagging models, e.g., for named entity recognition and slot filling, to generalize to an emerging, resource-scarce domain.
Ranked #2 on Few-shot NER on Few-NERD (INTER)
Recent pretrained language models have scaled from millions to billions of parameters.
However, we find they suffer from trigger biases, i.e., a statistical homogeneity between some trigger words and target event types, which we summarize as trigger overlapping and trigger separability.
The table encoder extracts sentiment at the token-pair level, so that compositional features between targets and opinions can be easily captured.
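One common way to build such a table is to pair every token i with every token j by concatenating their representations; a minimal sketch (this concatenation-based cell construction is an assumption, and the paper's encoder may differ):

```python
import torch

def token_pair_table(hidden):
    """hidden: (batch, seq_len, d) token representations.
    Returns a (batch, seq_len, seq_len, 2*d) table whose cell (i, j)
    concatenates the representations of tokens i and j, so a
    classifier over cells can score target-opinion pairs."""
    b, n, d = hidden.shape
    rows = hidden.unsqueeze(2).expand(b, n, n, d)  # token i broadcast over j
    cols = hidden.unsqueeze(1).expand(b, n, n, d)  # token j broadcast over i
    return torch.cat([rows, cols], dim=-1)
```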
Existing methods are ineffective due to two challenges of this task: a) the target event arguments are scattered across sentences; b) the correlations among events in a document are non-trivial to model.
Ranked #2 on Document-level Event Extraction on ChFinAnn
In this paper, we describe our submissions to the WMT20 shared task on parallel corpus filtering and alignment for low-resource conditions.
Document-level relation extraction aims to extract relations among entities within a document.
Ranked #10 on Relation Extraction on DocRED
This paper presents Xiaomingbot, an intelligent, multilingual, and multimodal software robot equipped with four integral capabilities: news generation, news translation, news reading, and avatar animation.
Adaptive gradient methods have attracted much attention in the machine learning community due to their high efficiency.
Due to its simplicity and outstanding ability to generalize, stochastic gradient descent (SGD) is still the most widely used optimization method despite its slow convergence.
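To make the contrast concrete, minimal NumPy sketches of the two update rules: plain SGD applies one global step size to every coordinate, while an Adam-style adaptive method rescales each coordinate by running gradient moment estimates (hyperparameter values are the usual defaults):

```python
import numpy as np

def sgd_step(theta, grad, lr=0.1):
    """Plain SGD: a single global step size for all coordinates."""
    return theta - lr * grad

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Adam-style adaptive step: per-coordinate step sizes derived
    from running first/second moment estimates of the gradient."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)  # bias correction for the first moment
    v_hat = v / (1 - b2 ** t)  # bias correction for the second moment
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```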