We propose new methods for in-domain and cross-domain Named Entity Recognition (NER) on historical data for Dutch and French.
Domain adaptation of Pretrained Language Models (PTLMs) is typically achieved by unsupervised pretraining on target-domain text.
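This kind of target-domain adaptation can be sketched roughly as continued masked-language-model pretraining with the Hugging Face transformers and datasets libraries; the model name, data file, and hyperparameters below are placeholder assumptions for illustration, not the setup of any specific experiment described here.

```python
# Minimal sketch: continued (domain-adaptive) MLM pretraining of a PTLM
# on unlabeled target-domain text. All names and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

# Unlabeled target-domain text, one sentence or document per line.
raw = load_dataset("text", data_files={"train": "target_domain.txt"})
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

# Randomly mask 15% of wordpieces, the standard MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="adapted-bert", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
model.save_pretrained("adapted-bert")
tokenizer.save_pretrained("adapted-bert")
```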
We present a novel way of injecting factual knowledge about entities into the pretrained BERT model (Devlin et al., 2019): We align Wikipedia2Vec entity vectors (Yamada et al., 2016) with BERT's native wordpiece vector space and use the aligned entity vectors as if they were wordpiece vectors.
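As a rough illustration of the alignment step, the sketch below fits a least-squares linear map from the Wikipedia2Vec space into BERT's wordpiece embedding space using anchor words present in both vocabularies, and then applies that map to entity vectors; the variable names and the choice of an unconstrained linear map are assumptions for illustration, not necessarily the exact procedure used.

```python
import numpy as np

def fit_alignment(wiki_word_vecs: np.ndarray, bert_wordpiece_vecs: np.ndarray) -> np.ndarray:
    """Fit a linear map W such that wiki_word_vecs @ W approximates bert_wordpiece_vecs.

    Both matrices hold vectors for the same anchor words, one word per row:
    rows of `wiki_word_vecs` come from Wikipedia2Vec, rows of
    `bert_wordpiece_vecs` from BERT's input embedding matrix.
    """
    W, *_ = np.linalg.lstsq(wiki_word_vecs, bert_wordpiece_vecs, rcond=None)
    return W

# Hypothetical usage: map Wikipedia2Vec *entity* vectors into the wordpiece
# space so they can be fed to BERT as if they were wordpiece vectors.
# wiki_words:   (n_anchors, d_wiki)  Wikipedia2Vec vectors of shared words
# bert_pieces:  (n_anchors, d_bert)  BERT input embeddings of the same words
# entity_vecs:  (n_entities, d_wiki) Wikipedia2Vec entity vectors
# W = fit_alignment(wiki_words, bert_pieces)
# aligned_entity_vecs = entity_vecs @ W
```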
We address the task of unsupervised Semantic Textual Similarity (STS) by ensembling diverse pre-trained sentence encoders into sentence meta-embeddings.
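One simple instance of such an ensemble is sketched below: concatenation-based meta-embeddings built from two hypothetical sentence encoders, with STS pairs scored by cosine similarity. The encoder objects are placeholders, and concatenation is only one of several possible combination methods.

```python
import numpy as np

def meta_embed(embedding_lists):
    """Concatenate length-normalized sentence embeddings from several encoders.

    embedding_lists: list of (n_sentences, d_i) arrays, one array per encoder.
    Returns an (n_sentences, sum_i d_i) meta-embedding matrix.
    """
    normed = [e / np.linalg.norm(e, axis=1, keepdims=True) for e in embedding_lists]
    return np.concatenate(normed, axis=1)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical usage with two placeholder encoders `enc1` and `enc2`:
# sents = ["A man is playing a guitar.", "Someone plays guitar."]
# meta = meta_embed([enc1.encode(sents), enc2.encode(sents)])
# sts_score = cosine(meta[0], meta[1])  # unsupervised similarity estimate
```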
We address the problem of Duplicate Question Detection (DQD) in low-resource domain-specific Community Question Answering forums.
Count-based word alignment methods, such as the IBM models or fast-align, struggle on very small parallel corpora.
In this work, we introduce the task of Open-Type Relation Argument Extraction (ORAE): Given a corpus, a query entity Q and a knowledge base relation (e.g., "Q authored notable work with title X"), the model has to extract an argument of non-standard entity type (entities that cannot be extracted by a standard named entity tagger, e.g., X: the title of a book or a work of art) from the corpus.