2 code implementations • 11 Apr 2024 • Lukas Lange, Marc Müller, Ghazaleh Haratinezhad Torbati, Dragan Milchevski, Patrick Grau, Subhash Pujari, Annemarie Friedrich
In our few-shot scenario, we find that for identifying the MITRE ATT&CK concepts that are mentioned explicitly or implicitly in a text, concept descriptions from MITRE ATT&CK are an effective source for training data augmentation.
no code implementations • 11 Apr 2024 • Akash Kumar Gautam, Lukas Lange, Jannik Strötgen
In this work, we explore the feasibility of proprietary and open-source large language models (LLMs) for TE normalization using in-context learning to inject task, document, and example information into the model.
no code implementations • 31 Mar 2024 • Mingyang Wang, Heike Adel, Lukas Lange, Jannik Strötgen, Hinrich Schütze
Continual learning aims at incrementally acquiring new knowledge while not forgetting existing knowledge.
1 code implementation • 11 Dec 2023 • Timo Pierre Schrader, Simon Razniewski, Lukas Lange, Annemarie Friedrich
Understanding causality is a core aspect of intelligence.
no code implementations • 8 Dec 2023 • Mobashir Sadat, Zhengyu Zhou, Lukas Lange, Jun Araki, Arsalan Gundroo, Bingqing Wang, Rakesh R Menon, Md Rizwan Parvez, Zhe Feng
Hallucination is a well-known phenomenon in text generated by large language models (LLMs).
no code implementations • 23 Oct 2023 • Mingyang Wang, Heike Adel, Lukas Lange, Jannik Strötgen, Hinrich Schütze
However, not all languages positively influence each other and it is an open research question how to select the most suitable set of languages for multilingual training and avoid negative interference among languages whose characteristics or data distributions are not compatible.
1 code implementation • 22 May 2023 • Chia-Chien Hung, Lukas Lange, Jannik Strötgen
Our broad evaluation in 4 downstream tasks for 14 domains across single- and multi-domain setups and high- and low-resource scenarios reveals that TADA is an effective and efficient alternative to full domain-adaptive pre-training and adapters for domain adaptation, while not introducing additional parameters or complex training steps.
no code implementations • 28 Apr 2023 • Mingyang Wang, Heike Adel, Lukas Lange, Jannik Strötgen, Hinrich Schütze
In this work, we propose to leverage language-adaptive and task-adaptive pretraining on African texts and study transfer learning with source language selection on top of an African language-centric pretrained language model.
1 code implementation • 14 Feb 2023 • Koustava Goswami, Lukas Lange, Jun Araki, Heike Adel
Prompting pre-trained language models leads to promising results across natural language processing tasks but is less effective when applied in low-resource domains, due to the domain gap between the pre-training data and the downstream task.
1 code implementation • 20 May 2022 • Lukas Lange, Jannik Strötgen, Heike Adel, Dietrich Klakow
The detection and normalization of temporal expressions is an important task and preprocessing step for many applications.
1 code implementation • 16 Dec 2021 • Lukas Lange, Heike Adel, Jannik Strötgen, Dietrich Klakow
The field of natural language processing (NLP) has recently seen a large change towards using pre-trained language models for solving almost any task.
no code implementations • 17 Sep 2021 • Lukas Lange, Heike Adel, Jannik Strötgen
In this paper, we explore possible improvements of transformer models in a low-resource setting.
1 code implementation • EMNLP 2021 • Lukas Lange, Jannik Strötgen, Heike Adel, Dietrich Klakow
For this, we study the effects of model transfer on sequence labeling across various domains and tasks and show that our methods based on model similarity and support vector machines are able to predict promising sources, resulting in performance increases of up to 24 F1 points.
1 code implementation • 25 Feb 2021 • Michael A. Hedderich, Lukas Lange, Dietrich Klakow
Distant supervision allows obtaining labeled training corpora for low-resource settings where only limited hand-annotated data exists.
Tasks: Low Resource Named Entity Recognition, Named Entity Recognition
1 code implementation • EMNLP 2021 • Lukas Lange, Heike Adel, Jannik Strötgen, Dietrich Klakow
Combining several embeddings typically improves performance in downstream tasks as different embeddings encode different information.
no code implementations • 23 Oct 2020 • Lukas Lange, Xiang Dai, Heike Adel, Jannik Strötgen
The recognition and normalization of clinical information, such as tumor morphology mentions, is an important but complex process consisting of multiple subtasks.
1 code implementation • NAACL 2021 • Michael A. Hedderich, Lukas Lange, Heike Adel, Jannik Strötgen, Dietrich Klakow
Deep neural networks and huge language models are becoming omnipresent in natural language applications.
no code implementations • 2 Jul 2020 • Lukas Lange, Heike Adel, Jannik Strötgen
Natural language processing has huge potential in the medical domain, which has recently led to a lot of research in this field.
no code implementations • WS 2019 • Lukas Lange, Heike Adel, Jannik Strötgen
Named entity recognition has been extensively studied on English news texts.
1 code implementation • ACL 2020 • Annemarie Friedrich, Heike Adel, Federico Tomazic, Johannes Hingerl, Renou Benteau, Anika Maruscyk, Lukas Lange
With this paper, we publish our annotation guidelines, as well as our SOFC-Exp corpus consisting of 45 open-access scholarly articles annotated by domain experts.
no code implementations • WS 2020 • Lukas Lange, Anastasiia Iurshina, Heike Adel, Jannik Strötgen
Although temporal tagging is still dominated by rule-based systems, there have been recent attempts at neural temporal taggers.
Ranked #1 on Temporal Tagging on Catalan TimeBank 1.0
1 code implementation • ACL 2020 • Lukas Lange, Heike Adel, Jannik Strötgen
Exploiting natural language processing in the clinical domain requires de-identification, i.e., anonymization of personal information in texts.
no code implementations • WS 2020 • Lukas Lange, Heike Adel, Jannik Strötgen
Recent work showed that embeddings from related languages can improve the performance of sequence tagging, even for monolingual models.
1 code implementation • IJCNLP 2019 • Lukas Lange, Michael A. Hedderich, Dietrich Klakow
In low-resource settings, the performance of supervised labeling models can be improved with automatically annotated or distantly supervised data, which is cheap to create but often noisy.
Tasks: Low Resource Named Entity Recognition, Named Entity Recognition