Search Results for author: Lukas Lange

Found 25 papers, 14 papers with code

AnnoCTR: A Dataset for Detecting and Linking Entities, Tactics, and Techniques in Cyber Threat Reports

2 code implementations • 11 Apr 2024 • Lukas Lange, Marc Müller, Ghazaleh Haratinezhad Torbati, Dragan Milchevski, Patrick Grau, Subhash Pujari, Annemarie Friedrich

In our few-shot scenario, we find that for identifying the MITRE ATT&CK concepts that are mentioned explicitly or implicitly in a text, concept descriptions from MITRE ATT&CK are an effective source for training data augmentation.

Data Augmentation
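
A minimal sketch of the augmentation idea from the abstract above: treat each MITRE ATT&CK concept description as an additional labeled training example for few-shot concept classification. The concept IDs, description texts, and data structures below are illustrative placeholders, not the paper's actual data or pipeline.

```python
# Hypothetical sketch: use MITRE ATT&CK concept descriptions as extra
# labeled examples for few-shot concept classification. IDs and texts
# below are illustrative placeholders, not the paper's data.

few_shot_examples = [
    ("The actor used spearphishing emails with malicious attachments.", "T1566"),
]

concept_descriptions = {
    "T1566": "Phishing: adversaries send messages to elicit information or deliver payloads.",
    "T1059": "Command and Scripting Interpreter: adversaries abuse interpreters to run code.",
}

def augment(examples, descriptions):
    """Append (description, concept_id) pairs to the few-shot training set."""
    return examples + [(text, cid) for cid, text in descriptions.items()]

train_data = augment(few_shot_examples, concept_descriptions)
print(len(train_data))  # 3 examples: 1 annotated + 2 synthesized from descriptions
```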

Discourse-Aware In-Context Learning for Temporal Expression Normalization

no code implementations • 11 Apr 2024 • Akash Kumar Gautam, Lukas Lange, Jannik Strötgen

In this work, we explore the feasibility of proprietary and open-source large language models (LLMs) for TE normalization using in-context learning to inject task, document, and example information into the model.

In-Context Learning
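
The abstract describes injecting task, document, and example information into an in-context-learning prompt. A hedged sketch of what such a prompt builder might look like; the template, the example normalizations, and the TIMEX-style values are assumptions for illustration, not the paper's format.

```python
# Hypothetical prompt builder for temporal expression (TE) normalization
# via in-context learning: the prompt combines a task description,
# document context, and demonstration examples.

def build_prompt(task_desc, document, examples, target_expression):
    demo = "\n".join(
        f'Expression: "{e}" -> Normalized: {n}' for e, n in examples
    )
    return (
        f"Task: {task_desc}\n\n"
        f"Document context:\n{document}\n\n"
        f"Examples:\n{demo}\n\n"
        f'Expression: "{target_expression}" -> Normalized:'
    )

prompt = build_prompt(
    task_desc="Normalize temporal expressions to TIMEX3 values.",
    document="The meeting took place on Monday. (Document created 2024-04-11.)",
    examples=[("yesterday", "2024-04-10"), ("next week", "2024-W16")],
    target_expression="Monday",
)
print(prompt)
```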

GradSim: Gradient-Based Language Grouping for Effective Multilingual Training

no code implementations • 23 Oct 2023 • Mingyang Wang, Heike Adel, Lukas Lange, Jannik Strötgen, Hinrich Schütze

However, not all languages influence each other positively, and it is an open research question how to select the most suitable set of languages for multilingual training while avoiding negative interference among languages whose characteristics or data distributions are not compatible.

Sentiment Analysis
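
A minimal sketch of the gradient-grouping idea under simplifying assumptions: one averaged gradient vector per language (random placeholders here instead of real backpropagated gradients), with candidate source languages ranked by cosine similarity to a target language's gradient.

```python
import numpy as np

# Sketch of gradient-based language grouping: in practice each vector
# would be an averaged model gradient from that language's training data;
# here they are random placeholders.

rng = np.random.default_rng(0)
languages = ["en", "de", "sw", "yo"]
grads = {lang: rng.normal(size=1024) for lang in languages}  # placeholders

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

target = "sw"
ranked = sorted(
    (lang for lang in languages if lang != target),
    key=lambda lang: cosine(grads[target], grads[lang]),
    reverse=True,
)
print(ranked)  # languages ordered by gradient similarity to the target
```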

TADA: Efficient Task-Agnostic Domain Adaptation for Transformers

1 code implementation • 22 May 2023 • Chia-Chien Hung, Lukas Lange, Jannik Strötgen

Our broad evaluation on 4 downstream tasks across 14 domains, covering single- and multi-domain setups and high- and low-resource scenarios, reveals that TADA is an effective and efficient alternative to full domain-adaptive pre-training and to adapters for domain adaptation, while introducing no additional parameters or complex training steps.

Domain Adaptation

NLNDE at SemEval-2023 Task 12: Adaptive Pretraining and Source Language Selection for Low-Resource Multilingual Sentiment Analysis

no code implementations • 28 Apr 2023 • Mingyang Wang, Heike Adel, Lukas Lange, Jannik Strötgen, Hinrich Schütze

In this work, we propose to leverage language-adaptive and task-adaptive pretraining on African texts and study transfer learning with source language selection on top of an African language-centric pretrained language model.

Language Modelling • Sentiment Analysis • +1

SwitchPrompt: Learning Domain-Specific Gated Soft Prompts for Classification in Low-Resource Domains

1 code implementation • 14 Feb 2023 • Koustava Goswami, Lukas Lange, Jun Araki, Heike Adel

Prompting pre-trained language models leads to promising results across natural language processing tasks but is less effective when applied in low-resource domains, due to the domain gap between the pre-training data and the downstream task.

Language Modelling • text-classification • +1
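
A rough sketch of a gated soft prompt, assuming a learned per-token sigmoid gate that interpolates between general and domain-specific prompt embeddings before they are prepended to the input; the actual SwitchPrompt gating mechanism and dimensions may differ.

```python
import torch
import torch.nn as nn

# Sketch of a gated soft prompt: a sigmoid gate mixes general and
# domain-specific learned prompt vectors per prompt position.

class GatedSoftPrompt(nn.Module):
    def __init__(self, prompt_len=10, dim=768):
        super().__init__()
        self.general = nn.Parameter(torch.randn(prompt_len, dim))
        self.domain = nn.Parameter(torch.randn(prompt_len, dim))
        self.gate = nn.Parameter(torch.zeros(prompt_len, 1))  # per-token gate

    def forward(self, input_embeds):
        g = torch.sigmoid(self.gate)                     # (prompt_len, 1)
        prompt = g * self.domain + (1 - g) * self.general
        batch = input_embeds.size(0)
        prompt = prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)  # prepend prompt

prompts = GatedSoftPrompt()
x = torch.randn(2, 32, 768)   # (batch, seq_len, hidden)
print(prompts(x).shape)       # torch.Size([2, 42, 768])
```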

Multilingual Normalization of Temporal Expressions with Masked Language Models

1 code implementation • 20 May 2022 • Lukas Lange, Jannik Strötgen, Heike Adel, Dietrich Klakow

The detection and normalization of temporal expressions is an important task and preprocessing step for many applications.

Language Modelling • Masked Language Modeling
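
One way to read "normalization with masked language models" is as mask filling over value templates. The toy template and the choice of xlm-roberta-base below are assumptions for illustration, not the paper's setup.

```python
from transformers import pipeline

# Sketch: cast temporal expression normalization as masked-token
# prediction with a multilingual masked language model.

fill = pipeline("fill-mask", model="xlm-roberta-base")
template = (
    'Document date: 2022-05-20. The phrase "yesterday" refers to the date '
    "2022-05-<mask>."
)
for candidate in fill(template, top_k=3):
    print(candidate["token_str"], candidate["score"])
```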

CLIN-X: pre-trained language models and a study on cross-task transfer for concept extraction in the clinical domain

1 code implementation • 16 Dec 2021 • Lukas Lange, Heike Adel, Jannik Strötgen, Dietrich Klakow

The field of natural language processing (NLP) has recently seen a major shift towards using pre-trained language models for solving almost any task.

Clinical Concept Extraction • Sentence • +1

To Share or not to Share: Predicting Sets of Sources for Model Transfer Learning

1 code implementation • EMNLP 2021 • Lukas Lange, Jannik Strötgen, Heike Adel, Dietrich Klakow

For this, we study the effects of model transfer on sequence labeling across various domains and tasks and show that our methods based on model similarity and support vector machines are able to predict promising sources, resulting in performance increases of up to 24 F1 points.

text similarity • Transfer Learning
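
A minimal sketch of the SVM component under assumed features: given similarity scores between a source and a target (placeholder random features and labels here), train a classifier to predict whether transfer from that source will help.

```python
import numpy as np
from sklearn.svm import SVC

# Sketch: predict promising transfer sources from similarity features
# (e.g., text and model similarity scores per source-target pair).
# Features and labels below are random placeholders.

rng = np.random.default_rng(42)
X = rng.random((40, 3))                     # similarity features per pair
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)   # placeholder "transfer helped" labels

clf = SVC(kernel="rbf").fit(X[:30], y[:30])
print(clf.predict(X[30:]))  # predicted promising sources for held-out pairs
```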

ANEA: Distant Supervision for Low-Resource Named Entity Recognition

1 code implementation • 25 Feb 2021 • Michael A. Hedderich, Lukas Lange, Dietrich Klakow

Distant supervision allows obtaining labeled training corpora for low-resource settings where only limited hand-annotated data exists.

Low Resource Named Entity Recognition • named-entity-recognition • +2
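
A minimal sketch of gazetteer-based distant supervision for NER: tokens matched against entity lists receive labels automatically. ANEA derives such lists automatically; the tiny hand-made gazetteers below are placeholders.

```python
# Sketch of distant supervision: label tokens by gazetteer lookup
# instead of hand annotation. Gazetteers here are toy placeholders.

gazetteers = {
    "LOC": {"nairobi", "mombasa"},
    "PER": {"amina", "juma"},
}

def distant_labels(tokens):
    labels = []
    for tok in tokens:
        tag = next((t for t, vocab in gazetteers.items()
                    if tok.lower() in vocab), "O")
        labels.append(tag)
    return labels

print(distant_labels("Amina visited Nairobi yesterday".split()))
# ['PER', 'O', 'LOC', 'O']
```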

FAME: Feature-Based Adversarial Meta-Embeddings for Robust Input Representations

1 code implementation • EMNLP 2021 • Lukas Lange, Heike Adel, Jannik Strötgen, Dietrich Klakow

Combining several embeddings typically improves performance in downstream tasks as different embeddings encode different information.

NER • POS • +4
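
A sketch of attention-based meta-embeddings in the spirit of the abstract above: project each embedding type to a shared size, then combine them with attention weights conditioned on extra input features. This omits FAME's adversarial training component, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

# Sketch: feature-conditioned attention over multiple embedding types,
# combined into one meta-embedding per token.

class MetaEmbedding(nn.Module):
    def __init__(self, input_dims, shared_dim=256, feature_dim=8):
        super().__init__()
        self.projs = nn.ModuleList(nn.Linear(d, shared_dim) for d in input_dims)
        self.attn = nn.Linear(shared_dim + feature_dim, 1)

    def forward(self, embeddings, features):
        # embeddings: list of (batch, seq, dim_i); features: (batch, seq, feature_dim)
        projected = torch.stack(
            [p(e) for p, e in zip(self.projs, embeddings)], dim=2
        )  # (batch, seq, n_embeddings, shared_dim)
        feats = features.unsqueeze(2).expand(-1, -1, projected.size(2), -1)
        scores = self.attn(torch.cat([projected, feats], dim=-1))
        weights = torch.softmax(scores, dim=2)  # attention over embedding types
        return (weights * projected).sum(dim=2)

me = MetaEmbedding([300, 768])
out = me([torch.randn(2, 5, 300), torch.randn(2, 5, 768)], torch.randn(2, 5, 8))
print(out.shape)  # torch.Size([2, 5, 256])
```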

NLNDE at CANTEMIST: Neural Sequence Labeling and Parsing Approaches for Clinical Concept Extraction

no code implementations • 23 Oct 2020 • Lukas Lange, Xiang Dai, Heike Adel, Jannik Strötgen

The recognition and normalization of clinical information, such as tumor morphology mentions, is an important, but complex process consisting of multiple subtasks.

Clinical Concept Extraction

NLNDE: The Neither-Language-Nor-Domain-Experts' Way of Spanish Medical Document De-Identification

no code implementations • 2 Jul 2020 • Lukas Lange, Heike Adel, Jannik Strötgen

Natural language processing has huge potential in the medical domain, which has recently led to a lot of research in this field.

De-identification

Closing the Gap: Joint De-Identification and Concept Extraction in the Clinical Domain

1 code implementation • ACL 2020 • Lukas Lange, Heike Adel, Jannik Strötgen

Exploiting natural language processing in the clinical domain requires de-identification, i.e., the anonymization of personal information in texts.

De-identification

On the Choice of Auxiliary Languages for Improved Sequence Tagging

no code implementations • WS 2020 • Lukas Lange, Heike Adel, Jannik Strötgen

Recent work showed that embeddings from related languages can improve the performance of sequence tagging, even for monolingual models.

Part-Of-Speech Tagging

Feature-Dependent Confusion Matrices for Low-Resource NER Labeling with Noisy Labels

1 code implementation • IJCNLP 2019 • Lukas Lange, Michael A. Hedderich, Dietrich Klakow

In low-resource settings, the performance of supervised labeling models can be improved with automatically annotated or distantly supervised data, which is cheap to create but often noisy.

Low Resource Named Entity Recognition • named-entity-recognition • +4
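
A minimal sketch of the noisy-channel idea behind this line of work: a clean-label classifier followed by a confusion matrix that maps clean label probabilities to noisy label probabilities, where making the matrix a function of input features gives the feature-dependent variant. All sizes are illustrative.

```python
import torch
import torch.nn as nn

# Sketch: model noisy labels as clean predictions passed through a
# feature-dependent confusion matrix (rows normalized to sum to 1).

num_labels, feat_dim = 5, 16

base = nn.Linear(feat_dim, num_labels)                 # clean-label classifier
noise = nn.Linear(feat_dim, num_labels * num_labels)   # feature-dependent matrix

x = torch.randn(4, feat_dim)
p_clean = torch.softmax(base(x), dim=-1)                       # (4, 5)
conf = torch.softmax(
    noise(x).view(-1, num_labels, num_labels), dim=-1          # rows sum to 1
)
p_noisy = torch.bmm(p_clean.unsqueeze(1), conf).squeeze(1)     # (4, 5)
print(p_noisy.sum(dim=-1))  # each row sums to ~1
```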
