no code implementations • 24 Nov 2022 • Xiang Dai, Sarvnaz Karimi
Information Extraction from scientific literature can be challenging due to the highly specialised nature of such text.
no code implementations • 11 Oct 2022 • Ilias Chalkidis, Xiang Dai, Manos Fergadiotis, Prodromos Malakasiotis, Desmond Elliott
Non-hierarchical sparse attention Transformer-based models, such as Longformer and Big Bird, are popular approaches to working with long documents.
no code implementations • 25 Apr 2022 • Shiqi Xu, Xiang Dai, Xi Yang, Kevin C. Zhou, Kanghyun Kim, Vinayak Pathak, Carolyn Glass, Roarke Horstmeyer
We report Tensorial Tomographic Differential Phase-Contrast microscopy (T2DPC), a quantitative label-free tomographic imaging method for simultaneous measurement of phase and anisotropy.
1 code implementation • 14 Apr 2022 • Xiang Dai, Ilias Chalkidis, Sune Darkner, Desmond Elliott
The recent literature in text classification is biased towards short text sequences (e.g., sentences or paragraphs).
1 code implementation • Findings (EMNLP) 2021 • Rasmus Kær Jørgensen, Mareike Hartmann, Xiang Dai, Desmond Elliott
Domain adaptive pretraining, i.e., the continued unsupervised pretraining of a language model on domain-specific text, improves the modelling of text for downstream tasks within the domain.
no code implementations • 23 Jun 2021 • Xiang Dai
However, there are several open challenges in applying these models to recognise biomedical names: 1) Biomedical names may have a complex inner structure (discontinuity and overlapping) that cannot be recognised using standard sequence tagging techniques; 2) Training NER models usually requires large amounts of labelled data, which are difficult to obtain in the biomedical domain; and 3) Commonly used language representation models are pre-trained on generic data, so a domain shift exists between these models and the target biomedical data.
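The first challenge can be made concrete with a small sketch (illustrative only; the tokens, mentions, and helper function below are hypothetical, not from the paper): a BIO tag sequence assigns exactly one tag per token, so it cannot encode a mention whose tokens are non-consecutive, or two mentions that share a token.

```python
def to_bio(tokens, mentions):
    """Try to encode mentions (given as lists of token indices) as BIO tags.

    Returns None when a mention is discontinuous or overlaps another one,
    since plain BIO tagging cannot represent either case.
    """
    tags = ["O"] * len(tokens)
    for mention in mentions:
        # Discontinuous: the token indices are not consecutive.
        if mention != list(range(mention[0], mention[-1] + 1)):
            return None
        # Overlapping: some token already belongs to another mention.
        if any(tags[i] != "O" for i in mention):
            return None
        tags[mention[0]] = "B"
        for i in mention[1:]:
            tags[i] = "I"
    return tags

tokens = ["muscle", "pain", "and", "fatigue"]
# A contiguous mention ("muscle pain") encodes fine:
print(to_bio(tokens, [[0, 1]]))          # ['B', 'I', 'O', 'O']
# "muscle ... fatigue" is discontinuous and shares "muscle" with the
# first mention, so no BIO encoding exists:
print(to_bio(tokens, [[0, 1], [0, 3]]))  # None
```

This is exactly the situation in coordinated biomedical names such as "muscle pain and fatigue", where "muscle fatigue" is a second, discontinuous mention.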
no code implementations • 23 Oct 2020 • Lukas Lange, Xiang Dai, Heike Adel, Jannik Strötgen
The recognition and normalization of clinical information, such as tumor morphology mentions, is an important, but complex process consisting of multiple subtasks.
2 code implementations • COLING 2020 • Xiang Dai, Heike Adel
Simple yet effective data augmentation techniques have been proposed for sentence-level and sentence-pair natural language processing tasks.
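One way such techniques can be adapted to token-level tasks like NER is label-preserving token replacement: swap a token for another token observed with the same label in the training data, so the tag sequence stays valid. The sketch below is a minimal illustration under that assumption; the function name, data, and replacement rate are hypothetical, not taken from the paper.

```python
import random

def label_wise_replace(tokens, labels, pool, rate=0.3, rng=None):
    """Replace each token, with probability `rate`, by another token that
    carries the same label elsewhere in the training data (`pool`).

    The label sequence is unchanged, so the augmented example remains a
    valid training instance for sequence labelling.
    """
    rng = rng or random.Random(0)  # seeded for reproducibility
    out = []
    for tok, lab in zip(tokens, labels):
        candidates = pool.get(lab, [])
        if candidates and rng.random() < rate:
            out.append(rng.choice(candidates))
        else:
            out.append(tok)
    return out

# `pool` maps each label to tokens seen with that label in training data.
pool = {"B-Drug": ["aspirin", "ibuprofen"], "O": ["the", "a", "daily"]}
aug = label_wise_replace(["take", "aspirin", "daily"],
                         ["O", "B-Drug", "O"], pool)
```

Because the labels are never touched, the same tag sequence `["O", "B-Drug", "O"]` annotates both the original and the augmented sentence.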
no code implementations • Findings of the Association for Computational Linguistics 2020 • Xiang Dai, Sarvnaz Karimi, Ben Hachey, Cecile Paris
Recent studies on domain-specific BERT models show that effectiveness on downstream tasks can be improved when models are pretrained on in-domain data.
Ranked #3 on Clinical Concept Extraction on 2010 i2b2/VA
1 code implementation • ACL 2020 • Xiang Dai, Sarvnaz Karimi, Ben Hachey, Cecile Paris
Unlike widely used Named Entity Recognition (NER) data sets in generic domains, biomedical NER data sets often contain mentions consisting of discontinuous spans.
no code implementations • ACL 2019 • Nicky Ringland, Xiang Dai, Ben Hachey, Sarvnaz Karimi, Cecile Paris, James R. Curran
Named entity recognition (NER) is widely used in natural language processing applications and downstream tasks.
1 code implementation • NAACL 2019 • Xiang Dai, Sarvnaz Karimi, Ben Hachey, Cecile Paris
Word vectors and Language Models (LMs) pretrained on a large amount of unlabelled data can dramatically improve various Natural Language Processing (NLP) tasks.
Ranked #1 on Named Entity Recognition (NER) on WetLab
no code implementations • WS 2018 • Aditya Joshi, Xiang Dai, Sarvnaz Karimi, Ross Sparks, Cécile Paris, C Raina MacIntyre
Vaccination behaviour detection deals with predicting whether or not a person received, or was about to receive, a vaccine.
no code implementations • ACL 2018 • Xiang Dai
Standard named entity recognizers can effectively recognize entity mentions that consist of contiguous tokens and do not overlap with each other.
no code implementations • WS 2017 • Sarvnaz Karimi, Xiang Dai, Hamed Hassanzadeh, Anthony Nguyen
Diagnosis autocoding services and research aim to improve both the productivity of clinical coders and the accuracy of the coding.