The increasing deployment of artificial intelligence (AI) tools to inform decision making across diverse areas, including healthcare, employment, social benefits, and government policy, presents a serious risk for disabled people, who have been shown to face bias in AI implementations.
We develop a deep convolutional network that utilizes textual entity representations and demonstrate that our model outperforms recent knowledge graph (KG) completion methods in this challenging setting.
Introducing biomedical informatics (BMI) students to natural language processing (NLP) requires balancing technical depth with practical know-how to address application-focused needs.
Natural language processing (NLP) research combines the study of universal principles, through basic science, with applied science targeting specific use cases and settings.
Embeddings of words and concepts capture syntactic and semantic regularities of language; however, they have seen limited use as tools to study characteristics of different corpora and how they relate to one another.
Both classification and candidate selection approaches present distinct strengths for automated coding in under-studied domains, and we highlight that the combination of (i) a small annotated data set; (ii) expert definitions of codes of interest; and (iii) a representative text corpus is sufficient to produce high-performing automated coding systems.
To address the dearth of annotated training data for medical entity linking, we present WikiMed and PubMedDS, two large-scale medical entity linking datasets, and demonstrate that pre-training MedType on these datasets further improves entity linking performance.
The disability benefits programs administered by the US Social Security Administration (SSA) receive between 2 and 3 million new applications each year.
Natural language processing techniques are being applied to increasingly diverse types of electronic health records, and can benefit from in-depth understanding of the distinguishing characteristics of medical document types.
Exploration and analysis of potential data sources is a significant challenge in the application of NLP techniques to novel information domains.
Finally, we highlight several challenges in classifying performance assertions, including capturing information about sources of assistance, incorporating syntactic structure and negation scope, and handling new modalities at test time.
Analysis of word embedding properties to inform their use in downstream NLP tasks has largely relied on assessing nearest neighbors.
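Nearest-neighbor assessment of this kind typically means ranking the vocabulary by cosine similarity to a query word's vector. A minimal sketch, using toy hand-made vectors rather than a trained embedding model (the words and values below are illustrative assumptions only):

```python
# Minimal sketch of nearest-neighbor analysis for word embeddings.
# The toy vectors are illustrative; real vectors would come from a
# trained embedding model.
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_neighbors(query, embeddings, k=2):
    """Rank all other words by cosine similarity to `query`."""
    q = embeddings[query]
    ranked = sorted(
        ((w, cosine(q, v)) for w, v in embeddings.items() if w != query),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]

embeddings = {
    "doctor": [0.9, 0.8, 0.1],
    "nurse":  [0.85, 0.75, 0.2],
    "truck":  [0.1, 0.2, 0.9],
    "car":    [0.15, 0.1, 0.95],
}

print(nearest_neighbors("doctor", embeddings, k=2))
```

Inspecting which words rank highest for a query is the usual qualitative check of whether an embedding space groups semantically related terms together.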
Functioning is gaining recognition as an important indicator of global health, but remains under-studied in medical natural language processing research.
Analogy completion has been a popular task in recent years for evaluating the semantic properties of word embeddings, but the standard methodology makes a number of assumptions about analogies that do not always hold, either in recent benchmark datasets or when expanding into other domains.
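The standard methodology referred to here is the vector-offset ("3CosAdd") method: to solve "a is to b as c is to ?", take the vector b - a + c and return the vocabulary word closest to it by cosine similarity, conventionally excluding the three input words. That exclusion is itself one of the built-in assumptions. A sketch with toy vectors (the words and values are illustrative assumptions, not trained embeddings):

```python
# Sketch of the vector-offset ("3CosAdd") analogy method:
# answer = argmax over the vocabulary of cos(d, b - a + c),
# conventionally excluding the input words a, b, c.
from math import sqrt

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v)))

def complete_analogy(a, b, c, embeddings):
    """Solve 'a is to b as c is to ?' via the offset b - a + c."""
    target = [bi - ai + ci
              for ai, bi, ci in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = (w for w in embeddings if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(embeddings[w], target))

# Toy vectors constructed so the "gender" offset is consistent.
embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.8, 0.9],
    "man":   [0.5, 0.4, 0.1],
    "woman": [0.5, 0.4, 0.9],
    "apple": [0.1, 0.9, 0.2],
}

print(complete_analogy("man", "woman", "king", embeddings))  # "queen"
```

Note that the method only works when the relation is encoded as a near-constant vector offset, which is exactly the kind of assumption that can fail on newer benchmarks or in other domains.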
We introduce second-order vector representations of words, induced from nearest neighborhood topological features in pre-trained contextual word embeddings.
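One simple way to realize the general idea of second-order vectors, shown purely as an illustration and not as the paper's actual construction, is to represent each word as an indicator vector over the vocabulary marking its k nearest neighbors in the first-order embedding space:

```python
# Hedged sketch: derive a second-order representation for each word as
# a binary indicator over the vocabulary, marking its k nearest
# neighbors in the first-order space. This illustrates the general
# notion of neighborhood-based features; the paper's construction may
# differ. Toy vectors are illustrative only.
from math import sqrt

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v)))

def second_order(embeddings, k=2):
    """Map each word to a |V|-dim indicator of its k nearest neighbors."""
    vocab = sorted(embeddings)
    out = {}
    for w in vocab:
        neighbors = sorted(
            (v for v in vocab if v != w),
            key=lambda v: cosine(embeddings[w], embeddings[v]),
            reverse=True,
        )[:k]
        out[w] = [1.0 if v in neighbors else 0.0 for v in vocab]
    return out

embeddings = {
    "doctor": [0.9, 0.8, 0.1],
    "nurse":  [0.85, 0.75, 0.2],
    "truck":  [0.1, 0.2, 0.9],
    "car":    [0.15, 0.1, 0.95],
}

print(second_order(embeddings, k=1))
```

Words with similar first-order neighborhoods end up with similar second-order vectors, even if their original coordinates differ, which is the topological intuition behind such representations.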