2 code implementations • 14 Sep 2021 • Mahsa Yarmohammadi, Shijie Wu, Marc Marone, Haoran Xu, Seth Ebner, Guanghui Qin, Yunmo Chen, Jialiang Guo, Craig Harman, Kenton Murray, Aaron Steven White, Mark Dredze, Benjamin Van Durme
Zero-shot cross-lingual information extraction (IE) describes the construction of an IE model for some target language, given existing annotations exclusively in some other language, typically English.
Commonly-used transformer language models depend on a tokenization schema which sets an unchangeable subword vocabulary prior to pre-training, destined to be applied to all downstream tasks regardless of domain shift, novel word formations, or other sources of vocabulary mismatch.
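As a quick illustration of how such a fixed subword vocabulary behaves downstream, here is a minimal sketch assuming the HuggingFace transformers library and the bert-base-cased checkpoint (both our choices here, not necessarily the paper's setup):

```python
from transformers import AutoTokenizer

# A fixed subword vocabulary, chosen at pre-training time, is applied to all
# downstream text; novel word forms get fragmented into whatever pieces the
# vocabulary happens to contain.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

print(tokenizer.tokenize("tokenization"))   # e.g. ['token', '##ization']
print(tokenizer.tokenize("COVID-19"))       # novel term, heavily fragmented
```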
Machine learning models that offer excellent predictive performance often lack the interpretability necessary to support integrated human machine decision-making.
Entity linking -- the task of identifying references in free text to relevant knowledge base representations -- often focuses on single languages.
Neural topic models can augment or replace bag-of-words inputs with the learned representations of deep pre-trained transformer-based word prediction models.
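A minimal sketch of the general idea, assuming the sentence-transformers library and the all-MiniLM-L6-v2 checkpoint; the tiny encoder standing in for the topic model is illustrative only, not the paper's architecture:

```python
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer

# Replace bag-of-words input to a topic model's encoder with pre-trained
# transformer document embeddings (hypothetical setup for illustration).
docs = ["the cell divides", "stocks fell sharply", "the court ruled today"]
embedder = SentenceTransformer("all-MiniLM-L6-v2")
X = torch.tensor(embedder.encode(docs))           # (n_docs, 384)

n_topics = 10
encoder = nn.Sequential(nn.Linear(X.shape[1], 64), nn.ReLU(),
                        nn.Linear(64, n_topics), nn.Softmax(dim=-1))
theta = encoder(X)                                # per-document topic proportions
print(theta.shape)                                # torch.Size([3, 10])
```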
While previous research has raised concerns about possible biases in models produced from this data, no study has quantified how these biases actually manifest themselves with respect to different demographic groups, such as gender and racial/ethnic groups.
In this study, we treat user interests as domains and empirically examine how user language varies across this user factor in three English social media datasets.
High-dimensional and unstructured data such as natural language complicates the evaluation of causal inference methods; such evaluations rely on synthetic datasets with known causal effects.
Data-driven methods for mental health treatment and surveillance have become a major focus in computational science research in the last decade.
Proxy-based methods for annotating mental health status in social media have grown popular in computational research due to their ability to gather large training samples.
We find that the multilingual ability of BERT leads to robust performance in monolingual and multilingual settings.
The #MeToo movement on Twitter has drawn attention to the pervasive nature of sexual harassment and violence.
We propose an approach to concept linking that leverages recent work in contextualized neural models, such as ELMo (Peters et al., 2018), which create a token representation that integrates the surrounding context of the mention and concept name.
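A hedged sketch of this style of concept linking, substituting a BERT encoder for ELMo for brevity; the checkpoint, mean pooling, and cosine scoring are our assumptions, not the paper's method:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Embed a mention in its sentence context, embed each candidate concept
# name, and link to the closest concept by cosine similarity.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased").eval()

def embed(text):
    with torch.no_grad():
        out = enc(**tok(text, return_tensors="pt"))
    return out.last_hidden_state.mean(dim=1).squeeze(0)  # mean-pooled tokens

mention_in_context = embed("the patient presented with an enlarged heart")
concepts = ["cardiomegaly", "myocardial infarction", "hypertension"]
scores = [torch.cosine_similarity(mention_in_context, embed(c), dim=0)
          for c in concepts]
print(concepts[int(torch.stack(scores).argmax())])
```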
Multilingual BERT (mBERT) trained on 104 languages has shown surprisingly good cross-lingual performance on several NLP tasks, even without explicit cross-lingual signals.
However, a straightforward implementation of this simple idea does not always work in practice: naive training of NER models using annotated data drawn from multiple languages consistently underperforms models trained on monolingual data alone, despite having access to more training data.
Clinical notes contain an extensive record of a patient's health status, such as smoking status or the presence of heart conditions.
Ranked #1 on Clinical Note Phenotyping on I2B2 2006: Smoking
Pretrained contextual representation models (Peters et al., 2018; Devlin et al., 2018) have pushed forward the state-of-the-art on many NLP tasks.
Ranked #7 on Cross-Lingual NER on CoNLL Spanish
While recurrent neural networks (RNNs) are widely used for text classification, they demonstrate poor performance and slow convergence when trained on long sequences.
Causal understanding is essential for many kinds of decision-making, but causal inference from observational data has typically only been applied to structured, low-dimensional datasets.
Previous work (McCorriston et al., 2015) presented a method for determining whether an account was an individual or an organization, based on the account profile and a collection of tweets.
no code implementations • Benjamin Van Durme, Tom Lippincott, Kevin Duh, Deana Burchfield, Adam Poliak, Cash Costello, Tim Finin, Scott Miller, James Mayfield, Philipp Koehn, Craig Harman, Dawn Lawrie, Chandler May, Max Thomas, Annabelle Carrell, Julianne Chaloux, Tongfei Chen, Alex Comerford, Mark Dredze, Benjamin Glass, Shudong Hao, Patrick Martin, Pushpendre Rastogi, Rashmi Sankepally, Travis Wolfe, Ying-Ying Tran, Ted Zhang
It combines a multitude of analytics with a flexible environment for customizing the workflow for different users.
EntitySpike uses a temporal heuristic to identify named entities with similar context that occur in the same time period (within minutes) during an event.
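The snippet above does not spell out the heuristic, so the following is a hypothetical sketch of temporal grouping in its spirit; the five-minute window and the data layout are assumptions:

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Cluster mentions of the same entity string that occur within a few
# minutes of each other during an event.
WINDOW = timedelta(minutes=5)

def group_by_spike(mentions):
    """mentions: list of (entity_string, datetime) pairs, any order."""
    spikes = defaultdict(list)          # entity -> list of temporal clusters
    for entity, ts in sorted(mentions, key=lambda m: m[1]):
        clusters = spikes[entity]
        if clusters and ts - clusters[-1][-1] <= WINDOW:
            clusters[-1].append(ts)     # extend the current spike
        else:
            clusters.append([ts])       # start a new spike
    return spikes

mentions = [("acl", datetime(2016, 8, 7, 12, 0)),
            ("acl", datetime(2016, 8, 7, 12, 3)),
            ("acl", datetime(2016, 8, 7, 15, 30))]
print(group_by_spike(mentions))        # two spikes for "acl"
```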
Practically, this means that we may treat the lexical resources as observations under the proposed generative model.
Existing Knowledge Base Population methods extract relations from a closed relational schema with limited coverage, leading to sparse KBs.
Hand-engineered feature sets are a well-understood method for creating robust NLP models, but they require substantial expertise and effort to create.
This work presents a systematic theoretical and empirical comparison of the major algorithms that have been proposed for learning Harmonic and Optimality Theory grammars (HG and OT, respectively).
Many domain adaptation approaches rely on learning cross-domain shared representations to transfer the knowledge learned in one domain to other domains.
Named entity recognition and other information extraction tasks frequently use linguistic features such as part-of-speech tags or chunk tags.
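As a small illustration of such features, here is a sketch of per-token feature dictionaries with POS and chunk tags, of the kind typically fed to a linear tagger such as a CRF; the feature names are illustrative:

```python
# Hand-built token features for an NER tagger, using part-of-speech tags
# and chunk tags as linguistic features.
def token_features(tokens, pos_tags, chunk_tags, i):
    return {
        "word.lower": tokens[i].lower(),
        "word.istitle": tokens[i].istitle(),
        "pos": pos_tags[i],
        "chunk": chunk_tags[i],
        "prev_pos": pos_tags[i - 1] if i > 0 else "BOS",
        "next_pos": pos_tags[i + 1] if i < len(tokens) - 1 else "EOS",
    }

tokens = ["John", "lives", "in", "Baltimore"]
pos = ["NNP", "VBZ", "IN", "NNP"]
chunks = ["B-NP", "B-VP", "B-PP", "B-NP"]
print(token_features(tokens, pos, chunks, 3))
```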
We show how to adjust the model parameters to compensate for the errors introduced by this approximation, by following the gradient of the actual loss on training data.
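Without the surrounding abstract, "this approximation" is unspecified here; the following is a generic sketch of the approximation-aware idea, with inference approximated by a fixed number of unrolled update steps and the parameters trained by the gradient of the actual loss. The model, data, and update rule are assumptions for illustration only:

```python
import torch

# Inference is approximated by a truncated fixed-point iteration; training
# follows the gradient of the actual loss evaluated on the approximate output.
torch.manual_seed(0)
W = torch.randn(4, 4, requires_grad=True)           # model parameters
opt = torch.optim.SGD([W], lr=0.1)

def approximate_inference(x, n_steps=3):
    """Unrolled fixed-point iteration standing in for exact inference."""
    h = torch.zeros_like(x)
    for _ in range(n_steps):                        # truncation = approximation
        h = torch.tanh(x + h @ W)
    return h

x, target = torch.randn(8, 4), torch.randn(8, 4)
for _ in range(100):
    opt.zero_grad()
    loss = ((approximate_inference(x) - target) ** 2).mean()  # actual loss
    loss.backward()                                 # gradient through the approximation
    opt.step()
```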
Most work on building knowledge bases has focused on collecting entities and facts from as large a collection of documents as possible.
We propose a Feature-rich Compositional Embedding Model (FCM) for relation extraction that is expressive, generalizes to new domains, and is easy-to-implement.
Ranked #1 on Relation Extraction on ACE 2005 (Cross Sentence metric)
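A hedged sketch of the FCM scoring idea in the entry above: each word contributes the outer product of its hand-crafted feature vector and its word embedding, these are summed, and each relation label scores the sum with its own parameter matrix. Dimensions and the random parameters are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_feats, emb_dim, n_labels = 6, 50, 4

embeddings = rng.normal(size=(3, emb_dim))        # 3 words in the sentence
features = rng.integers(0, 2, size=(3, n_feats))  # binary features per word
T = rng.normal(size=(n_labels, n_feats, emb_dim)) # per-label parameter tensor

# Sum of per-word outer products f_i (outer) e_i -> (n_feats, emb_dim)
S = sum(np.outer(f, e) for f, e in zip(features, embeddings))
scores = np.tensordot(T, S, axes=([1, 2], [0, 1]))  # one score per label
print(scores.argmax())                               # predicted relation
```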
We present AROW, a new online learning algorithm that combines several properties of successful learning algorithms: large margin training, confidence weighting, and the capacity to handle non-separable data.
Confidence-weighted (CW) learning, an online learning method for linear classifiers, maintains a Gaussian distribution over weight vectors, with a covariance matrix that represents uncertainty about weights and correlations.
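The two entries above describe AROW and the confidence-weighted representation it builds on. A minimal sketch of the AROW update over that Gaussian state (mean mu, covariance Sigma), assuming numpy and a fixed regularizer r:

```python
import numpy as np

# AROW update: the classifier state is a Gaussian over weight vectors,
# exactly the confidence-weighted representation described above.
def arow_update(mu, Sigma, x, y, r=1.0):
    margin = y * (mu @ x)
    if margin >= 1.0:                        # confident enough: no update
        return mu, Sigma
    Sx = Sigma @ x
    beta = 1.0 / (x @ Sx + r)
    alpha = (1.0 - margin) * beta
    mu = mu + alpha * y * Sx                 # move the mean toward the example
    Sigma = Sigma - beta * np.outer(Sx, Sx)  # shrink variance along x
    return mu, Sigma

d = 5
mu, Sigma = np.zeros(d), np.eye(d)
for x, y in [(np.ones(d), 1), (-np.ones(d), -1)]:
    mu, Sigma = arow_update(mu, Sigma, x, y)
print(np.sign(mu @ np.ones(d)))              # +1.0
```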
Automatic sentiment classification has been extensively studied and applied in recent years.