1 code implementation • EMNLP (WNUT) 2020 • Justin Sech, Alexandra DeLucia, Anna L. Buczak, Mark Dredze
We present baseline systems trained on this data for the identification of tweets related to civil unrest.
1 code implementation • COLING (WNUT) 2022 • Jingyu Zhang, Alexandra DeLucia, Mark Dredze
Despite the importance of these tools for data curation, the impact of tweet language, country of origin, and creation date on tool performance remains largely unknown.
no code implementations • NAACL (CLPsych) 2022 • Ayah Zirikly, Mark Dredze
In the case of mental health diagnosis, clinicians already rely on an assessment framework to make these decisions; that framework can help a model generate meaningful explanations. In this work we propose to use PHQ-9 categories as an auxiliary task for explaining a social media-based model of depression.
no code implementations • ACL 2022 • Sheena Panthaplackel, Adrian Benton, Mark Dredze
We propose the task of updated headline generation, in which a system generates a headline for an updated article, considering both the previous article and headline.
no code implementations • BioNLP (ACL) 2022 • Zach Wood-Doughty, Isabel Cachola, Mark Dredze
We propose to use knowledge distillation, or training a student model that mimics the behavior of a trained teacher model, as a technique to generate faithful and plausible explanations.
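To make the distillation idea concrete, here is a minimal NumPy sketch (not the paper's implementation): a linear student is trained to match a teacher's temperature-softened output distribution via cross-entropy on soft targets. The feature dimensions, temperature, and learning rate are illustrative assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T softens the distribution."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_step(student_W, X, teacher_logits, T=2.0, lr=0.1):
    """One descent step making a linear student mimic the teacher's
    softened outputs (gradient of cross-entropy vs. soft targets,
    up to a constant temperature factor)."""
    soft_targets = softmax(teacher_logits, T)   # teacher's soft labels
    student_probs = softmax(X @ student_W, T)   # student's predictions
    grad = X.T @ (student_probs - soft_targets) / len(X)
    return student_W - lr * grad

# toy data: 8 examples, 2 features, 3 classes, arbitrary teacher outputs
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
teacher_logits = rng.normal(size=(8, 3))
W = np.zeros((2, 3))
for _ in range(200):
    W = distill_step(W, X, teacher_logits)
```

After training, the student's softened predictions sit closer to the teacher's than the uniform initial predictions did; the student's simpler form is what makes its behavior easier to explain.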
1 code implementation • WNUT (ACL) 2021 • Abhinav Chinta, Jingyu Zhang, Alexandra DeLucia, Mark Dredze, Anna L. Buczak
Twitter is commonly used for civil unrest detection and forecasting tasks, but there is a lack of work in evaluating how civil unrest manifests on Twitter across countries and events.
no code implementations • NAACL (CLPsych) 2021 • Eli Sherman, Keith Harrigian, Carlos Aguirre, Mark Dredze
Spurred by advances in machine learning and natural language processing, developing social media-based mental health surveillance models has received substantial recent attention.
no code implementations • NAACL (CLPsych) 2021 • Carlos Aguirre, Mark Dredze
Models for identifying depression using social media text exhibit biases towards different gender and racial/ethnic groups.
no code implementations • 30 Mar 2023 • Shijie Wu, Ozan Irsoy, Steven Lu, Vadim Dabravolski, Mark Dredze, Sebastian Gehrmann, Prabhanjan Kambadur, David Rosenberg, Gideon Mann
The use of NLP in the realm of financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering.
1 code implementation • 13 Dec 2022 • David Mueller, Nicholas Andrews, Mark Dredze
Learning these models often requires specialized training algorithms that address task-conflict in the shared parameter updates, which otherwise can lead to negative transfer.
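One well-known family of such algorithms resolves task-conflict by gradient surgery. The sketch below follows the PCGrad idea (projecting away the conflicting component of each task gradient), which is an illustrative stand-in and not necessarily the method of this paper.

```python
import numpy as np

def project_conflicting(grads):
    """PCGrad-style gradient surgery: when two task gradients conflict
    (negative dot product), project each onto the normal plane of the
    other before averaging, so the combined update hurts neither task."""
    out = []
    for i, g in enumerate(grads):
        g = g.astype(float).copy()
        for j, h in enumerate(grads):
            if i != j and g @ h < 0:            # conflicting directions
                g -= (g @ h) / (h @ h) * h      # drop the conflicting part
        out.append(g)
    return np.mean(out, axis=0)

# two conflicting task gradients
g1 = np.array([1.0, 1.0])
g2 = np.array([-1.0, 0.5])
update = project_conflicting([g1, g2])
```

A naive sum of `g1` and `g2` would move against one of the tasks; the projected update has a non-negative component along both, which is the sense in which surgery avoids negative transfer.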
no code implementations • 15 Nov 2022 • Carlos Aguirre, Mark Dredze, Philip Resnik
Stressors are related to depression, but this relationship is complex.
1 code implementation • RepL4NLP (ACL) 2022 • Shijie Wu, Benjamin Van Durme, Mark Dredze
Pretrained multilingual encoders enable zero-shot cross-lingual transfer, but often produce unreliable models that exhibit high performance variance on the target language.
no code implementations • NAACL (CLPsych) 2022 • Keith Harrigian, Mark Dredze
Self-disclosed mental health diagnoses, which serve as ground truth annotations of mental health status in the absence of clinical measures, underpin the conclusions behind most computational studies of mental health language from the last decade.
1 code implementation • 22 Jun 2022 • Keith Harrigian, Mark Dredze
Social media allows researchers to track societal and cultural changes over time using language analysis tools.

no code implementations • 23 May 2022 • Moniba Keymanesh, Adrian Benton, Mark Dredze
Previous work shows that pre-trained language models (PLMs) perform remarkably well on this task after fine-tuning on a significant amount of task-specific training data.
1 code implementation • 20 Mar 2022 • Xiaolei Huang, Franck Dernoncourt, Mark Dredze
Clinical notes in Electronic Health Records (EHRs) present rich documented information about patients that can be used to infer phenotypes for disease diagnosis and to study patient characteristics for cohort selection.
2 code implementations • EMNLP 2021 • Mahsa Yarmohammadi, Shijie Wu, Marc Marone, Haoran Xu, Seth Ebner, Guanghui Qin, Yunmo Chen, Jialiang Guo, Craig Harman, Kenton Murray, Aaron Steven White, Mark Dredze, Benjamin Van Durme
Zero-shot cross-lingual information extraction (IE) describes the construction of an IE model for some target language, given existing annotations exclusively in some other language, typically English.
no code implementations • 1 Aug 2021 • Yuval Pinter, Amanda Stent, Mark Dredze, Jacob Eisenstein
Commonly-used transformer language models depend on a tokenization schema which sets an unchangeable subword vocabulary prior to pre-training, destined to be applied to all downstream tasks regardless of domain shift, novel word formations, or other sources of vocabulary mismatch.
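The vocabulary-mismatch problem is easy to see with a toy greedy longest-match tokenizer in the WordPiece style (the vocabulary here is a made-up stand-in, not any real model's): words covered by the frozen subword vocabulary tokenize cleanly, while novel formations fall apart or become unknown tokens.

```python
def wordpiece_tokenize(word, vocab):
    """Greedy longest-match-first subword tokenization (WordPiece-style
    sketch). Continuation pieces carry a '##' prefix. A word with no
    matching piece at some position becomes a single [UNK] token."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in vocab:
                pieces.append(piece)
                break
            end -= 1
        if end == start:          # no subword matched here
            return ["[UNK]"]
        start = end
    return pieces

# toy frozen vocabulary, fixed before "pre-training"
vocab = {"token", "##ization", "##s"}
print(wordpiece_tokenize("tokenization", vocab))  # ['token', '##ization']
print(wordpiece_tokenize("tokens", vocab))        # ['token', '##s']
print(wordpiece_tokenize("tokenize", vocab))      # ['[UNK]']
```

The last call shows the failure mode the paper targets: a perfectly ordinary novel word that the frozen vocabulary cannot represent at all.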
1 code implementation • 16 Apr 2021 • Zach Wood-Doughty, Isabel Cachola, Mark Dredze
Machine learning models that offer excellent predictive performance often lack the interpretability necessary to support integrated human-machine decision-making.
no code implementations • 16 Apr 2021 • Elliot Schumacher, James Mayfield, Mark Dredze
Entity linking -- the task of identifying references in free text to relevant knowledge base representations -- often focuses on single languages.
1 code implementation • NAACL 2021 • Aaron Mueller, Mark Dredze
Neural topic models can augment or replace bag-of-words inputs with the learned representations of deep pre-trained transformer-based word prediction models.
no code implementations • EACL 2021 • Carlos Aguirre, Keith Harrigian, Mark Dredze
While previous research has raised concerns about possible biases in models produced from this data, no study has quantified how these biases actually manifest themselves with respect to different demographic groups, such as gender and racial/ethnic groups.
1 code implementation • EACL (AdaptNLP) 2021 • Xiaolei Huang, Michael J. Paul, Robin Burke, Franck Dernoncourt, Mark Dredze
In this study, we treat user interests as domains and empirically examine how user language varies across this user factor in three English social media datasets.
no code implementations • 10 Feb 2021 • Zach Wood-Doughty, Ilya Shpitser, Mark Dredze
High-dimensional and unstructured data such as natural language complicates the evaluation of causal inference methods; such evaluations rely on synthetic datasets with known causal effects.
1 code implementation • NAACL (CLPsych) 2021 • Keith Harrigian, Carlos Aguirre, Mark Dredze
Data-driven methods for mental health treatment and surveillance have become a major focus in computational science research in the last decade.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Keith Harrigian, Carlos Aguirre, Mark Dredze
Proxy-based methods for annotating mental health status in social media have grown popular in computational research due to their ability to gather large training samples.
1 code implementation • Findings (ACL) 2021 • Elliot Schumacher, James Mayfield, Mark Dredze
We find that the multilingual ability of BERT leads to robust performance in monolingual and multilingual settings.
no code implementations • 13 Oct 2020 • Aaron Mueller, Zach Wood-Doughty, Silvio Amir, Mark Dredze, Alicia L. Nobles
The #MeToo movement on Twitter has drawn attention to the pervasive nature of sexual harassment and violence.
1 code implementation • EMNLP 2020 • Shijie Wu, Mark Dredze
Multilingual BERT (mBERT), XLM-RoBERTa (XLMR) and other unsupervised multilingual encoders can effectively learn cross-lingual representation.
no code implementations • ACL 2020 • Elliot Schumacher, Andriy Mulyar, Mark Dredze
We propose an approach to concept linking that leverages recent work in contextualized neural models, such as ELMo (Peters et al. 2018), which create a token representation that integrates the surrounding context of the mention and concept name.
1 code implementation • WS 2020 • Shijie Wu, Mark Dredze
Multilingual BERT (mBERT) trained on 104 languages has shown surprisingly good cross-lingual performance on several NLP tasks, even without explicit cross-lingual signals.
1 code implementation • ACL 2020 • David Mueller, Nicholas Andrews, Mark Dredze
However, a straightforward implementation of this simple idea does not always work in practice: naive training of NER models using annotated data drawn from multiple languages consistently underperforms models trained on monolingual data alone, despite having access to more training data.
Multilingual Named Entity Recognition
1 code implementation • NAACL (SocialNLP) 2021 • Zach Wood-Doughty, Paiheng Xu, Xiao Liu, Mark Dredze
We present a method to identify self-reports of race and ethnicity from Twitter profile descriptions.
2 code implementations • 30 Oct 2019 • Andriy Mulyar, Elliot Schumacher, Masoud Rouhizadeh, Mark Dredze
Clinical notes contain an extensive record of a patient's health status, such as smoking status or the presence of heart conditions.
Ranked #1 on Clinical Note Phenotyping on I2B2 2006: Smoking
no code implementations • WS 2019 • Silvio Amir, Mark Dredze, John W. Ayers
The ability to track mental health conditions via social media opened the doors for large-scale, automated, mental health surveillance.
2 code implementations • IJCNLP 2019 • Shijie Wu, Mark Dredze
Pretrained contextual representation models (Peters et al., 2018; Devlin et al., 2018) have pushed forward the state-of-the-art on many NLP tasks.
Ranked #7 on Cross-Lingual NER on CoNLL Spanish
no code implementations • AKBC 2019 • Elliot Schumacher, Mark Dredze
Linking mentions of medical concepts in a clinical note to a concept in an ontology enables a variety of tasks that rely on understanding the content of a medical record, such as identifying patient populations and decision support.
no code implementations • WS 2018 • Zach Wood-Doughty, Nicholas Andrews, Mark Dredze
While recurrent neural networks (RNNs) are widely used for text classification, they demonstrate poor performance and slow convergence when trained on long sequences.
no code implementations • WS 2018 • Adrian Benton, Mark Dredze
Many social media classification tasks analyze the content of a message, but do not consider the context of the message.
1 code implementation • EMNLP 2018 • Zach Wood-Doughty, Ilya Shpitser, Mark Dredze
Causal understanding is essential for many kinds of decision-making, but causal inference from observational data has typically only been applied to structured, low-dimensional datasets.
1 code implementation • WS 2018 • Zach Wood-Doughty, Praateek Mahajan, Mark Dredze
Previous work (McCorriston et al., 2015) presented a method for determining if an account was an individual or organization based on account profile and a collection of tweets.
1 code implementation • NAACL 2018 • Adrian Benton, Mark Dredze
We present deep Dirichlet Multinomial Regression (dDMR), a generative topic model that simultaneously learns document feature representations and topics.
1 code implementation • WS 2018 • Zach Wood-Doughty, Nicholas Andrews, Rebecca Marvin, Mark Dredze
Social media analysis frequently requires tools that can automatically infer demographics to contextualize trends.
no code implementations • IJCNLP 2017 • Benjamin Van Durme, Tom Lippincott, Kevin Duh, Deana Burchfield, Adam Poliak, Cash Costello, Tim Finin, Scott Miller, James Mayfield, Philipp Koehn, Craig Harman, Dawn Lawrie, Chandler May, Max Thomas, Annabelle Carrell, Julianne Chaloux, Tongfei Chen, Alex Comerford, Mark Dredze, Benjamin Glass, Shudong Hao, Patrick Martin, Pushpendre Rastogi, Rashmi Sankepally, Travis Wolfe, Ying-Ying Tran, Ted Zhang
It combines a multitude of analytics together with a flexible environment for customizing the workflow for different users.
no code implementations • WS 2017 • Anietie Andy, Mark Dredze, Mugizi Rwebangira, Chris Callison-Burch
EntitySpike uses a temporal heuristic to identify named entities with similar context that occur in the same time period (within minutes) during an event.
no code implementations • WS 2017 • Zach Wood-Doughty, Michael Smith, David Broniatowski, Mark Dredze
Demographically-tagged social media messages are a common source of data for computational social science.
no code implementations • ACL 2017 • Nicholas Andrews, Mark Dredze, Benjamin Van Durme, Jason Eisner
Practically, this means that we may treat the lexical resources as observations under the proposed generative model.
Low Resource Named Entity Recognition
no code implementations • ACL 2017 • Travis Wolfe, Mark Dredze, Benjamin Van Durme
Existing Knowledge Base Population methods extract relations from a closed relational schema with limited coverage leading to sparse KBs.
no code implementations • WS 2017 • Adrian Benton, Glen Coppersmith, Mark Dredze
Social media have transformed data-driven research in political science, the social sciences, health, and medicine.
no code implementations • 22 Feb 2017 • Travis Wolfe, Mark Dredze, Benjamin Van Durme
Hand-engineered feature sets are a well understood method for creating robust NLP models, but they require a lot of expertise and effort to create.
no code implementations • 19 Feb 2017 • Ann Irvine, Mark Dredze
This work presents a systematic theoretical and empirical comparison of the major algorithms that have been proposed for learning Harmonic and Optimality Theory grammars (HG and OT, respectively).
no code implementations • WS 2016 • Anietie Andy, Satoshi Sekine, Mugizi Rwebangira, Mark Dredze
In this paper, we propose an algorithm to reduce the number of unanswered questions in Yahoo! Answers.
no code implementations • WS 2017 • Nanyun Peng, Mark Dredze
Many domain adaptation approaches rely on learning cross-domain shared representations to transfer the knowledge learned in one domain to other domains.
no code implementations • 20 Jun 2016 • Mark Dredze, Manuel García-Herranz, Alex Rutherford, Gideon Mann
Data on human spatial distribution and movement is essential for understanding and analyzing social systems.
1 code implementation • NAACL 2016 • Mo Yu, Mark Dredze, Raman Arora, Matthew Gormley
Modern NLP models rely heavily on engineered features, which often combine word and contextual information into complex lexical features.
no code implementations • ACL 2016 • Nanyun Peng, Mark Dredze
Named entity recognition, and other information extraction tasks, frequently use linguistic features such as part-of-speech tags or chunking.
no code implementations • TACL 2015 • Matthew R. Gormley, Mark Dredze, Jason Eisner
We show how to adjust the model parameters to compensate for the errors introduced by this approximation, by following the gradient of the actual loss on training data.
no code implementations • 31 May 2015 • Travis Wolfe, Mark Dredze, James Mayfield, Paul McNamee, Craig Harman, Tim Finin, Benjamin Van Durme
Most work on building knowledge bases has focused on collecting entities and facts from as large a collection of documents as possible.
1 code implementation • EMNLP 2015 • Matthew R. Gormley, Mo Yu, Mark Dredze
We propose a Feature-rich Compositional Embedding Model (FCM) for relation extraction that is expressive, generalizes to new domains, and is easy-to-implement.
Ranked #1 on Relation Extraction on ACE 2005 (Cross Sentence metric)
1 code implementation • TACL 2015 • Mo Yu, Mark Dredze
We propose efficient unsupervised and task-specific learning objectives that scale our model to large datasets.
no code implementations • TACL 2015 • Michael J. Paul, Mark Dredze
We introduce Sprite, a family of topic models that incorporates structure into model priors as a function of underlying components.
no code implementations • NeurIPS 2012 • Michael Paul, Mark Dredze
Multi-dimensional latent variable models can capture the many latent factors in a text corpus, such as topic, author perspective and sentiment.
no code implementations • NeurIPS 2009 • Koby Crammer, Alex Kulesza, Mark Dredze
We present AROW, a new online learning algorithm that combines several properties of successful online learning algorithms: large margin training, confidence weighting, and the capacity to handle non-separable data.
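The AROW update has a closed form, sketched below in NumPy following Crammer et al. (2009); the toy data stream and regularization value are illustrative assumptions. The learner maintains a Gaussian over weight vectors (mean `mu`, covariance `Sigma`) and updates only when the margin is violated.

```python
import numpy as np

def arow_update(mu, Sigma, x, y, r=1.0):
    """One AROW step (full-covariance sketch). mu/Sigma parameterize a
    Gaussian over weight vectors; x: feature vector; y in {-1, +1};
    r: regularization constant. Updates only on margin violations."""
    Sx = Sigma @ x
    margin = y * (mu @ x)          # signed margin under the mean weights
    confidence = x @ Sx            # variance of the prediction along x
    if margin < 1.0:               # hinge-loss violation
        beta = 1.0 / (confidence + r)
        alpha = max(0.0, 1.0 - margin) * beta
        mu = mu + alpha * y * Sx                 # move mean toward label
        Sigma = Sigma - beta * np.outer(Sx, Sx)  # shrink uncertainty on x
    return mu, Sigma

# toy run on a small linearly separable stream
mu, Sigma = np.zeros(2), np.eye(2)
stream = [(np.array([1.0, 0.2]), +1),
          (np.array([-0.8, 0.1]), -1),
          (np.array([0.9, -0.3]), +1)]
for x, y in stream:
    mu, Sigma = arow_update(mu, Sigma, x, y)
```

Because `Sigma` shrinks only along directions that have been observed, frequently seen features become confident (small updates) while rare ones stay adaptive, which is what lets AROW tolerate non-separable, noisy streams.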
no code implementations • NeurIPS 2008 • Koby Crammer, Mark Dredze, Fernando Pereira
Confidence-weighted (CW) learning [6], an online learning method for linear classifiers, maintains a Gaussian distribution over weight vectors, with a covariance matrix that represents uncertainty about weights and correlations.
no code implementations • 1 Jun 2007 • John Blitzer, Mark Dredze, Fernando Pereira
Automatic sentiment classification has been extensively studied and applied in recent years.