SMM4H Shared Task 2020 - A Hybrid Pipeline for Identifying Prescription Drug Abuse from Twitter: Machine Learning, Deep Learning, and Post-Processing

This paper presents our approach to multi-class text categorization of tweets mentioning prescription medications as being indicative of potential abuse/misuse (A), consumption/non-abuse (C), mention-only (M), or an unrelated reference (U) using natural language processing techniques. Data augmentation increased our training and validation corpora from 13,172 tweets to 28,094 tweets. We also created word-embeddings on domain-specific social media and medical corpora. Our hybrid pipeline of an attention-based CNN with post-processing was the best performing system in task 4 of SMM4H 2020, with an F1 score of 0.51 for class A.

PDF Abstract


  Add Datasets introduced or used in this paper

Results from the Paper

  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.


No methods listed for this paper. Add relevant methods here