CAISA@SMM4H’22: Robust Cross-Lingual Detection of Disease Mentions on Social Media with Adversarial Methods
We propose adversarial methods for increasing the robustness of disease mention detection on social media. Our method applies adversarial data augmentation on the input and the embedding spaces to the English BioBERT model. We evaluate our method in the SocialDisNER challenge at SMM4H’22 on an annotated dataset of disease mentions in Spanish tweets. We find that both methods outperform a heuristic vocabulary-based baseline by a large margin. Additionally, utilizing the English BioBERT model shows a strong performance and outperforms the data augmentation methods even when applied to the Spanish dataset, which has a large amount of data, while augmentation methods show a significant advantage in a low-data setting.
PDF Abstract