Self-learning paradigms in large-scale conversational AI agents tend to leverage user feedback to bridge the gap between what users say and what they mean.
Additionally, the dependency on a fixed vocabulary limits the subword models' adaptability across languages and domains.
For the unsupervised setting, we provide the following language pairs: English and Spanish-English (Eng-Spanglish), and English and Modern Standard Arabic-Egyptian Arabic (Eng-MSAEA) in both directions.
Current work in named entity recognition (NER) shows that data augmentation techniques can produce more robust models.
To alleviate these challenges, we propose a character-based subword module (char2subword) that learns the subword embedding table in pre-trained models like BERT.
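The core idea can be illustrated with a toy stand-in: compose a subword's embedding from the embeddings of its characters rather than looking it up in a fixed table, so any spelling (including noisy or unseen subwords) gets a vector. The actual char2subword module is a learned transformer trained to mimic the pretrained subword table; the mean-pool-plus-projection below is only a hedged sketch, and all names (`Char2Subword`, `char_dim`, `out_dim`) are illustrative.

```python
import numpy as np

class Char2Subword:
    """Toy character-based subword embedder (illustration only)."""

    def __init__(self, char_vocab, char_dim=8, out_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        # One vector per character, plus a projection to the subword space.
        self.char_emb = {c: rng.normal(size=char_dim) for c in char_vocab}
        self.proj = rng.normal(size=(char_dim, out_dim))

    def embed(self, subword):
        # Mean-pool character vectors, then project to the subword dimension.
        chars = np.stack([self.char_emb[c] for c in subword])
        return chars.mean(axis=0) @ self.proj

m = Char2Subword("abcdefghijklmnopqrstuvwxyz")
vec = m.embed("hello")  # works for any spelling, even unseen subwords
```

Because the embedding is computed from characters, the model no longer depends on a closed subword vocabulary, which is the adaptability limitation noted above.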
Multimodal named entity recognition (MNER) requires bridging the gap between language understanding and visual context.
In this paper, we present the results of the SemEval-2020 Task 9 on Sentiment Analysis of Code-Mixed Tweets (SentiMix 2020).
To facilitate research in this direction, we propose a centralized benchmark for Linguistic Code-switching Evaluation (LinCE) that combines ten corpora covering four different code-switched language pairs (i.e., Spanish-English, Nepali-English, Hindi-English, and Modern Standard Arabic-Egyptian Arabic) and four tasks (i.e., language identification, named entity recognition, part-of-speech tagging, and sentiment analysis).
In this paper, we propose to distill the internal representations of a large model such as BERT into a simplified version of it.
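Distilling internal representations can be sketched as matching the student's hidden states to selected teacher layers, e.g. with a mean-squared-error objective. This is a minimal illustration, not the paper's exact loss: the `layer_map` pairing and the assumption that teacher and student share a hidden size (real setups often add a learned projection) are simplifications introduced here.

```python
import numpy as np

def distillation_loss(teacher_layers, student_layers, layer_map):
    """MSE between aligned teacher and student hidden states.

    teacher_layers / student_layers: lists of (seq_len, hidden) arrays.
    layer_map: (student_idx, teacher_idx) pairs saying which layers align.
    """
    losses = []
    for s_idx, t_idx in layer_map:
        diff = student_layers[s_idx] - teacher_layers[t_idx]
        losses.append(np.mean(diff ** 2))
    return float(np.mean(losses))

# Toy example: a 4-layer teacher distilled into a 2-layer student,
# mapping student layer 0 -> teacher layer 1 and student 1 -> teacher 3.
rng = np.random.default_rng(0)
teacher = [rng.normal(size=(5, 8)) for _ in range(4)]
student = [teacher[1].copy(), teacher[3].copy()]  # a "perfect" student
loss = distillation_loss(teacher, student, [(0, 1), (1, 3)])
```

A perfect student drives this term to zero; in training it is minimized alongside the usual task loss so the simplified model inherits the teacher's internal behavior.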
We show the effectiveness of this transfer learning step by outperforming multilingual BERT and homologous CS-unaware ELMo models and establishing a new state of the art in CS tasks, such as NER and POS tagging.
On the other hand, the global attention identifies the most relevant words across the entire sequence.
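A global attention mechanism of this kind can be sketched as scoring every word's key vector against a query and softmax-normalizing the scores, so the weights highlight the most relevant positions. This dot-product formulation is an assumption for illustration, not necessarily the paper's exact scoring function.

```python
import numpy as np

def global_attention(query, keys):
    """Dot-product attention over the whole sequence.

    query: (d,) vector; keys: (seq_len, d) matrix.
    Returns softmax-normalized weights that sum to 1.
    """
    scores = keys @ query          # one relevance score per word
    scores -= scores.max()         # subtract max for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()

# The word whose key best matches the query receives the largest weight.
query = np.array([1.0, 0.0])
keys = np.array([[0.1, 0.9],
                 [0.9, 0.1],
                 [0.5, 0.5]])
w = global_attention(query, keys)
```

Because the softmax spans all positions at once, the weights form a distribution over the full sequence, which is what lets this component "spot" globally relevant words.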
This paper considers the problem of characterizing stories by inferring properties such as theme and style using written synopses and reviews of movies.
Studies on emotion recognition (ER) show that combining lexical and acoustic information results in more robust and accurate models.
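One simple way to combine the two modalities is late fusion: average the class probabilities produced by a lexical model and an acoustic model. This weighted-average scheme is a generic illustration of multimodal fusion, not the specific architecture of the work above; the class labels and `alpha` weight are hypothetical.

```python
import numpy as np

def late_fusion(p_lexical, p_acoustic, alpha=0.5):
    """Weighted average of per-class probabilities from two modalities."""
    fused = alpha * p_lexical + (1 - alpha) * p_acoustic
    return fused / fused.sum()  # renormalize to a valid distribution

# Lexical model is torn between 'angry' and 'neutral'; acoustic cues
# (e.g. raised pitch and energy) tip the fused decision toward 'angry'.
p_lex = np.array([0.45, 0.45, 0.10])   # classes: angry, neutral, happy
p_aco = np.array([0.70, 0.20, 0.10])
fused = late_fusion(p_lex, p_aco)
```

The example shows why fusion helps: each modality alone is ambiguous or incomplete, but their combination yields a more confident, and often more accurate, prediction.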
Named Entity Recognition for social media data is challenging because of its inherent noisiness.
Our systems outperform the current F1 scores of the state of the art on the Workshop on Noisy User-generated Text 2017 dataset by 2.45% and 3.69%, establishing a more suitable approach for social media environments.
In the third shared task of the Computational Approaches to Linguistic Code-Switching (CALCS) workshop, we focus on Named Entity Recognition (NER) on code-switched social-media data.