Code-mixed text generation systems have found applications in many downstream tasks, including speech recognition, translation and dialogue.
In this work, we analyze the robustness of neural machine translation systems towards grammatical perturbations in the source.
In this paper, we present CASENT, a seq2seq model designed for ultra-fine entity typing that predicts ultra-fine types with calibrated confidence scores.
On the two proposed tasks, we compare multiple event linking systems including BM25+ (Lv and Zhai, 2011) and multilingual adaptations of the biencoder and crossencoder architectures from BLINK (Wu et al., 2020).
To this end, we design a new annotation workflow with careful quality control and an easy-to-use annotation interface.
Text generation systems are ubiquitous in natural language processing applications.
Using cross-lingual transfer, even with no expert annotations in the language of interest, our framework extracts a grammatical specification which is nearly equivalent to those created with large amounts of gold-standard annotated data.
We compare three existing bilingual word embedding approaches, and a novel approach of training skip-grams on synthetic code-mixed text generated through linguistic models of code-mixing, on two tasks - sentiment analysis and POS tagging for code-mixed text.
Training language models for Code-mixed (CM) language is known to be a difficult problem because of lack of data compounded by the increased confusability due to the presence of more than one language.