108 papers with code • 4 benchmarks • 15 datasets
Language identification is the task of determining the language of a text.
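As a rough illustration of the task, the sketch below guesses a language by comparing character trigrams of the input against small per-language profiles. It is a toy under stated assumptions, not the method of any paper listed here: the `SAMPLES` sentences, the helper names `char_ngrams` and `identify_language`, and the overlap score are all hypothetical choices made for this example.

```python
from collections import Counter

# Hypothetical toy training text per language; far too small for real use.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is the language of the text",
    "de": "der schnelle braune fuchs springt über den faulen hund und das ist die sprache des textes",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y este es el idioma del texto",
}


def char_ngrams(text, n=3):
    """Count overlapping character n-grams, padding with spaces to mark word edges."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


# One trigram profile per language, built once from the toy samples.
PROFILES = {lang: char_ngrams(sample) for lang, sample in SAMPLES.items()}


def identify_language(text):
    """Return the language whose profile overlaps most with the input's trigrams."""
    grams = char_ngrams(text)

    def score(profile):
        total = sum(profile.values())
        # Weight each input trigram by its relative frequency in the profile;
        # trigrams missing from the profile contribute zero.
        return sum(count * profile[g] / total for g, count in grams.items())

    return max(PROFILES, key=lambda lang: score(PROFILES[lang]))


if __name__ == "__main__":
    for sentence in ["the dog is lazy", "der hund ist faul", "el perro es perezoso"]:
        print(sentence, "->", identify_language(sentence))
```

Practical systems train on much larger corpora and usually add smoothing or a learned classifier. Short or code-switched inputs, such as the Hindi-English tweets mentioned in this listing, are considerably harder than this sketch suggests.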
We present a treebank of Hindi-English code-switching tweets under the Universal Dependencies scheme and propose a neural stacking model for parsing that efficiently leverages part-of-speech tag and syntactic tree annotations in the code-switching treebank and the preexisting Hindi and English treebanks.
In particular, we model the task hierarchically, identifying the type and the target of offensive messages in social media.
To our knowledge this is the largest audio corpus in the public domain for speech recognition, both in terms of number of hours and number of languages.
On the CoVoST-2 speech translation benchmark, we improve the previous state of the art by an average of 7.4 BLEU over 21 translation directions into English.