Automatic question generation (QG) has shown promise as a source of synthetic training data for question answering (QA).
An intonational inventory of Urdu for spontaneous conversational speech is determined based on the analysis of a hand-labelled data set of telephone conversations.
In this paper we address the problem of providing personalised recommendations of recent scientific publications to a particular user, and explore the use of citation knowledge to do so.
In this paper we present, describe, and evaluate SentiEcon, a large, comprehensive, domain-specific computational lexicon designed for sentiment analysis applications, for which we compiled our own corpus of online business news.
While annotated learner corpora of English are widely available, large learner corpora of Spanish are less common.
This paper addresses the task of supervised hypernymy detection in Spanish using an order embedding with pretrained word vectors as input.
As such, texts written in Arabizi are often disregarded in sentiment analysis tasks for Arabic.
To produce a more difficult dataset, we introduce a novel procedure for question acquisition in which workers author questions designed to target weaknesses of state-of-the-art neural question answering systems.
Mental health forums are online spaces where people can share their experiences anonymously and get peer support.
This paper presents a new, comprehensive multi-level Part-Of-Speech tag set and a Support Vector Machine-based Part-Of-Speech tagger for the Sinhala language.
A sentence-aligned parallel corpus is an important prerequisite in statistical machine translation.
In this paper we investigate whether removing stopwords helps or hampers the effectiveness of Twitter sentiment classification methods.
Digitised Cultural Heritage (CH) items usually have short descriptions and lack rich contextual information.
In this paper we investigate the role of multilingual features in improving word sense disambiguation.