Input saliency methods have recently become a popular tool for explaining predictions of deep learning models in NLP.
This article studies register classification of documents from the unrestricted web, such as news articles or opinion blogs, in a multilingual setting, exploring both the benefit of training on multiple languages and the capabilities for zero-shot cross-lingual transfer.
In this work, we propose a method for explaining classes using deep learning models and the Integrated Gradients feature attribution technique by aggregating explanations of individual examples in text classification to general descriptions of the classes.
We explore cross-lingual transfer of register classification for web documents.
In this paper, we present the first publicly available part-of-speech and morphologically tagged corpus for the Albanian language, as well as a neural morphological tagger and lemmatizer trained on it.
The multilingual BERT model is trained on 104 languages and meant to serve as a universal language model and tool for encoding sentences.
News articles such as sports game reports are often thought to closely follow the underlying game statistics, but in practice they contain a notable amount of background knowledge, interpretation, insight into the game, and quotes that are not present in the official statistics.
We introduce an attention-based Bi-LSTM for Chinese implicit discourse relations and demonstrate that modeling argument pairs as a joint sequence can outperform word order-agnostic approaches.
While many models are purposed for detecting the occurrence of significant events in financial systems, the task of providing qualitative detail on the developments is not usually as well automated.
As we continue to collect and store textual data in a multitude of domains, we are regularly confronted with material whose largely unknown thematic structure we want to uncover.
Probabilistic topic modeling is a popular and powerful family of tools for uncovering thematic structure in large sets of unstructured text documents.
From the viewpoint of information visualization, this paper provides a general, yet simple, solution to projection-based coloring of the SOM that reveals structures.