We show that BERT (Devlin et al., 2018) is a Markov random field language model.
To this end, we use the pre-trained LMs as fixed feature extractors and restrict the downstream task models so that they have no additional sequence modeling layers.
This paper presents a new system for open-ended discourse relation signal annotation in the framework of Rhetorical Structure Theory (RST), implemented on top of an online tool for RST annotation.
Social scientists have recently turned to analyzing text with tools from natural language processing, such as word embeddings, to measure concepts like ideology, bias, and affinity.
This paper studies the use of neural machine translation (NMT) as a normalization method for an early English letter corpus.
Stack Long Short-Term Memory (StackLSTM) is useful for applications such as parsing and string-to-tree neural machine translation, but it is notoriously difficult to parallelize for GPU training because its computations depend on discrete operations.
This article focuses on the problem of identifying articles and recovering their text from within and across newspaper pages when OCR delivers only one text file per page.
Distributed word vector spaces are considered hard to interpret, which hinders the understanding of natural language processing (NLP) models.
In this paper, we propose metrics that evaluate quality and diversity simultaneously by approximating the distance between the learned generative model and the real data distribution.
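The excerpt does not specify which distribution distance is used; one widely used instance of this idea is the Fréchet distance between Gaussian fits of real and generated feature sets (as in FID). A minimal sketch, with the function and variable names chosen here for illustration only:

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_real, feats_gen):
    """Fréchet distance between Gaussians fit to two feature sets.

    feats_real, feats_gen: arrays of shape (n_samples, n_features),
    e.g. embeddings of real and model-generated samples (illustrative).
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Matrix square root of the covariance product; small imaginary
    # parts from numerical error are discarded.
    covmean = linalg.sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```

A low value indicates the two feature distributions are close in both location and spread, which is why a single such distance can reflect quality and diversity at once.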